Tutorial 02 – Industry Practices and Tools


1. What is the need for VCS?

Version control systems (VCS) let you compare files, identify differences, and merge changes before committing any code. Versioning is also a good way to keep track of application builds, because you can identify which version is currently in development, QA, and production.
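As a rough illustration of "compare files and identify differences", the sketch below uses Python's standard difflib module to produce a unified diff between two versions of a file. The file name and contents are made up for the example; a real VCS does much more than this.

```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old_version = ["def greet(name):", "    print('Hello ' + name)"]
new_version = ["def greet(name):", "    print(f'Hello, {name}!')"]


def show_diff(old, new):
    """Return a unified diff between two versions of a file, as a list of lines."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="greet.py (v1)", tofile="greet.py (v2)", lineterm=""
        )
    )


for line in show_diff(old_version, new_version):
    print(line)
```

Lines starting with "-" were removed and lines starting with "+" were added, which is the same convention most VCS diff tools use.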

2. Differentiate the three models of VCSs, stating their pros and cons?
Version control is essential to development even if you work alone, because it protects you from your own mistakes. If you make a mistake, you can simply roll back to a previous version of your code that you know works. This also frees you to explore and experiment, because you no longer have to worry about whether a change is reversible. The oldest model is the local VCS (for example RCS), which keeps the history only on your own machine. The two models in common use by teams are centralized and distributed.
Centralized VCSs are based on a central server: everyone "checks out" a project, works on it, and "commits" their changes back to the server for others to use. The major centralized VCSs are CVS and SVN. Both have been heavily criticized because merging branches is extremely painful with them.
Distributed VCSs give everyone their own complete copy of the repository, so you can commit locally, "pull" changes from other people, and "push" changes to a server. The most common distributed VCSs are Git and Mercurial.
If you're working on a project, I strongly recommend using a distributed VCS. I recommend Git because it is very fast, although it has been criticized as being too hard to use. BitKeeper, which began as a commercial product and was later released as open source, is reportedly easy to use.

3. Git and GitHub, are they same or different? Discuss with facts?

Git is a distributed version control system that allows developers to track changes in files and work with other developers. It was developed in 2005 by Linus Torvalds, the creator of Linux.
Git stands apart from other version control systems in how it handles data. Most other systems store information as a list of changes to files. Git instead stores data as a series of snapshots of a miniature file system. Every time you save the state of your project, Git records how each file looks at that moment and stores a reference to that snapshot. So if you break your code and Ctrl+Z cannot help, Git lets you revert individual files or the entire project to a previous state, compare changes over time, and see who last modified something, who introduced an issue, and when. Git does not need a server at all: your local machine can hold a complete source code repository without any centralized server.
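The snapshot idea can be sketched in a few lines of Python. This is a toy model, not Git's real object format: the class name SnapshotStore and its methods are invented for illustration, and it only shows that each commit records the whole tree while identical file contents are stored once.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store: every commit is a full snapshot of the tree."""

    def __init__(self):
        self.objects = {}  # content hash -> file bytes
        self.commits = []  # each commit maps path -> content hash

    def _store(self, content: bytes) -> str:
        digest = hashlib.sha1(content).hexdigest()
        self.objects[digest] = content  # identical content is stored only once
        return digest

    def commit(self, files: dict) -> int:
        """Record a snapshot of all files and return its commit id."""
        snapshot = {path: self._store(data) for path, data in files.items()}
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, commit_id: int) -> dict:
        """Rebuild the files exactly as they were at the given commit."""
        return {path: self.objects[h] for path, h in self.commits[commit_id].items()}


store = SnapshotStore()
v0 = store.commit({"a.txt": b"hello", "b.txt": b"world"})
v1 = store.commit({"a.txt": b"hello", "b.txt": b"world!"})
print(store.checkout(v0)["b.txt"])  # the old version is still recoverable
```

Because the unchanged a.txt hashes to the same value in both commits, the second snapshot adds only one new object.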
GitHub is an online hosting service for Git repositories. It offers the distributed version control and source code management functionality of Git, plus features of its own. GitHub lets you share repositories, access other people's repositories, and store remote copies of your repositories on GitHub's servers as a backup of your local copies. Developers use GitHub together with Git because it lets them keep their code online.

GitHub lets developers interact with each other across projects. It also provides access control, bug tracking, task management, and a wiki for each project. The goal of GitHub is to promote developer interaction. It has been called a sort of Facebook for developers: on Facebook people share posts and pictures, on GitHub developers share code.
  • Git is a revision control system, a tool to manage your source code history.
  • GitHub is a hosting service for Git repositories.
  • So they are not the same thing: Git is the tool; GitHub is the service for projects that use Git.

4. Compare and contrast the Git commands, commit and push?

In short, git commit "records changes to the repository", while git push "updates remote refs along with associated objects". The first works on your local repository; the second interacts with a remote repository.
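The difference can be shown with a toy model. The classes below are hypothetical and assume a simple linear history with a single branch; they only illustrate that committing changes the local repository, while the remote changes only on push.

```python
class Repository:
    """A repository reduced to an ordered list of commit messages."""

    def __init__(self):
        self.commits = []


class LocalRepository(Repository):
    def __init__(self, remote: Repository):
        super().__init__()
        self.remote = remote

    def commit(self, message: str) -> None:
        # Recorded locally only; the remote is not contacted.
        self.commits.append(message)

    def push(self) -> int:
        # Send the commits the remote does not have yet (linear history assumed).
        missing = self.commits[len(self.remote.commits):]
        self.remote.commits.extend(missing)
        return len(missing)


remote = Repository()
local = LocalRepository(remote)
local.commit("Add README")
local.commit("Fix typo")
print(len(remote.commits))  # still 0: nothing has been pushed
print(local.push())         # number of commits sent to the remote
```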


5. Discuss the use of staging area and Git directory?

Staging is the step before a commit in Git. A commit is performed in two steps: staging, then the actual commit. As long as a changeset is in the staging area, Git lets you edit it as you like (replace staged files with other versions, remove changes from staging, and so on). The Git directory (the .git folder) is where Git stores the project's metadata and object database; a commit records the staged snapshot there.
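The two-step process can be sketched as a toy model. The class ToyRepo and its method names are invented for this example, and it simplifies real Git, whose index tracks the whole tree rather than only pending changes.

```python
class ToyRepo:
    """Toy model of Git's two-step commit: stage first, then commit."""

    def __init__(self):
        self.staged = {}   # staging area: path -> content waiting to be committed
        self.history = []  # stands in for the Git directory: committed snapshots

    def stage(self, path: str, content: str) -> None:
        # Staging the same path again replaces the previously staged version.
        self.staged[path] = content

    def unstage(self, path: str) -> None:
        self.staged.pop(path, None)

    def commit(self, message: str) -> None:
        if not self.staged:
            raise ValueError("nothing staged to commit")
        self.history.append((message, dict(self.staged)))
        self.staged.clear()


repo = ToyRepo()
repo.stage("a.py", "v1")
repo.stage("b.py", "draft")
repo.stage("a.py", "v2")  # replace the staged version of a.py
repo.unstage("b.py")      # change of mind: leave b.py out of this commit
repo.commit("Update a.py")
print(repo.history)
```

Only what is staged at commit time ends up in the history, which is why the staging area is useful for building a commit piece by piece.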
Staging area in data warehousing
The term "staging area" is also used in data warehousing, where it means something different from Git's staging area. There, a staging area is a temporary schema used to:
  • Do flat mapping, i.e. dump all the OLTP data into it without applying any business rules. Pushing data into staging takes less time because no business rules or transformations are applied to it.
  • Cleanse and validate data using First Logic.
A staging area is like a large table holding data separated from its sources, ready to be loaded into a data warehouse in the required format. Loading data directly from OLTP could disrupt the OLTP system because of format differences between the warehouse and OLTP. Keeping the OLTP data intact is very important for both the OLTP system and the warehouse.
Whether a staging area is needed depends on the complexity of the business rules. Its basic purpose is to clean the OLTP source data and gather it in one place. It is essentially a temporary database area: the staged data is used for further processing and can be deleted afterwards.


6. Explain the collaboration workflow of Git, with example?

A Git workflow is a recipe or recommendation for how to use Git to accomplish work in a consistent and productive manner. Git offers a lot of flexibility in how users manage changes, so there is no single standardized process for interacting with it. When working with a team on a Git-managed project, it is important that everyone agrees on how changes will flow, which means developing or selecting an agreed-upon workflow. There are several published Git workflows that may be a good fit for your team. One common example is the feature branch workflow: each developer creates a branch for a change, commits to it, pushes it to the shared remote, and the branch is merged into the main branch after review.
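As an example, the sketch below lists the Git commands for one pass through a feature branch workflow. It only builds and prints the commands and does not run them. The function name, the branch name "feature/login", the main branch name "main", and the remote name "origin" are assumptions for illustration.

```python
import shlex


def feature_branch_commands(branch, message, main="main", remote="origin"):
    """Return the git commands for one pass through a feature branch workflow."""
    return [
        ["git", "checkout", main],               # start from the shared main branch
        ["git", "pull", remote, main],           # get teammates' latest changes
        ["git", "checkout", "-b", branch],       # create the feature branch
        ["git", "add", "."],                     # stage the work
        ["git", "commit", "-m", message],        # record it locally
        ["git", "push", "-u", remote, branch],   # publish the branch for review
    ]


for cmd in feature_branch_commands("feature/login", "Add login form"):
    print(shlex.join(cmd))
```

After the push, a teammate would typically review the branch and merge it into the main branch, and everyone else would pick up the result with their next pull.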


7. Discuss the benefits of CDNs?
 
  • Improving website load times: By distributing content closer to website visitors through a nearby CDN server (among other optimizations), visitors experience faster page loading times. Because visitors are more inclined to click away from a slow-loading site, a CDN can reduce bounce rates and increase the amount of time people spend on the site. In other words, a faster website means more visitors will stay and stick around longer.
  • Reducing bandwidth costs: Bandwidth consumption is a primary expense of website hosting. Through caching and other optimizations, CDNs reduce the amount of data an origin server must provide, thus reducing hosting costs for website owners.
  • Increasing content availability and redundancy: Large amounts of traffic or hardware failures can interrupt normal website function. Thanks to its distributed nature, a CDN can handle more traffic and withstand hardware failure better than many origin servers.
  • Improving website security: A CDN may improve security by providing DDoS mitigation, improvements to security certificates, and other optimizations.
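The bandwidth saving from caching can be illustrated with a toy model. The Origin and EdgeCache classes are hypothetical, and the sketch ignores cache expiry and eviction, which real CDNs must handle.

```python
class Origin:
    """Stands in for the website's own server; counts how often it is asked."""

    def __init__(self, assets: dict):
        self.assets = assets
        self.requests_served = 0

    def fetch(self, path: str) -> bytes:
        self.requests_served += 1
        return self.assets[path]


class EdgeCache:
    """Stands in for one CDN edge server that keeps copies of fetched assets."""

    def __init__(self, origin: Origin):
        self.origin = origin
        self.cache = {}

    def get(self, path: str) -> bytes:
        if path not in self.cache:  # cache miss: go to the origin once
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]     # cache hit: origin is not contacted


origin = Origin({"/logo.png": b"image-bytes"})
edge = EdgeCache(origin)
for _ in range(100):
    edge.get("/logo.png")
print(origin.requests_served)  # the origin served far fewer than 100 requests
```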


8. How do CDNs differ from web hosting servers?
  • Web Hosting is used to host your website on a server and let users access it over the internet. A content delivery network is about speeding up the access/delivery of your website’s assets to those users.
  • Traditional web hosting would deliver 100% of your content to the user. If they are located across the world, the user still must wait for the data to be retrieved from where your web server is located. A CDN takes a majority of your static and dynamic content and serves it from across the globe, decreasing download times. Most times, the closer the CDN server is to the web visitor, the faster assets will load for them.
  • Web Hosting normally refers to one server. A content delivery network refers to a global network of edge servers which distributes your content from a multi-host environment.
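The "closest edge server" idea can be sketched as picking the server with the lowest measured latency. The server names and latency figures below are invented, and real CDNs typically route requests with DNS or anycast rather than a lookup like this.

```python
def pick_edge(latencies_ms: dict) -> str:
    """Return the name of the edge server with the lowest latency to the visitor."""
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical round-trip times from one visitor to three edge locations.
latencies = {"frankfurt": 18.0, "singapore": 210.0, "virginia": 95.0}
print(pick_edge(latencies))
```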
9. Identify free and commercial CDNs?
  • CloudFlare. CloudFlare is popularly known as the best free CDN for WordPress users. ...
  • Incapsula. Incapsula provides Application Delivery from the cloud: Global CDN, Website Security, DDoS Protection, Load Balancing & Failover. ...
  • Photon by Jetpack. ...
  • Swarmify.
10. Discuss the requirements for virtualization?

The three elements to consider when selecting virtualization hardware are the CPU, memory, and network I/O capacity. All three are critical for workload consolidation.

CPU

CPU considerations come down to clock speed and the number of cores. Don't rush out and buy the fastest CPU on the market. Instead, buy one with a more modest clock speed and a greater number of cores.
You'll get better consolidation from two 2.4 GHz CPUs with 10 cores each than from two 3 GHz CPUs with 4 cores each. Invest in faster CPUs only when your workload demands it. The best servers for virtualization also have CPUs with large internal caches.

Memory

Your virtual machine resides in memory. The more memory you have, the greater your consolidation. You need at least enough DDR3 memory to support the number of workloads you run on the system.
Take the 10-core example above. The two 10-core CPUs would support 40 threads of potential workloads: add up the cores (20 in total), then multiply by 2 because each core runs two threads.
If each workload uses 2 GB, your server would need at least 80 GB. The closest practical memory configuration would be 96 GB. Anything less would compromise your consolidation or your performance; anything more would just be a waste of money.
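The sizing arithmetic above can be written as a small helper. The function name and parameters are made up for this sketch, and it assumes one workload per hardware thread, as the example in the text does.

```python
def required_memory_gb(sockets, cores_per_socket, threads_per_core, gb_per_workload):
    """Return (hardware threads, minimum memory in GB) for one workload per thread."""
    threads = sockets * cores_per_socket * threads_per_core
    return threads, threads * gb_per_workload


# Two 10-core CPUs, two threads per core, 2 GB per workload.
threads, memory_gb = required_memory_gb(2, 10, 2, 2)
print(threads, memory_gb)  # 40 80
```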
It's worth noting that memory resilience features require extra memory modules. They won't add to your available memory pool. Save these features for your servers that run mission-critical workloads.

Network Access

Be sure you have adequate bandwidth available.
Consider upgrading your network interface to a quad port NIC. You may even install a 10 GbE NIC if your workload demands justify it.
Common 1 GbE network interface cards just won't cut it. Get rid of them and set up more rigorous network access.



11. Discuss and compare the pros and cons of different virtualization techniques in different levels?

Virtualization has several benefits. For businesses with limited funds, virtualization helps them stay on budget by eliminating the need to invest in tons of hardware. Creating virtual environments to work in also helps businesses with limited IT staff automate routine tasks and centralize resource management. Further, employees can access their data anytime, anywhere, using any device. However, virtualized environments have drawbacks. Here are the major pros and cons of virtualization.
Pro: Reduced IT costs
Virtualization helps businesses reduce costs in several ways, according to Mike Adams, senior director of cloud platform product marketing at VMware.
  • Capital expenditure savings. Virtualization lets companies reduce their IT costs by requiring fewer hardware servers and related resources to achieve the same level of computing performance, availability and scalability.
  • Operational expenditure savings. Once servers are virtualized, your IT staff can greatly reduce the ongoing administration and management of manual, time-consuming processes by automating operations, thus resulting in lower operational expenses.
  • Data center and energy-efficiency savings. As companies reduce the size of their hardware and server footprint, they lower their energy consumption, cooling power and data center square footage, thus resulting in lower costs.
Con: The upfront costs are hefty
If you're transitioning a legacy system to a virtualized one, upfront costs are likely to be high. Be prepared to spend upwards of $10,000 for the servers and software licenses. However, as virtualization technology improves and becomes more commonplace, costs will go down.
Pro: Efficient resource utilization
Virtualization enables businesses to get the most out of their investment in hardware and resources. "As customer data center environments grow in size and complexity, managing it becomes a burden," Adams said. "Virtualization can greatly help reduce this complexity by offering resource management capabilities to help increase efficiencies in these virtual environments."
In contrast, traditional infrastructures that use multiple servers don't make the most out of their setups. "Many of those servers would typically not utilize more than 2 to 10 percent of the server hardware resources," said John Livesay, vice president of Infranet Technologies, a network infrastructure services provider. "With virtualization, we can now run multiple virtual servers on a single virtual host [and make] better use of the resources available."
Con: Not all hardware or software can be virtualized
The drawback, however, is that not all servers and applications are virtualization-friendly, Livesay said. "Typically, the main reason you may not virtualize a server or application is only because the application vendor may not support it yet, or recommend it," he said.
But virtualization is highly scalable. It lets businesses easily create additional resources as required by many applications, such as by easily adding extra servers – it's all done on-demand on an as-needed basis, without any significant investments in time or money.
IT admins can create new servers quickly, because they do not need to purchase new hardware each time they need a new server, Livesay said. "If the resources are available, we can create a new server in a few clicks of a mouse," he added.
The ease of creating additional resources also helps businesses scale as they grow. "This scenario might be good for small businesses that are growing quickly, or businesses using their data center for testing and development," Livesay said.
Businesses should keep in mind, though, that one of the main goals and advantages of virtualization is the efficient use of resources. Therefore, they should be careful not to let the effortlessness of creating servers result in the carelessness of allocating resources. 
"Server sprawl is one of the unintended consequences of virtualization," Livesay said. "Once administrators realize how easy it is to add new servers, they start adding a new server for everything. Soon you find that instead of six to 10 servers, you are now managing 20 to 30 servers." 
The limitations virtualization faces include a lack of awareness that certain applications or workloads can be virtualized, according to Adams.
"Workloads such as Hadoop, NoSQL databases, Spark and containers often start off on bare-metal hardware but present new opportunities to be virtualized later on," Adams said. "Virtualization can now support many new applications and workloads within the first 60 to 90 days on the market."
Although more software applications are adapting to virtualization situations, there can be licensing complications due to multiple hosts and migrations. Regarding performance and licensing issues, it's prudent to check if certain essential applications work well in a virtualized environment.


12. Identify popular implementations and available tools for each level of visualization?

  • Visualizing the tree line using solar panels. Tool used: R
  • Calculating the Age of the Universe. Tool used: R
  • Rendering the Moon using Earth’s Colors. Tool used: Python
  • The World Seen Through 17,000 Travel Itineraries. Tools used: Tableau, Gephi




13. What is the hypervisor and what is the role of it?

A hypervisor, also known as a virtual machine monitor, is software that creates and runs virtual machines (VMs). A hypervisor allows one host computer to support multiple guest VMs by virtually sharing its resources, such as memory and processing power. Generally, there are two types of hypervisors. Type 1 hypervisors, called "bare metal", run directly on the host's hardware. Type 2 hypervisors, called "hosted", run as a software layer on an operating system, like other computer programs.

One of the key functions a hypervisor provides is isolation: a guest cannot affect the operation of the host or any other guest, even if it crashes. To achieve this, the hypervisor must carefully emulate the hardware of a physical machine and, except under carefully controlled circumstances, prevent a guest from accessing the real hardware. How the hypervisor does this is a key determinant of virtual machine performance. Because emulating real hardware can be slow, hypervisors often provide special drivers, so-called "paravirtualized drivers" or "PV drivers", which present virtual disks and network cards to the guest as if they were a new piece of hardware, using an interface optimized for the hypervisor. These PV drivers are specific to the operating system and (often) to the hypervisor. PV drivers can speed up performance by an order of magnitude, so they are another key determinant of performance.
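The resource sharing and isolation described above can be sketched as a toy model. The Hypervisor class here is purely illustrative: it does not emulate any hardware, and it assumes no overcommitment of memory or CPUs, which real hypervisors often allow.

```python
class Hypervisor:
    """Toy model: hands out slices of host resources to isolated guests."""

    def __init__(self, total_memory_gb: int, total_cpus: int):
        self.free_memory_gb = total_memory_gb
        self.free_cpus = total_cpus
        self.guests = {}

    def create_vm(self, name: str, memory_gb: int, cpus: int) -> None:
        if memory_gb > self.free_memory_gb or cpus > self.free_cpus:
            raise RuntimeError("not enough host resources for this guest")
        self.free_memory_gb -= memory_gb
        self.free_cpus -= cpus
        self.guests[name] = {"memory_gb": memory_gb, "cpus": cpus, "state": "running"}

    def crash(self, name: str) -> None:
        # Isolation: only the named guest changes state.
        self.guests[name]["state"] = "crashed"


host = Hypervisor(total_memory_gb=16, total_cpus=8)
host.create_vm("web", memory_gb=4, cpus=2)
host.create_vm("db", memory_gb=8, cpus=4)
host.crash("web")
print(host.guests["db"]["state"])  # the other guest keeps running
```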

Why use a Hypervisor?
Hypervisors make it possible to use more of a system’s available resources, and provide greater IT mobility, since the guest VMs are independent of the host hardware. This means they can be easily moved between different servers.

14. How is emulation different from VMs?

Emulation, in short, means making one system imitate another. For example, if a piece of software runs on system A but not on system B, we make system B "emulate" system A. The software then runs on an emulation of system A.
In the same example, virtualization would mean taking system A and splitting it into two servers, B and C. Both of these "virtual" servers are independent software containers with their own access to software-based resources (CPU, RAM, storage, and networking), and each can be rebooted independently. They behave exactly like real hardware, and an application or another computer would not be able to tell the difference.
Each of these technologies has its own uses, benefits, and shortcomings.

Definition
Emulation
In our emulation example, software fills in for hardware, creating an environment that behaves in a hardware-like manner. This takes a toll on the processor, because cycles that could otherwise execute the application's own calculations are spent on the emulation process. Thus, a large part of the CPU's capacity is expended in creating this environment.
Interestingly enough, you can run a virtual server in an emulated environment. So, if emulation is such a waste of resources, why consider it?
Emulation can be used effectively in the following scenarios:
  • Running an operating system meant for other hardware (e.g., Mac software on a PC; console-based games on a computer)
  • Running software meant for another operating system (running Mac-specific software on a PC and vice versa)
  • Running legacy software after comparable hardware becomes obsolete
Emulation is also useful when designing software for multiple systems. The coding can be done on a single machine, and the application can be run in emulations of multiple operating systems, all running simultaneously in their own windows.
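To make the cost of emulation concrete, here is a minimal interpreter for a made-up CPU with three instructions (LOAD, ADD, JNZ). The instruction set is invented for this sketch and matches no real hardware. Each guest instruction costs the host several Python operations, which is the overhead described above.

```python
def emulate(program):
    """Interpret a program for a made-up CPU; return (registers, instructions run)."""
    regs = {}
    pc = 0      # program counter of the emulated CPU
    steps = 0   # how many guest instructions the host had to interpret
    while pc < len(program):
        op, *args = program[pc]
        steps += 1
        if op == "LOAD":    # LOAD reg, value: put a constant in a register
            regs[args[0]] = args[1]
        elif op == "ADD":   # ADD dst, src: dst = dst + src
            regs[args[0]] += regs[args[1]]
        elif op == "JNZ":   # JNZ reg, target: jump to target if reg is not zero
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs, steps


# Guest program: add 3 + 2 + 1 into the register "acc".
program = [
    ("LOAD", "acc", 0),
    ("LOAD", "n", 3),
    ("LOAD", "minus_one", -1),
    ("ADD", "acc", "n"),
    ("ADD", "n", "minus_one"),
    ("JNZ", "n", 3),
]
registers, steps = emulate(program)
print(registers["acc"], steps)  # 6 12
```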
Virtualization
In our virtualization example, we can safely say that it utilizes computing resources in an efficient, functional manner – independent of their physical location or layout. A fast machine with ample RAM and sufficient storage can be split into multiple servers, each with a pool of resources. That single machine, ordinarily deployed as a single server, could then host a company’s web and email server. Computing resources that were previously underutilized can now be used to full potential. This can help drastically cut down costs.


15. Compare and contrast the VMs and containers/dockers, indicating their advantages and disadvantages?

Advantages of virtual machines:
  • Multiple OS environments can exist simultaneously on the same machine, isolated from each other.
  • A virtual machine can offer an instruction set architecture that differs from that of the real computer.
  • Easy maintenance, application provisioning, availability, and convenient recovery.
Benefits of Containers
  • Reduced IT management resources
  • Reduced size of snapshots
  • Quicker spinning up apps
  • Reduced & simplified security updates
  • Less code to transfer, migrate, upload workloads
 

 
 
 
 
 
 

