Tutorial 02 – Industry Practices and Tools
1. What is the need for VCS?
Benefits of version control: version control systems allow you to compare files, identify differences, and merge changes if needed before committing any code. Versioning is also a great way to keep track of application builds, since you can identify which version is currently in development, QA, and production.
2. Differentiate the three models of VCSs, stating their pros and cons?
Version control is essential to development, even if you're working by yourself, because it protects you from yourself. If you make a mistake, it's a simple matter to roll back to a previous version of your code that you know works. This also frees you to explore and experiment with your code, because you don't have to worry about whether what you're doing is reversible. The oldest model is local version control (e.g. RCS), which keeps revision history in a small database on a single machine: simple, but with no support for collaboration and a single point of failure. The two major modern branches of version control systems (VCS) are Centralized and Distributed.
Centralized VCSs are based on a central server: everyone "checks out" a project, works on it, and "commits" their changes back to the server for anybody else to use. The pros are a single authoritative copy of the project and simple access control; the major centralized VCSs are CVS and SVN, and both have been heavily criticized because merging branches is extremely painful with them.
Distributed VCSs let everyone have their own repository: you can "pull" changes from other people and "push" changes to a server. Because every clone contains the full project history, you can commit, branch, and browse history offline, and merging is generally far less painful than in centralized systems; the main drawbacks are a steeper learning curve and the need to agree on a shared workflow. The most common distributed VCSs are Git and Mercurial.
If you're working on a project, I heavily recommend using a distributed VCS. I recommend Git because it's blazingly fast, but it has been criticized as being hard to use. If you don't mind using a commercial product, BitKeeper is supposedly easy to use.
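As a rough sketch of how the two models differ in day-to-day use (the repository URLs below are placeholders, not real servers):

```
# Centralized (SVN): one shared repository, every commit goes straight to the server
svn checkout https://svn.example.com/project/trunk project
cd project
# ...edit files...
svn commit -m "Fix login bug"        # requires network access to the central server

# Distributed (Git): every clone is a complete repository
git clone https://git.example.com/project.git
cd project
# ...edit files...
git commit -am "Fix login bug"       # recorded locally, works offline
git push origin master               # share the commit with others when ready
```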
3. Git and GitHub, are they the same or different? Discuss with facts.
Git is a distributed
version control system that allows developers to track changes in files and
work with other developers. It was developed in 2005 by Linus Torvalds, the
creator of Linux.
Git stands apart from other VCSs because of its approach to working with data. Most other systems store information as a list of changes made to files. Git's approach is instead more like a set of snapshots of a miniature file system: every time you save the state of your project, Git records how each file looks at that moment and stores a link to that snapshot. So if you've messed up your code and Ctrl+Z doesn't help, Git lets you revert files to a previous state, or even revert the entire project, compare changes over time, and see who last modified something that might be causing a problem and when an issue was introduced. Git also does not need a server at all: through Git, your own local machine holds a complete source code repository, so a centralized server is not required.
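For example (a minimal sketch; the file name app.js is made up), this is how Git lets you inspect history and restore an earlier state:

```
git log --oneline              # list the project's snapshots (commits)
git diff HEAD~1 -- app.js      # compare a file against the previous commit
git blame app.js               # see who last modified each line, and in which commit
git checkout HEAD~1 -- app.js  # restore the file as it was one commit ago
git revert HEAD                # undo the latest commit by creating a new one
```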
GitHub is an online hosting service for Git repositories. It provides everything distributed version control and source control offer, and more. GitHub allows you to share repositories, access other people's repositories, and store remote copies of your repositories on GitHub's servers as backups of your local copies. Developers use GitHub in conjunction with Git because it lets them keep their code online.
GitHub lets developers interact with each other across different projects. It also offers access control, bug tracking, task management, and a wiki for each project. The goal of GitHub is to promote developer interaction; it has been called a sort of Facebook for developers: on Facebook people share posts and pictures, on GitHub developers share code.
- Git is a revision control system, a tool to manage your source code history.
- GitHub is a hosting service for Git repositories.
- So they are not the same thing: Git is the tool, GitHub is the service for projects that use Git.
4. Compare and contrast the Git commands, commit and push?
Basically, git commit "records changes to the repository" while git push "updates remote refs along with associated objects". So the first one is used in connection with your local repository, while the latter one is used to interact with a remote repository.
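A minimal sketch of the two commands in sequence (the file name and commit message are just placeholders; origin and master are the usual defaults):

```
git add report.txt                        # stage a change
git commit -m "Update quarterly report"   # record it in the LOCAL repository
git push origin master                    # upload the new commit(s) to the REMOTE repository
```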
5. Discuss the use of staging area and Git directory?
Staging is a step before the commit process in Git; that is, a commit in Git is performed in two steps: staging and the actual commit. As long as a changeset is in the staging area, Git allows you to edit it as you like (replace staged files with other versions of staged files, remove changes from staging, etc.). The Git directory (.git), by contrast, is where Git stores the metadata and the object database for your project; it is what gets copied when you clone a repository from another computer.
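A small sketch of the two-step process (the file names are hypothetical):

```
git status                      # see which files were modified in the working directory
git add index.html style.css    # move those changes into the staging area
git status                      # both files now show as "staged for commit"
git reset HEAD style.css        # change your mind: unstage one file
git commit -m "Update homepage layout"   # commit only what is still staged
```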
(The term "staging area" is also used in data warehousing, in a different sense.) There, a staging area is a temporary schema used for flat mapping, i.e. dumping all of the OLTP data into it without applying any business rules. Pushing data into staging takes less time because no business rules or transformations are applied to it. It is used for data cleansing and validation, for example using First Logic. A staging area is like a large table holding data separated from its sources, ready to be loaded into a data warehouse in the required format. If we attempted to load data directly from OLTP, it might disrupt the OLTP system because of the format differences between a warehouse and OLTP; keeping the OLTP data intact is very important for both the OLTP system and the warehouse. Depending on the complexity of the business rules, a staging area may be required; its basic purpose is to clean the OLTP source data and gather it in one place. It is essentially a temporary database area: the staged data is used for further processing and can be deleted afterwards.
6. Explain the collaboration workflow of Git, with example?
A Git Workflow is a recipe or recommendation for how
to use Git to accomplish work in a consistent and productive manner. Git
workflows encourage users to leverage Git effectively and consistently. Git
offers a lot of flexibility in how users manage changes. Given Git's focus on
flexibility, there is no standardized process on how to interact with Git. When
working with a team on a Git managed project, it’s important to make sure the
team is all in agreement on how the flow of changes will be applied. To ensure
the team is on the same page, an agreed upon Git workflow should be developed
or selected. There are several publicized Git workflows that may be a good fit for your team, such as the centralized workflow, the feature-branch workflow, Gitflow, and the forking workflow; a minimal feature-branch example is sketched below.
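As a concrete example, the feature-branch workflow (the branch, remote, and URL names below are illustrative) looks roughly like this:

```
git clone https://github.com/team/project.git   # get the shared repository
git checkout -b feature/search                  # do the work on an isolated branch
# ...edit and test...
git add .
git commit -m "Add search endpoint"
git push -u origin feature/search               # publish the branch for review
# open a pull request on GitHub, have teammates review it,
# then merge it into the main branch and update your local copy:
git checkout master
git pull origin master
```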
7. Discuss the benefits of CDNs?
• Improving website load times: by distributing content closer to website visitors using a nearby CDN server (among other optimizations), visitors experience faster page load times. Since visitors are more inclined to click away from a slow-loading site, a CDN can reduce bounce rates and increase the amount of time people spend on the site. In other words, a faster website means more visitors will stay and stick around longer.
• Reducing bandwidth costs: bandwidth consumption for website hosting is a primary expense for websites. Through caching and other optimizations, CDNs are able to reduce the amount of data an origin server must provide, thus reducing hosting costs for website owners.
• Increasing content availability and redundancy: large amounts of traffic or hardware failures can interrupt normal website function. Thanks to their distributed nature, CDNs can handle more traffic and withstand hardware failure better than many origin servers.
• Improving website security: a CDN may improve security by providing DDoS mitigation, improvements to security certificates, and other optimizations.
8. How CDNs differ from web hosting servers?
- Web Hosting is used to host your website on a server and let users access it over the internet. A content delivery network is about speeding up the access/delivery of your website’s assets to those users.
- Traditional web hosting would deliver 100% of your content to the user. If users are located across the world, they still must wait for the data to be retrieved from wherever your web server is located. A CDN takes a majority of your static and dynamic content and serves it from across the globe, decreasing download times. In most cases, the closer the CDN server is to the web visitor, the faster assets will load for them (see the sketch after this list).
- Web Hosting normally refers to one server. A content delivery network refers to a global network of edge servers which distributes your content from a multi-host environment.
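One quick way to see a CDN in action (a rough sketch; the exact header names vary by provider) is to inspect the response headers of a site served through one:

```
# -s silences progress output, -I requests headers only
curl -sI https://www.example.com | grep -iE 'server|x-cache|cf-cache-status|age'
# CDN-served responses typically expose cache headers such as "X-Cache: HIT"
# or Cloudflare's "CF-Cache-Status: HIT", plus the identity of the edge server.
```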
9. Identify free and commercial CDNs?
- CloudFlare: popularly known as the best free CDN for WordPress users.
- Incapsula: provides application delivery from the cloud: global CDN, website security, DDoS protection, load balancing and failover.
- Photon by Jetpack.
- Swarmify.
- Commercial CDNs include Akamai, Amazon CloudFront, Fastly and Microsoft Azure CDN.
10. Discuss the requirements for virtualization?
CPU
The three elements to consider when selecting virtualization hardware include the CPU, memory, and network I/O capacity. They're all critical for workload consolidation. Issues with the CPU pertain to either clock speed or the number of cores held by the CPU. Please don't run out and buy the market's fastest CPU. Instead, buy one with a more modest clock speed and a greater number of cores.
You'll receive better consolidation from two CPUs with 2.4 GHz and 10 cores than you will from two CPUs with 3 GHz and 4 cores. Invest in faster CPUs only when your workload demands it. The best server for virtualization will include CPUs with large internal caches.
Memory
Your virtual machines reside in memory: the more memory you have, the greater your consolidation. You need at least enough DDR3 memory to support the number of workloads you run on the system. Take the 10-core example above: the two 10-core CPUs would support 40 threads of potential workloads. We derive this number by adding up the cores (20 total) and multiplying by 2, because each core runs two threads. If each workload uses 2 GB, your server would need at least 80 GB; the closest standard configuration would be 96 GB. Anything less would compromise your consolidation or your performance, and anything more would just be a waste of money.
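To double-check the arithmetic in that example (all figures are the ones assumed in the text above):

```
echo $(( 2 * 10 * 2 ))   # 2 CPUs x 10 cores x 2 threads per core = 40 workload threads
echo $(( 40 * 2 ))       # 40 workloads x 2 GB each = 80 GB of RAM, minimum
```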
It's worth noting that memory resilience features require extra memory modules. They won't add to your available memory pool. Save these features for your servers that run mission-critical workloads.
Network Access
Be sure you have adequate bandwidth available. Consider upgrading your network interface to a quad-port NIC. You may even install a 10 GbE NIC if your workload demands justify it.
Common 1 GbE network interface cards just won't cut it. Get rid of them and set up more rigorous network access.
11. Discuss and compare the pros and cons of different virtualization techniques at different levels?
Virtualization has several benefits. For
businesses with limited funds, virtualization helps them stay on budget by
eliminating the need to invest in tons of hardware. Creating virtual
environments to work in also helps businesses with limited IT staff automate
routine tasks and centralize resource management.
Further, employees can access their data anytime, anywhere, using any device.
However, virtualized environments have drawbacks. Here are the major pros and
cons of virtualization.
Pro:
Reduced IT costs
Virtualization helps businesses
reduce costs in several ways, according to Mike Adams, senior director of cloud
platform product marketing at VMware.
- Capital expenditure savings. Virtualization lets companies reduce their IT costs by requiring fewer hardware servers and related resources to achieve the same level of computing performance, availability and scalability.
- Operational expenditure savings. Once servers are virtualized, your IT staff can greatly reduce the ongoing administration and management of manual, time-consuming processes by automating operations, thus resulting in lower operational expenses.
- Data center and energy-efficiency savings. As companies reduce the size of their hardware and server footprint, they lower their energy consumption, cooling power and data center square footage, thus resulting in lower costs.
Con:
The upfront costs are hefty
If you're transitioning a legacy
system to a virtualized one, upfront costs are likely to be expensive. Be
prepared to spend upwards of $10,000 for the servers and software licenses.
However, as virtualization technology improves and becomes more commonplace,
costs will go down.
Pro:
Efficient resource utilization
Virtualization enables businesses to
get the most out of their investment in hardware and resources. "As
customer data center environments grow in size and complexity, managing it
becomes a burden," Adams said. "Virtualization can greatly help
reduce this complexity by offering resource management capabilities to help
increase efficiencies in these virtual environments."
In contrast, traditional
infrastructures that use multiple servers don't make the most out of their
setups. "Many of those servers would typically not utilize more than 2 to
10 percent of the server hardware resources," said John Livesay, vice
president of Infranet Technologies, a network infrastructure
services provider. "With virtualization, we can now run multiple virtual
servers on a single virtual host [and make] better use of the resources
available."
Con:
Not all hardware or software can be virtualized
The drawback, however, is that not
all servers and applications are virtualization-friendly, Livesay said.
"Typically, the main reason you may not virtualize a server or application
is only because the application vendor may not support it yet, or recommend
it," he said.
But virtualization is highly
scalable. It lets businesses easily create additional resources as required by
many applications, such as by easily adding extra servers – it's all done
on-demand on an as-needed basis, without any significant investments in time or
money.
IT admins can create new servers
quickly, because they do not need to purchase new hardware each time they need
a new server, Livesay said. "If the resources are available, we can create
a new server in a few clicks of a mouse," he added.
The ease of creating additional
resources also helps businesses scale as they grow. "This scenario might
be good for small businesses that are growing quickly, or businesses using
their data center for testing and
development," Livesay said.
Businesses should keep in mind,
though, that one of the main goals and advantages of virtualization is the
efficient use of resources. Therefore, they should be careful not to let the
effortlessness of creating servers result in the carelessness of allocating
resources.
"Server sprawl is one of the
unintended consequences of virtualization," Livesay said. "Once
administrators realize how easy it is to add new servers, they start adding a
new server for everything. Soon you find that instead of six to 10 servers, you
are now managing 20 to 30 servers."
The limitations virtualization faces
include a lack of awareness that certain applications or workloads can be
virtualized, according to Adams.
"Workloads such as Hadoop,
NoSQL databases, Spark and containers often start off on bare-metal hardware
but present new opportunities to be virtualized later on," Adams said.
"Virtualization can now support many new applications and workloads within
the first 60 to 90 days on the market."
Although more software applications
are adapting to virtualization situations, there can be licensing complications
due to multiple hosts and migrations. Regarding performance and licensing
issues, it's prudent to check if certain essential applications work well in a
virtualized environment.
12. Identify popular implementations and available tools for each level of visualization?
- Visualizing the tree line using solar panels (tool used: R)
- Calculating the age of the Universe (tool used: R)
- Rendering the Moon using Earth's colors (tool used: Python)
- The world seen through 17,000 travel itineraries (tools used: Tableau, Gephi)
13. What is the hypervisor and what is its role?
A hypervisor, also known as a virtual machine monitor, is a process that creates and runs virtual
machines (VMs). A hypervisor allows one host computer to support multiple guest
VMs by virtually sharing its resources, like memory and processing. Generally,
there are two types of hypervisors. Type 1 hypervisors, called “bare metal,”
run directly on the host’s hardware. Type 2 hypervisors, called “hosted,” run
as a software layer on an operating system, like other computer programs.
One of the key functions a hypervisor provides is isolation,
meaning that a guest cannot affect the operation of the host or any other guest,
even if it crashes. As such, the hypervisor must carefully emulate the hardware
of a physical machine, and (except under carefully controlled circumstances),
prevent access by a guest to the real hardware. How the hypervisor does this is
a key determinant of virtual machine performance. But because emulating real
hardware can be slow, hypervisors often provide special drivers, so called
‘paravirtualized drivers’ or ‘PV drivers’, such that virtual disks and network
cards can be represented to the guest as if they were a new piece of hardware,
using an interface optimized for the hypervisor. These PV drivers are operating
system and (often) hypervisor specific. Use of PV drivers can speed up performance by an order of magnitude and is itself a key determinant of virtual machine performance.
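As a small illustration (assuming a Linux guest running under KVM, which uses the virtio family of paravirtualized devices), you can see the PV drivers from inside the guest:

```
lspci | grep -i virtio    # paravirtualized disk and network devices appear as "Virtio" entries
lsmod | grep virtio       # the matching PV driver modules loaded by the guest kernel
```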
Why use a Hypervisor?
Hypervisors make it possible to use more of a system’s
available resources, and provide greater IT mobility, since the guest VMs are
independent of the host hardware. This means they can be easily moved between
different servers.
14. How is emulation different from VMs?
Emulation, in short, involves making one system imitate
another. For example, if a piece of software runs on system A and not on system
B, we make system B “emulate” the working of system A. The software then runs
on an emulation of system A.
In this same example, virtualization would involve taking
system A and splitting it into two servers, B and C. Both of these “virtual”
servers are independent software containers, having their own access to
software based resources – CPU, RAM, storage and networking – and can be
rebooted independently. They behave exactly like real hardware, and an
application or another computer would not be able to tell the difference.
Each of these technologies have their own uses, benefits and
shortcomings.
Emulation
In our emulation example,
software fills in for hardware – creating an environment that behaves in a
hardware-like manner. This takes a toll on the processor by allocating cycles
to the emulation process – cycles that would instead be utilized executing
calculations. Thus, a large part of the CPU muscle is expended in creating this
environment.
Interestingly enough, you can run
a virtual server in an emulated environment. So, if emulation is such a waste
of resources, why consider it?
Emulation can be effectively utilized in the following scenarios:
- Running an operating system meant for other hardware (e.g., Mac software on a PC, or console-based games on a computer)
- Running software meant for another operating system (running Mac-specific software on a PC and vice versa)
- Running legacy software after comparable hardware becomes obsolete
Emulation is also useful when
designing software for multiple systems. The coding can be done on a single
machine, and the application can be run in emulations of multiple operating
systems, all running simultaneously in their own windows.
Virtualization
In our virtualization example, we
can safely say that it utilizes computing resources in an efficient, functional
manner – independent of their physical location or layout. A fast machine with
ample RAM and sufficient storage can be split into multiple servers, each with
a pool of resources. That single machine, ordinarily deployed as a single
server, could then host a company’s web and email server. Computing resources
that were previously underutilized can now be used to full potential. This can
help drastically cut down costs.
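As a concrete illustration (a sketch only; the kernel and disk image paths are placeholders), QEMU can either fully emulate a foreign CPU in software or, with KVM, virtualize the host's own architecture at near-native speed:

```
# Emulation: every ARM instruction is translated in software on an x86 host
qemu-system-aarch64 -M virt -cpu cortex-a57 -m 1024 -kernel arm-vmlinuz -nographic

# Virtualization: the guest's x86 code runs directly on the x86 CPU via KVM
qemu-system-x86_64 -enable-kvm -m 1024 -hda guest-disk.img
```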
15. Compare and contrast the VMs and
containers/dockers, indicating their advantages and disadvantages?
Advantages of virtual machines:
- Multiple OS environments can exist simultaneously on the same machine, isolated from each other.
- A virtual machine can offer an instruction set architecture that differs from that of the real computer.
- Easy maintenance, application provisioning, availability and convenient recovery.
Benefits of containers:
- Reduced IT management resources
- Reduced size of snapshots
- Quicker spinning up of apps
- Reduced and simplified security updates
- Less code to transfer, migrate and upload when moving workloads
In short, VMs give stronger isolation but carry the overhead of a full guest OS per instance, while containers share the host kernel, so they are far lighter and faster to start but offer weaker isolation, as the sketch below illustrates.
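A small illustration of the difference in practice (the image and VM names are just examples): starting an isolated application in a container is a single command, whereas a VM needs its own full guest OS to be created and booted:

```
# Container: shares the host kernel, up in seconds
docker run -d --name web -p 8080:80 nginx:alpine
docker ps                       # the isolated nginx instance is already running

# VM (for comparison, using VirtualBox's CLI): requires creating and booting a full guest OS
VBoxManage createvm --name web-vm --ostype Ubuntu_64 --register
VBoxManage startvm web-vm
```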