VLANs, Antivirus, Fault Tolerance, and Disaster Recovery
· Virtual LANs
· VLAN Membership
· Protocol-based VLANs
· Port-based VLANs
· MAC Address-based VLANs
· Viruses, Virus Solutions, and Malicious Software
· Trojans, Worms, Spyware, and Hoaxes
· Protecting Computers from Viruses
· Fault Tolerance
· Hard Disks Are Half the Problem
· Disk-level Fault Tolerance
· RAID 0: Stripe Set Without Parity
· RAID 1
· RAID 5
· RAID 10
· Server and Services Fault Tolerance
· Stand-by Servers
· Server Clustering
· Link Redundancy
· Using Uninterruptible Power Supplies
· Why Use a UPS?
· Power Threats
· Disaster Recovery
· Full Backup
· Differential Backup
· Incremental Backup
· Tape Rotations
· Backup Best Practices
· Hot and Cold Spares
· Hot Spare and Hot Swapping
· Cold Spare and Cold Swapping
· Hot, Warm, and Cold Sites
VLANs, Antivirus, Fault Tolerance, and Disaster Recovery
As far as network administration goes, nothing is more important
than fault tolerance and disaster recovery. First and foremost, it is the
responsibility of the network administrator to safeguard the data held on the
servers and to ensure that when requested, this data is ready to go.
Because both fault tolerance and disaster recovery are such an
important part of network administration, this tutorial has direct
real-world application.
Before diving into the fault tolerance and disaster recovery
objectives, we will start this tutorial by reviewing the function of virtual
LANs (VLANs).
Virtual LANs
To understand VLANs, it is first necessary to have a basic
understanding of how a traditional LAN operates. A standard local area network
(LAN) uses hardware such as hubs, bridges, and switches in the same physical
segment to provide a connection point for all end node devices. All network
nodes are capable of communicating with each other without the need for a
router; however, communicating with devices on other LAN segments requires
the use of a router.
As a network grows, routers are used to expand the network. The
routers provide the capability to connect separate LANs and to isolate users
into broadcast and collision domains. Using routers to route data around the
network and between segments increases latency. Latency refers to delays in
transmission caused by the routing process.
Virtual LANs (VLANs) provide an alternate method to segment
a network and in the process, significantly increase the performance capability
of the network, and remove potential performance bottlenecks. A VLAN is a group
of computers that are connected and act as if they are on their own physical
network segments, even though they might not be. For instance, suppose that you
work in a three-story building in which the advertising employees are spread
over all three floors. A VLAN can let all the advertising personnel use the
network resources as if they were connected on the same segment. This virtual
segment can be isolated from other network segments. In effect, it would appear
to the advertising group that they were on a network by themselves.
VLANs offer some clear advantages. Being able to create logical
segmentation of a network gives administrators flexibility beyond the
restrictions of the physical network design and cable infrastructure. VLANs
allow for easier administration because the network can be divided into
well-organized sections. Further, you can increase security by isolating
certain network segments from others. For instance, you can segment the
marketing personnel from finance or the administrators from the students. VLANs
can ease the burden on overworked routers and reduce broadcast storms. Table 1
summarizes the benefits of VLANs.
Table 1 Benefits of VLANs

| Advantages | Description |
| --- | --- |
| Increased security | By creating logical (virtual) boundaries, network segments can be isolated. |
| Increased performance | By reducing broadcast traffic throughout the network, VLANs free up bandwidth. |
| Organization | Network users and resources that are linked and communicate frequently can be grouped together in a VLAN. |
| Simplified administration | With a VLAN, the network administrator's job is easier when moving users between LAN segments, re-cabling, addressing new stations, and reconfiguring hubs and routers. |
VLAN Membership
You can use several methods to determine VLAN membership or how
devices are assigned to a specific VLAN. The following sections describe the
common methods of determining how VLAN membership is assigned.
Protocol-based VLANs
With protocol-based VLAN membership, computers are assigned to
VLANs by using the protocol that is in use and the Layer 3 address. For
example, this method enables an Internetwork Packet Exchange (IPX) network or a
particular Internet Protocol (IP) subnet to have its own VLAN.
It is important to note that although VLAN membership might be
based on Layer 3 information, this has nothing to do with routing or routing
functions. The IP addresses are used only to determine membership in a
particular VLAN, not to determine routing.
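This kind of Layer 3 lookup can be sketched in a few lines of Python using the standard ipaddress module (the subnets and VLAN names below are hypothetical examples, not from the text):

```python
import ipaddress

# Hypothetical subnet-to-VLAN assignments for a protocol-based scheme.
SUBNET_VLANS = {
    ipaddress.ip_network("192.168.10.0/24"): "engineering",
    ipaddress.ip_network("192.168.20.0/24"): "finance",
}

def vlan_for_address(addr: str) -> str:
    """Assign VLAN membership from the Layer 3 address -- no routing involved."""
    ip = ipaddress.ip_address(addr)
    for subnet, vlan in SUBNET_VLANS.items():
        if ip in subnet:
            return vlan
    return "default"

print(vlan_for_address("192.168.20.7"))  # finance
```

Note that the subnet is used purely as a membership key; the sketch never forwards anything between subnets.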
Port-based VLANs
Port-based VLANs require that specific ports on a network switch
be assigned to a VLAN. For example, ports 1 through 8 might be assigned to
marketing, ports 9 through 18 might be assigned to sales, and so on. Using this
method, a switch determines VLAN membership by taking note of the port used by
a particular packet. Figure 1 shows an example of a port-based VLAN.
Figure 1 Port-based VLAN configuration.
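The port-to-VLAN lookup a switch performs can be sketched in Python; the port ranges below mirror the hypothetical assignments from the example above:

```python
# Hypothetical port-to-VLAN map from the example above:
# ports 1-8 -> marketing, ports 9-18 -> sales.
PORT_VLAN_MAP = {}
for port in range(1, 9):
    PORT_VLAN_MAP[port] = "marketing"
for port in range(9, 19):
    PORT_VLAN_MAP[port] = "sales"

def vlan_for_port(port: int) -> str:
    """Return the VLAN assigned to a switch port."""
    return PORT_VLAN_MAP.get(port, "unassigned")

# A frame arriving on port 3 belongs to the marketing VLAN;
# a frame arriving on port 12 belongs to sales.
print(vlan_for_port(3))   # marketing
print(vlan_for_port(12))  # sales
```

The key design point of port-based membership shows up here: the lookup key is the physical port, so moving a workstation to a different port changes its VLAN.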
MAC Address-based VLANs
As you might have guessed, the Media Access Control (MAC) address
type of VLAN assigns membership according to the MAC address of the
workstation. To do this, the switch must keep track of the MAC addresses that
belong to each VLAN. The advantage of this method is that a workstation
computer can be moved anywhere in an office without needing to be reconfigured;
because the MAC address does not change, the workstation remains a member of a
particular VLAN. Table 2 provides examples of MAC address-based VLANs.
Table 2 MAC Address-based VLANs

| MAC Address | VLAN | Description |
| --- | --- | --- |
| 44-45-53-54-00-00 | 1 | Sales |
| 44-45-53-54-13-12 | 2 | Marketing |
| 44-45-53-54-D3-01 | 3 | Administration |
| 44-45-53-54-F5-17 | 1 | Sales |
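The switch-side bookkeeping described above amounts to a MAC-address-to-VLAN lookup. A minimal Python sketch using the addresses from Table 2:

```python
# MAC-to-VLAN table from Table 2 (VLAN 1 = Sales, 2 = Marketing, 3 = Administration).
MAC_VLAN_TABLE = {
    "44-45-53-54-00-00": 1,
    "44-45-53-54-13-12": 2,
    "44-45-53-54-D3-01": 3,
    "44-45-53-54-F5-17": 1,
}

def vlan_for_mac(mac: str):
    """Look up VLAN membership by MAC address (case-insensitive)."""
    return MAC_VLAN_TABLE.get(mac.upper())

# The workstation keeps its VLAN no matter which port it plugs into,
# because the lookup key is the MAC address, not the port.
print(vlan_for_mac("44-45-53-54-13-12"))  # 2
```

Contrast this with the port-based scheme: here the workstation can move anywhere in the office and its membership follows the hardware address.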
Although the acceptance and implementation of VLANs has been slow,
the ability to logically segment a LAN provides a new level of administrative
flexibility, organization, and security.
Viruses, Virus Solutions, and
Malicious Software
Viruses, spyware, worms, and other malicious code are an
unfortunate part of modern computing. In today's world, an unprotected computer
is at high risk of having some form of malicious software installed on the
system. A protected system is still at risk; the risk is just lower.
By definition, a virus is a program that is self-replicating and
operates on a computer system without the user's knowledge. These viruses will
either attach to or replace system files, system executables, and data files.
Once in, the virus can perform many different functions. It might
consume system resources, making the system too slow to use; it might
corrupt files or crash the computer entirely; or it might compromise data
integrity and availability.
In order to be considered a virus, the malicious code must meet
two criteria: It must be self-replicating, and it must be capable of executing
itself. Three common virus types are listed below:
· Boot sector virus: Boot sector viruses target the boot record of hard disks or floppy disks. In order to boot, floppy disks or hard drives contain an initial set of instructions that start the boot process. Boot sector viruses infect this program and activate when the system boots. This enables the virus to stay hidden in memory and operate in the background.
· File virus: File viruses are very common. They attack applications and program files, often targeting .exe, .com, and .bat files by destroying them, preventing applications from running, or modifying them and using them to propagate the virus.
· Macro virus: The actual data, such as documents and spreadsheets, represents the most important and irreplaceable element on a computer system. Macro viruses are designed to attack documents and files and therefore are particularly nasty.
Trojans, Worms, Spyware, and
Hoaxes
There are other forms of malicious programs, which by definition
are not a virus but still threaten our computer systems.
A Trojan horse is a program that appears harmless or even helpful
but, after being executed, performs an undesirable and malicious action. For
instance, a Trojan horse can be a program advertised as a patch, a harmless
application such as a calculator, or a product upgrade or enhancement. The trick
is to fool the user into downloading and installing the program. Once executed, the
Trojan horse can perform the function it was actually designed to do. This
might include crashing a system, stealing data, or corrupting data.
Worms are similar to viruses in that they replicate, but they do
not require a host file to spread from system to system. The difference between
viruses and worms is that a worm does not attach itself to an executable
program as do viruses: A worm is self-contained and does not need to be part of
another program to propagate itself. This makes a worm capable of replicating
at incredible speeds. This can cause significant network slowdowns as the worm
spreads.
A worm can do any number of malicious things, including deleting
files and sending documents via email without the user knowing. A worm can also
carry another program designed to open a backdoor in the system, which spam
senders can then use to relay junk mail and notices through the computer. Once
this backdoor is open, the system is vulnerable to data theft,
modification, or worse.
Spyware is a newer threat that is easy to contract and can be well hidden.
Spyware is designed to monitor activity on a computer, such as Web surfing
activity, and send that information to a remote source. It is commonly
installed along with a free program that might have been downloaded.
Spyware detection software is becoming increasingly popular and
given the information that can be stolen, should be considered an important
part of a secure system.
One final consideration is that of virus hoaxes. The threat of
virus activity is very real, and, as such, we are alerted to it. Some take
advantage of this to create elaborate virus hoaxes. Hoaxes will often pop up on
the computer screen or arrive in the email warning of a virus or claiming that
your system has contracted a virus. These are more annoying than dangerous but
serve to confuse and complicate the virus issue.
Protecting Computers from Viruses
The threat from malicious code is a very real concern. We need to
take the steps to protect our systems, and although it might not be possible to
eliminate the threat, it is possible to significantly reduce the threat.
One of the primary tools used in the fight against malicious
software is antivirus software. Antivirus software is available from a number
of companies, and each offers similar features and capabilities. The following
is a list of the common features and characteristics of antivirus software.
· Real-time protection: An installed antivirus program should continuously monitor the system, looking for viruses. If a program is downloaded, an application opened, or a suspicious email received, the real-time virus monitor detects and removes the threat. The antivirus application sits in the background, largely unnoticed by the user.
· Virus scanning: An antivirus program must be capable of scanning selected drives and disks, either locally or remotely. Scans can be run manually or scheduled to run at a particular time.
· Scheduling: It is a best practice to schedule virus scanning to occur automatically at a predetermined time. In a network environment, this would typically be off hours, when the overhead of the scanning process won't impact users.
· Live updates: New viruses and malicious software are released with alarming frequency. It is recommended that the antivirus software be configured to receive virus definition updates regularly.
· Email vetting: Email represents one of the primary delivery channels for viruses. It is essential to use antivirus software that provides scanning for both inbound and outbound email.
· Centralized management: If used in a network environment, it is a good idea to choose software that supports centralized management of the antivirus program from the server. Virus updates and configuration changes then need to be made only on the server, not on each individual client station.
Software is only part of a proactive virus-protection strategy. A
complete strategy requires many elements to help limit the risk of viruses,
including the following:
· Develop in-house policies and rules: In a corporate environment, or even a small office, it is important to establish what information can be placed onto a system. For example, should users be able to download programs from the Internet? Can users bring in their own floppy disks or other storage media?
· Monitor virus threats: With new viruses coming out all the time, it is important to check whether new viruses have been released and what they are designed to do.
· Educate users: One of the keys to a complete antivirus solution is to train users in virus prevention and recognition techniques. If users know what they are looking for, they can prevent a virus from entering the system or the network.
· Back up important documents: No solution is absolute, and care should be taken to ensure that data is backed up. In the event of a malicious attack, redundant information is then available in a secure location.
· Automate virus scanning and updates: Today's antivirus software can be configured to scan and update itself automatically. Because such tasks can be forgotten or overlooked, it is recommended to have these processes scheduled to run at predetermined times.
· Email vetting: Email is one of the most commonly used virus delivery mechanisms. Antivirus software can be used to check inbound and outbound emails for virus activity.
Fault Tolerance
As far as computers are concerned, fault tolerance refers to the
capability of the computer system or network to provide continued data
availability in the event of hardware failure. Every component within a server,
from CPU fan to power supply, has a chance of failure. Some components such as
processors rarely fail, whereas hard disk failures are well documented.
Almost every component has fault-tolerant measures. These measures
typically require redundant hardware components that can easily or
automatically take over when there is a hardware failure.
Of all the components inside computer systems, the ones that
require the most redundancy are the hard disks. Not only are hard disk
failures more common than any other component failure, but hard disks also
hold the data, without which there would be little need for a network.
Hard Disks Are Half the Problem
In fact, according to recent research, hard disks are responsible
for one of every two server hardware failures. This is an interesting statistic
to think about.
Disk-level Fault Tolerance
Making the decision to have hard disk fault tolerance on the server
is the first step; the second is deciding which fault-tolerant strategy to use.
Hard disk fault tolerance is implemented according to different RAID (redundant array of
inexpensive disks) levels. Each RAID level offers differing amounts of data
protection and performance.
The RAID level appropriate for a given situation depends on the
importance placed on the data, the difficulty of replacing that data, and the
associated costs of a respective RAID implementation. Oftentimes, the cost of
data loss and replacement outweighs the cost of implementing a
strong RAID fault-tolerant solution.
RAID 0: Stripe Set without Parity
Although it's given RAID status, RAID 0 does not actually provide
any fault tolerance; in fact, using RAID 0 might even be less fault tolerant
than storing all of your data on a single hard disk.
RAID 0 combines unused disk space on two or more hard drives into
a single logical volume with data being written to equally sized stripes across
all the disks. By using multiple disks, reads and writes are performed
simultaneously across all drives. This means that disk access is faster, making
the performance of RAID 0 better than other RAID solutions and significantly
better than a single hard disk. The downside of RAID 0 is that if any disk in
the array fails, the data is lost and must be restored from backup.
Because of its lack of fault tolerance, RAID 0 is rarely
implemented. Figure 2 shows an example of RAID 0 striping across three hard
disks.
Figure 2 RAID 0 striping
without parity.
RAID 1
One of the more common RAID implementations is RAID 1. RAID 1
requires two hard disks and uses disk mirroring to provide fault tolerance.
When information is written to the hard disk, it is automatically and
simultaneously written to the second hard disk. Both of the hard disks in the
mirrored configuration use the same hard disk controller; the partitions used
on the hard disk need to be approximately the same size to establish the
mirror. In the mirrored configuration, if the primary disk were to fail, the
second mirrored disk would contain all the required information and there would
be little disruption to data availability. RAID 1 ensures that the server will
continue operating in the case of the primary disk failure.
There are some key advantages to a RAID 1 solution. First, it is
cheap, as only two hard disks are required to provide fault tolerance. Second,
no additional software is required to establish RAID 1, as modern network
operating systems have built-in support for it. Third, unlike RAID levels that
use striping, RAID 1 can include the boot or system partition in the
fault-tolerant solution. Finally, RAID 1 offers load balancing over multiple
disks, which increases read performance over that of a single disk. Write
performance, however, is not improved.
Because of its advantages, RAID 1 is well suited as an entry-level
RAID solution, but it has a few significant shortcomings that exclude its use
in many environments. It has limited storage capacity: two 100GB hard drives
provide only 100GB of storage space. Organizations with large data storage
needs can exceed a mirrored solution's capacity in very short order. RAID 1 also
has a single point of failure, the hard disk controller. If it were to fail,
the data would be inaccessible on either drive. Figure 3 shows an example of
RAID 1 disk mirroring.
Figure 3 RAID 1 disk
mirroring.
An extension of RAID 1 is disk duplexing. Disk duplexing is the
same as mirroring with the exception of one key detail: It places the hard
disks on separate hard disk controllers, eliminating the single point of
failure.
RAID 5
RAID 5, also known as disk striping with parity,
uses distributed parity to write information across all disks in the array.
Unlike the striping used in RAID 0, RAID 5 includes parity information in the
striping, which provides fault tolerance. This parity information is used to
re-create the data in the event of a failure. RAID 5 requires a minimum of
three disks with the equivalent of a single disk being used for the parity
information. This means that if you have three 40GB hard disks, you have 80GB
of storage space with the other 40GB used for parity. To increase storage space
in a RAID 5 array, you need only add another disk to the array. Depending on
the sophistication of the RAID setup you are using, the RAID controller will be
able to incorporate the new drive into the array automatically, or you will
need to rebuild the array and restore the data from backup.
Many factors have made RAID 5 a very popular fault-tolerant
design. RAID 5 can continue to function in the event of a single drive failure.
If a hard disk were to fail in the array, the parity would re-create the
missing data and continue to function with the remaining drives. The read
performance of RAID 5 is improved over a single disk.
There are only a few drawbacks for the RAID 5 solution. These are
as follows:
- The costs of implementing RAID 5 are initially higher than those of
  other fault-tolerant measures, as it requires a minimum of three hard
  disks. Given the cost of hard disks today, this is a minor concern.
- RAID 5 suffers from poor write performance because the parity has to be
  calculated and then written across several disks. The performance lag is
  minimal, however, and won't make a noticeable difference on the network.
- When a new disk
is placed in a failed RAID 5 array, there is a regeneration time when the
data is being rebuilt on the new drive. This process requires extensive
resources from the server.
Figure 4 shows an example of RAID 5 striping with parity.
Figure 4 RAID 5 striping with parity.
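The parity re-creation RAID 5 relies on is just XOR arithmetic. A minimal Python sketch of losing one disk's stripe and rebuilding it (the byte values are made up for illustration):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR byte strings together; this is how RAID 5 computes parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three-disk array: two data stripes plus one parity stripe.
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
parity = xor_blocks([disk1, disk2])

# If disk2 fails, XORing the surviving stripe with the parity rebuilds it.
rebuilt = xor_blocks([disk1, parity])
print(rebuilt == disk2)  # True
```

Because XOR is its own inverse, any single missing stripe, data or parity, can be regenerated from the others, which is exactly why RAID 5 survives one drive failure but not two.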
RAID 10
Sometimes RAID levels are combined to take advantage of the best
of each. One such strategy is RAID 10, which combines RAID levels 1 and 0. In
this configuration, four disks are required. As you might expect, the configuration
consists of a mirrored stripe set. To some extent, RAID 10 takes advantage of
the performance capability of a stripe set while offering the fault tolerance
of a mirrored solution. As well as having the benefits of each though, RAID 10
also inherits the shortcomings of each strategy. In this case, the high
overhead and the decreased write performance are the disadvantages. Figure 5
shows an example of a RAID 10 configuration. Table 3 provides a summary of the
various RAID levels.
Figure 5 Disks in a RAID
10 configuration.
Table 3 Summary of RAID Levels

| RAID Level | Description | Advantages | Disadvantages | Required Disks |
| --- | --- | --- | --- | --- |
| RAID 0 | Disk striping | Increased read and write performance. RAID 0 can be implemented with only two disks. | Does not offer any fault tolerance. | Two or more |
| RAID 1 | Disk mirroring | Provides fault tolerance. Can also be used with separate disk controllers, reducing the single point of failure (called disk duplexing). | RAID 1 has a 50% overhead and suffers from poor write performance. | Two |
| RAID 5 | Disk striping with distributed parity | Can recover from a single disk failure; increased read performance over a single disk. Disks can be added to the array to increase storage capacity. | May suffer from poor write performance and can slow down the network during regeneration time. | Minimum of three |
| RAID 10 | Striping with mirrored volumes | Increased performance with striping; offers mirrored fault tolerance. | High overhead, as with mirroring. | Four |
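The overhead figures in Table 3 can be checked with a small usable-capacity calculation. This is only a sketch, and it assumes equal-sized disks and the disk counts discussed in this tutorial:

```python
def usable_capacity(raid_level: int, disk_count: int, disk_size_gb: int) -> int:
    """Usable space for equal-sized disks at common RAID levels."""
    if raid_level == 0:                  # striping: all space usable
        return disk_count * disk_size_gb
    if raid_level == 1:                  # mirroring: 50% overhead
        assert disk_count == 2
        return disk_size_gb
    if raid_level == 5:                  # one disk's worth of parity
        assert disk_count >= 3
        return (disk_count - 1) * disk_size_gb
    if raid_level == 10:                 # mirrored stripe set
        assert disk_count == 4
        return (disk_count // 2) * disk_size_gb
    raise ValueError("unsupported RAID level")

# Three 40GB disks in RAID 5 yield 80GB, matching the example in the text.
print(usable_capacity(5, 3, 40))  # 80
```

The same function reproduces the RAID 1 example: two 100GB drives yield only 100GB of usable space.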
Server and Services Fault
Tolerance
In addition to providing fault tolerance for individual hardware
components, some organizations go the extra mile to include the entire server
in the fault-tolerant design. Such a design keeps servers and the services they
provide up and running. When it comes to server fault tolerance, two key
strategies are commonly employed: stand-by servers and server clustering.
Stand-by Servers
Stand-by servers are a fault-tolerant measure in which a second
server is configured identically to the first one. The second server can be
stored remotely or locally and set up in a failover configuration. In a
failover configuration, the secondary server is connected to the primary and
ready to take over the server functions at a heartbeat's notice. If the
secondary server detects that the primary has failed, it will automatically cut
in. Network users will not notice the transition, as there will be little or no
disruption in data availability.
The primary server communicates with the secondary server by
issuing special notification messages referred to as heartbeats. If the
secondary server stops receiving the heartbeat messages, it assumes that the
primary has died and takes over the primary server's role.
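The heartbeat logic can be sketched as a simple timeout check. The class name and timeout value below are made-up illustrations; real failover products add election, fencing, and much more:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before failover (made-up value)

class StandbyServer:
    """Minimal sketch of a secondary server watching for heartbeats."""

    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False

    def receive_heartbeat(self):
        # Called whenever a heartbeat message arrives from the primary.
        self.last_heartbeat = time.monotonic()

    def check_primary(self):
        # If the heartbeats stop, assume the primary has died and take over.
        if time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active = True
        return self.active

standby = StandbyServer()
standby.receive_heartbeat()
print(standby.check_primary())  # False: primary is still alive
```

The design trade-off is visible even in this sketch: too short a timeout causes failover on a momentary network hiccup, while too long a timeout extends the outage.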
Server Clustering
Those companies wanting maximum data availability that have the
funds to pay for it can choose to use server clustering. As the name suggests,
server clustering involves grouping servers together for the purposes of fault
tolerance and load balancing. In this configuration, other servers in the
cluster can compensate for the failure of a single server. The failed server
will have no impact on the network, and the end users will have no idea that a
server has failed.
The clear advantage of server clusters is that they offer the
highest level of fault tolerance and data availability. The disadvantage is
equally clear: cost. The cost of buying a single server can be a huge investment
for many organizations; having to buy duplicate servers is far too costly.
Link Redundancy
Although a failed network card might not actually stop the server
or a system, it might as well. A network server that cannot be used on the
network makes for server downtime. Although the chances of a failed network
card are relatively low, our attempts to reduce the occurrence of downtime have
led to the development of a strategy that provides fault tolerance for network
connections.
Through a process called adapter teaming, groups of network cards
are configured to act as a single unit. The teaming capability is achieved
through software, either as a function of the network card driver or through
specific application software. Adapter teaming is not yet widely
implemented; the benefits it offers are many, though, so it is likely to become a
more common sight. The result of adapter teaming is increased bandwidth, fault
tolerance, and the ability to manage network traffic more effectively. These
features break down into three categories:
· Adapter fault tolerance: The basic configuration enables one network card to be configured as the primary device and others as secondary. If the primary adapter fails, one of the other cards can take its place without the need for intervention. When the original card is replaced, it resumes the role of primary controller.
· Adapter load balancing: Because software controls the network adapters, workloads can be distributed evenly among the cards so that each link is used to a similar degree. This distribution allows for a more responsive server, because one card is not overworked while another is underworked.
· Link aggregation: This provides vastly improved performance by allowing more than one network card's bandwidth to be combined into a single logical connection. For example, through link aggregation, four 100Mbps network cards can provide a total of 400Mbps of bandwidth. Link aggregation requires that both the network adapters and the switch being used support it. In 1999, the IEEE ratified the 802.3ad standard for link aggregation, allowing compatible products to be produced.
Using Uninterruptible Power
Supplies
No discussion of fault tolerance can be complete without a look at
power-related issues and the mechanisms used to combat them. When you're
designing a fault-tolerant system, your planning should definitely include UPSs
(Uninterruptible Power Supplies). A UPS serves many functions and is a major
part of server consideration and implementation.
On a basic level, a UPS is a box that holds a battery and a
built-in charging circuit. During times of good power, the battery is
recharged; when the UPS is needed, it's ready to provide power to the server.
Most often, the UPS is required to provide enough power to give the
administrator time to shut down the server in an orderly fashion, preventing
any potential data loss from a dirty shutdown.
Why Use a UPS?
Organizations of all shapes and sizes need UPSs as part of their
fault-tolerance strategies. A UPS is as important as any other fault-tolerance
measure. Three key reasons make a UPS necessary:
· Data availability: The goal of any fault-tolerance measure is data availability. A UPS ensures access to the server in the event of a power failure, or at least for as long as it takes to save a file.
· Protection from data loss: Fluctuations in power or a sudden power-down can damage the data on the server system. In addition, many servers take full advantage of caching, and a sudden loss of power could cause the loss of all information held in cache.
· Protection from hardware damage: Constant power fluctuations or sudden power-downs can damage hardware components within a computer. Damaged hardware can lead to reduced data availability while the hardware is being repaired.
Power Threats
In addition to keeping a server functioning long enough to safely
shut it down, a UPS also safeguards a server from inconsistent power. This
inconsistent power can take many forms. A UPS protects a system from the
following power-related threats:
· Blackout: A blackout is a total failure of the power supplied to the server.
· Spike: A spike is a very short (usually less than a second) but very intense increase in voltage. Spikes can do irreparable damage to any kind of equipment, especially computers.
· Surge: Compared to a spike, a surge is a considerably longer (sometimes many seconds) but usually less intense increase in power. Surges can also damage your computer equipment.
· Sag: A sag is a short-term voltage drop (the opposite of a spike). This type of voltage drop can cause a server to reboot.
· Brownout: A brownout is a drop in voltage that usually lasts more than a few minutes.
Many of these power-related threats can occur without your
knowledge; if you don't have a UPS, you cannot prepare for them. For the cost,
it is worth buying a UPS, if for no other reason than to sleep better at night.
Disaster Recovery
Even the most fault-tolerant networks will fail, which is an unfortunate
fact. When those costly and carefully implemented fault-tolerant strategies do
fail, you are left with disaster recovery.
Disaster recovery can take on many forms. In addition to real
disaster, fire, flood, theft, and the like, many other potential business
disruptions can fall under the banner of disaster recovery. For example, the
failure of the electrical supply to your city block might interrupt the
business function. Such an event, although not a disaster per se, might invoke
the disaster recovery methods.
The cornerstone of every disaster recovery strategy is the
preservation and recoverability of data. When talking about preservation and
recoverability, we are talking about backups. When we are talking about
backups, we are likely talking about tape backups. Implementing a regular
backup schedule can save you a lot of grief when fault tolerance fails or when
you need to recover a file that has been accidentally deleted. When it comes
time to design a backup schedule, there are three key types of backup to
consider: full, differential, and incremental.
Full Backup
The preferred method of backup is the full backup method, which
copies all files and directories from the hard disk to the backup media. There
are a few reasons why doing a full backup is not always possible. First among
them is likely the time involved in performing a full backup.
Depending on the amount of data to be backed up, full backups can
take an extremely long time and can use extensive system resources. Depending
on the configuration of the backup hardware, this can slow down the network
considerably. In addition, some environments have more data than can fit on a
single tape. This makes taking a full backup awkward, as someone may need to be
there to manually change the tapes.
The main advantage of full backups is that a single tape or tape
set holds all the data you need backed up. In the event of a failure, a single
tape might be all that is needed to get all data and system information back.
The upshot of all this is that any disruption to the network is greatly
reduced.
Unfortunately, its strength can also be its weakness. A single
tape holding an organization's data can be a security risk. If the tape were to
fall into the wrong hands, all the data could be restored on another computer.
Using passwords on tape backups and using a secure offsite and onsite location
can minimize the security risk.
Differential Backup
For those companies that just don't quite have enough time to
complete a full backup daily, there is the differential backup. Differential
backups are faster than a full backup, as they back up only the data that has
changed since the last full backup. This means that if you do a full backup on
a Saturday and a differential backup on the following Wednesday, only the data
that has changed since Saturday is backed up. Restoring the differential backup
will require the last full backup and the latest differential backup.
Differential backups know what files have changed since the last
full backup by using a setting known as the archive bit. The archive bit flags
files that have changed or been created and identifies them as ones that need
to be backed up. Full backups do not concern themselves with the archive bit,
as all files are backed up regardless of date. A full backup, however, will
clear the archive bit after data has been backed up to avoid future confusion.
Differential backups take notice of the archive bit and use it to determine
which files have changed. The differential backup does not reset the archive
bit information.
Incremental Backup
Some companies have a very finite amount of time they can allocate
to backup procedures. Such organizations are likely to use incremental backups
in their backup strategy. Incremental backups save only the files that have
changed since the last full or incremental backup. Like differential backups,
incremental backups use the archive bit to determine the files that have
changed since the last full or incremental backup. Unlike differentials,
however, incremental backups clear the archive bit, so files that have not
changed are not backed up.
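The archive-bit behavior described above can be sketched in a few lines of code. This is a minimal illustration, not a real backup tool; the `File` class and file names are made up for the example.

```python
# Sketch of archive-bit semantics for full, differential, and
# incremental backups. The File class and names are illustrative.

class File:
    def __init__(self, name):
        self.name = name
        self.archive_bit = True  # set when a file is created or modified

def backup(files, kind):
    """Return the files a backup of the given kind would copy."""
    if kind == "full":
        selected = list(files)           # ignores the archive bit...
        for f in files:
            f.archive_bit = False        # ...but clears it afterward
    elif kind == "differential":
        selected = [f for f in files if f.archive_bit]  # bit left untouched
    elif kind == "incremental":
        selected = [f for f in files if f.archive_bit]
        for f in selected:
            f.archive_bit = False        # incremental clears the bit
    return selected

files = [File("payroll.db"), File("web.log")]
backup(files, "full")                    # everything copied, bits cleared
files[0].archive_bit = True              # payroll.db changes on Monday
print([f.name for f in backup(files, "differential")])  # ['payroll.db']
print([f.name for f in backup(files, "differential")])  # ['payroll.db'] again
print([f.name for f in backup(files, "incremental")])   # ['payroll.db'], bit cleared
print([f.name for f in backup(files, "incremental")])   # [] - nothing has changed
```

Note how the second differential picks up the same file again (the bit is never reset), while the second incremental finds nothing to do.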
The faster backup times of incremental backups come at a price:
the amount of time required to restore. Recovering from a failure with
incremental backups requires numerous tapes: all the incremental tapes and the
most recent full backup. For example, if you had a full backup from Sunday and
an incremental for Monday, Tuesday, and Wednesday, you would need four tapes to
restore the data. Keep in mind: each tape in the rotation is an additional step
in the restore process and an additional failure point. One damaged incremental
tape and you will be unable to restore the data. Table 4 summarizes the various
backup strategies.
Table 4 Backup Strategies

| Backup Type | Advantages | Disadvantages | Data Backed Up | Archive Bit |
|---|---|---|---|---|
| Full | Backs up all data to a single tape or tape set. Restoring data requires the fewest tapes. | Depending on the amount of data, full backups can take a long time. | All files and directories. | Does not use the archive bit, but resets it after data has been backed up. |
| Differential | Faster backups than a full backup. | Uses more tapes than a full backup. The restore process takes longer than with a full backup. | All files and directories that have changed since the last full backup. | Uses the archive bit to determine the files that have changed, but does not reset it. |
| Incremental | Fastest backup times. | Requires multiple tapes; restoring data takes more time than the other backup methods. | The files and directories that have changed since the last full or incremental backup. | Uses the archive bit to determine the files that have changed, and resets it. |
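The restore-side difference between the strategies can be made concrete with a small function that, given a backup history, lists the tapes needed for a full recovery. The record format here is an assumption for illustration.

```python
# Sketch: which tapes are needed to restore, given a backup history
# of (day, kind) records. Purely illustrative, not a real catalog.

def tapes_for_restore(history):
    """Return the tapes to load, oldest first, for a full recovery."""
    # Find the most recent full backup; the restore starts there.
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    needed = [history[last_full]]
    later = history[last_full + 1:]
    incs = [b for b in later if b[1] == "incremental"]
    diffs = [b for b in later if b[1] == "differential"]
    if incs:
        needed.extend(incs)        # every incremental since the full
    elif diffs:
        needed.append(diffs[-1])   # only the latest differential
    return needed

# Sunday full plus Mon-Wed incrementals -> four tapes, as in the text.
history = [("Sun", "full"), ("Mon", "incremental"),
           ("Tue", "incremental"), ("Wed", "incremental")]
print(len(tapes_for_restore(history)))  # 4
```

Swap the incrementals for differentials and the same function returns only two tapes (the full plus the latest differential), which is exactly the trade-off the table captures.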
Tape Rotations
After you have decided on the backup type you will use, you are
ready to choose a backup rotation. Several backup rotation strategies are in
use: some good, some bad, and some really bad. The most common, and perhaps the
best, is the Grandfather-Father-Son (GFS) rotation.
The GFS backup rotation is the most widely used and for good
reason. An example GFS rotation may require 12 tapes: four tapes for daily
backups (son), five tapes for weekly backups (father), and three tapes for
monthly backups (grandfather).
Using this rotation schedule, it is possible to recover data from
days, weeks, or months previous. Some network administrators choose to add
tapes to the monthly rotation to be able to retrieve data even further back,
sometimes up to a year. In most organizations, however, data that is a week old
is out of date, let alone six months or a year.
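The 12-tape pool above can be sketched as a small labeling function. The scheduling convention used here (Mon-Thu daily tapes, Friday weeklies, a month-end monthly on a three-month cycle) is an assumption for illustration; real GFS rotations vary in the details.

```python
# Illustrative GFS rotation: 4 daily "son" tapes, 5 weekly "father"
# tapes, 3 monthly "grandfather" tapes. Conventions are assumptions.

DAILY = ["Son-Mon", "Son-Tue", "Son-Wed", "Son-Thu"]

def gfs_tape(weekday, week_of_month, month, last_friday=False):
    """weekday: 0=Mon .. 6=Sun. Returns the tape label to use."""
    if weekday < 4:
        return DAILY[weekday]                       # reused every week
    if weekday == 4:                                # Friday
        if last_friday:
            return "Grandfather-%d" % (month % 3 + 1)  # 3-month cycle
        return "Father-%d" % week_of_month             # up to 5 Fridays
    return None                                        # no weekend backups

print(gfs_tape(0, 1, 1))                     # Son-Mon
print(gfs_tape(4, 2, 1))                     # Father-2
print(gfs_tape(4, 4, 3, last_friday=True))   # Grandfather-1
```

The daily tapes are overwritten weekly, the weekly tapes monthly, and the monthly tapes quarterly, which is what lets a 12-tape pool reach back roughly three months.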
Backup Best Practices
Many details go into making a backup strategy a success. The
following list contains issues to consider as part of your backup plan.
· Offsite storage: Consider having backup tapes stored offsite so
that in the event of a disaster in a building, a current set of tapes is still
available. The offsite tapes should be as current as any onsite and should be
kept secure.
· Label tapes: The goal is to restore the data as quickly as
possible, and finding the tape you need can be difficult if it is not marked.
Clear labeling also prevents you from recording over a tape you need.
· New tapes: Like old cassette tapes, the tape cartridges used for
backups wear out over time. One strategy to prevent this from becoming
a problem is to introduce new tapes periodically into the rotation schedule.
· Verify backups: Never assume that the backup was successful.
Seasoned administrators know that checking backup logs and performing periodic
test restores are part of the backup process.
· Cleaning: From time to time, it is necessary to clean the tape
drive. If the inside gets dirty, backups can fail.
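One simple way to act on the "verify backups" advice is to compare checksums of the source files against the copies read back from the backup. The sketch below uses in-memory byte strings and hypothetical file names to keep the idea self-contained.

```python
# Sketch of backup verification by checksum comparison. The file
# names and data here are hypothetical, for illustration only.

import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(source_files, backup_files):
    """Return the names whose backup copy does not match the source."""
    mismatched = []
    for name, data in source_files.items():
        if sha256_of(data) != sha256_of(backup_files.get(name, b"")):
            mismatched.append(name)
    return mismatched

source = {"payroll.db": b"records-v2", "web.log": b"hits"}
backup = {"payroll.db": b"records-v1", "web.log": b"hits"}  # stale copy
print(verify(source, backup))  # ['payroll.db']
```

A periodic test restore goes one step further than checksums: it proves not just that the data on tape is intact, but that the whole restore procedure actually works.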
Hot and Cold Spares
The impact that a failed component has on a system or network
depends largely on the pre-disaster preparation and on the recovery strategies
used. Hot and cold spares represent a strategy for recovering from failed
components.
Hot Spare and Hot Swapping
Hot spares give system administrators the ability to recover
quickly from component failure. In a common use, a hot spare enables a RAID
system to fail over automatically to a spare hard drive should one of the other
drives in the array fail. A hot spare does not require any manual intervention;
rather, a redundant drive resides in the system at all times, just waiting to
take over if another drive fails. The hot spare takes over automatically,
leaving the failed drive to be removed at a later time. Even though hot-spare technology
adds an extra level of protection to your system, after a drive has failed and
the hot spare has been used, the situation should be remedied as soon as
possible.
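The failover behavior can be pictured with a toy simulation. The classes and method names below are illustrative assumptions, not a real RAID driver or controller API.

```python
# Toy simulation of hot-spare failover in a RAID array.
# Purely illustrative; real controllers also rebuild parity/data.

class Array:
    def __init__(self, drives, spare):
        self.drives = drives          # active member drives
        self.spare = spare            # idle hot spare
        self.degraded = False

    def drive_failed(self, name):
        """The controller swaps in the spare; no manual intervention."""
        self.drives.remove(name)
        if self.spare:
            self.drives.append(self.spare)  # rebuild onto the spare
            self.spare = None
        else:
            self.degraded = True            # no spare left: fix it now

raid = Array(["d0", "d1", "d2"], spare="d3")
raid.drive_failed("d1")
print(raid.drives)   # ['d0', 'd2', 'd3'] - spare took over automatically
print(raid.spare)    # None - replace the failed drive soon
```

The last line is the point of the text above: once the spare has been consumed, the array is one failure away from trouble, so the failed drive should be replaced promptly.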
Hot swapping is the ability to replace a failed component while
the system is running. Perhaps the most commonly identified hot-swap component
is the hard drive. In certain RAID configurations, when a hard drive crashes,
hot swapping allows you simply to take the failed drive out of the server and
install a new one.
The benefits of hot swapping are very clear in that it allows a
failed component to be recognized and replaced without compromising system
availability. Depending on the system's configuration, the new hardware will
normally be recognized automatically by both the current hardware and the
operating system. Nowadays, most internal and external RAID subsystems support
the hot-swapping feature. Some hot-swappable components include power supplies
and hard disks.
Cold Spare and Cold Swapping
The term cold spare refers to a component, such as a hard disk, that resides within a
computer system but requires manual intervention in case of component failure.
A hot spare will engage automatically, but a cold spare might require
configuration settings or some other action to engage it. A cold spare
configuration will typically require a reboot of the system.
The term cold spare has also been used to refer to a redundant
component that is stored outside the actual system but is kept in case of
component failure. To replace the failed component with a cold spare, the system
would need to be powered down.
Cold swapping refers to replacing components only after the system
is completely powered off. This strategy is by far the least attractive for
servers because the services provided by the server will be unavailable for the
duration of the cold-swap procedure. Modern systems have come a long way to
ensure that cold swapping is a rare occurrence. For some situations and for
some components, however, cold swapping is the only method to replace a failed
component. The only real defense against having to shut down the server is to
have redundant components residing in the system.
Hot, Warm, and Cold Sites
A disaster recovery plan might include the provision for a
recovery site that can be brought quickly into play. These sites fall into
three categories: hot, warm, and cold. The need for each of these types of
sites depends largely on the business you are in and the funds available.
Disaster recovery sites represent the ultimate in precautions for organizations
that really need it. As a result, they don't come cheap.
The basic concept of a disaster recovery site is that it can
provide a base from which the company can be operated during a disaster. The
disaster recovery site is not normally intended to provide a desk for every employee,
but is intended more as a means to allow key personnel to continue the core
business function.
In general, a cold recovery site is a site that can be up and
operational in a relatively short time span, such as a day or two. Provision of
services, such as telephone lines and power, is taken care of, and the basic
office furniture might be in place, but there is unlikely to be any computer
equipment, even though the building might well have a network infrastructure
and a room ready to act as a server room. In most cases, cold sites provide the
physical location and basic services.
Cold sites are useful if there is some forewarning of a potential
problem. Generally speaking, cold sites are used by organizations that can
weather the storm for a day or two before they get back up and running. If you
are the regional office of a major company, it might be possible to have one of
the other divisions take care of business until you are ready to go; but if you
are the one and only office in the company, you might need something a little
hotter.
For organizations with the dollars and the desire, hot recovery
sites represent the ultimate in fault-tolerance strategies. Like cold recovery
sites, hot sites are designed to provide only enough facilities to continue the
core business function, but hot recovery sites are set up to be ready to go at
a moment's notice.
A hot recovery site will include phone systems with the phone
lines already connected. Data networks will also be in place, with any
necessary routers and switches plugged in and turned on. Desks will have
desktop PCs installed and waiting, and server areas will be replete with the
necessary hardware to support business-critical functions. In other words,
within a few hours, the hot site can become a fully functioning element of an
organization.
The issue that confronts potential hot-recovery site users is
simply that of cost. Office space is expensive at the best of times, but having
space sitting idle 99.9 percent of the time can seem like a tremendously poor
use of money. A very popular strategy to get around this problem is to use
space provided in a disaster recovery facility, which is basically a building,
maintained by a third-party company, in which various businesses rent space.
Space is usually apportioned according to how much each company pays.
Sitting between the hot and cold recovery sites is the warm
site. A warm site will typically have computers, but they will not be
configured and ready to go. This means that data might need to be updated or
other manual interventions might need to be performed before the network is
again operational. The time it takes to get a warm site operational lands right
in the middle of the other two options, as does the cost.
