VLANs, Antivirus, Fault Tolerance, and Disaster Recovery
· Virtual LANs
· VLAN Membership
· Protocol-based VLANs
· Port-based VLANs
· MAC Address-based VLANs
· Viruses, Virus Solutions, and Malicious Software
· Trojans, Worms, Spyware, and Hoaxes
· Protecting Computers from Viruses
· Fault Tolerance
· Hard Disks Are Half the Problem
· Disk-level Fault Tolerance
· RAID 0: Stripe Set Without Parity
· RAID 1
· RAID 5
· RAID 10
· Server and Services Fault Tolerance
· Stand-by Servers
· Server Clustering
· Link Redundancy
· Using Uninterruptible Power Supplies
· Why Use a UPS?
· Power Threats
· Disaster Recovery
· Full Backup
· Differential Backup
· Incremental Backup
· Tape Rotations
· Backup Best Practices
· Hot and Cold Spares
· Hot Spare and Hot Swapping
· Cold Spare and Cold Swapping
· Hot, Warm, and Cold Sites
VLANs, Antivirus, Fault Tolerance, and Disaster Recovery
As far as network administration goes, nothing is more important
than fault tolerance and disaster recovery. First and foremost, it is the
responsibility of the network administrator to safeguard the data held on the
servers and to ensure that when requested, this data is ready to go.
Because both fault tolerance and disaster recovery are such an
important part of network administration, this tutorial has direct
real-world application.
Before diving into the fault tolerance and disaster recovery
objectives, we will start this tutorial by reviewing the function of virtual
LANs (VLANs).
Virtual LANs
To understand VLANs, it is first necessary to have a basic
understanding of how a traditional LAN operates. A standard local area network
(LAN) uses hardware such as hubs, bridges, and switches in the same physical
segment to provide a connection point for all end node devices. All network
nodes are capable of communicating with each other without the need for a
router; however, communicating with devices on other LAN segments requires
the use of a router.
As a network grows, routers are used to expand the network. The
routers provide the capability to connect separate LANs and to isolate users
into broadcast and collision domains. Using routers to route data around the
network and between segments increases latency. Latency refers to delays in
transmission caused by the routing process.
Virtual LANs (VLANs) provide an alternate method to segment
a network and in the process, significantly increase the performance capability
of the network, and remove potential performance bottlenecks. A VLAN is a group
of computers that are connected and act as if they are on their own physical
network segments, even though they might not be. For instance, suppose that you
work in a three-story building in which the advertising employees are spread
over all three floors. A VLAN can let all the advertising personnel use the
network resources as if they were connected on the same segment. This virtual
segment can be isolated from other network segments. In effect, it would appear
to the advertising group that they were on a network by themselves.
VLANs offer some clear advantages. Being able to create logical
segmentation of a network gives administrators flexibility beyond the
restrictions of the physical network design and cable infrastructure. VLANs
allow for easier administration because the network can be divided into
well-organized sections. Further, you can increase security by isolating
certain network segments from others. For instance, you can segment the
marketing personnel from finance or the administrators from the students. VLANs
can ease the burden on overworked routers and reduce broadcast storms. Table 1
summarizes the benefits of VLANs.
Table 1 Benefits of VLANs

| Advantages | Description |
| --- | --- |
| Increased security | By creating logical (virtual) boundaries, network segments can be isolated. |
| Increased performance | By reducing broadcast traffic throughout the network, VLANs free up bandwidth. |
| Organization | Network users and resources that are linked and communicate frequently can be grouped together in a VLAN. |
| Simplified administration | With a VLAN, the network administrator's job is easier when moving users between LAN segments, re-cabling, addressing new stations, and reconfiguring hubs and routers. |
VLAN Membership
You can use several methods to determine VLAN membership or how
devices are assigned to a specific VLAN. The following sections describe the
common methods of determining how VLAN membership is assigned.
Protocol-based VLANs
With protocol-based VLAN membership, computers are assigned to
VLANs by using the protocol that is in use and the Layer 3 address. For
example, this method enables an Internetwork Packet Exchange (IPX) network or a
particular Internet Protocol (IP) subnet to have its own VLAN.
It is important to note that although VLAN membership might be
based on Layer 3 information, this has nothing to do with routing or routing
functions. The IP addresses are used only to determine membership in a
particular VLAN, not to determine routing.
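This kind of Layer 3 lookup can be sketched in a few lines of Python using the standard ipaddress module (the subnets and VLAN names below are hypothetical examples, not from the text):

```python
import ipaddress

# Hypothetical subnet-to-VLAN assignments for a protocol-based scheme.
SUBNET_VLANS = {
    ipaddress.ip_network("192.168.10.0/24"): "engineering",
    ipaddress.ip_network("192.168.20.0/24"): "finance",
}

def vlan_for_address(addr: str) -> str:
    """Assign VLAN membership from the Layer 3 address -- no routing involved."""
    ip = ipaddress.ip_address(addr)
    for subnet, vlan in SUBNET_VLANS.items():
        if ip in subnet:
            return vlan
    return "default"

print(vlan_for_address("192.168.20.7"))  # finance
```

Note that the subnet is used purely as a membership key; the sketch never forwards anything between subnets.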
Port-based VLANs
Port-based VLANs require that specific ports on a network switch
be assigned to a VLAN. For example, ports 1 through 8 might be assigned to
marketing, ports 9 through 18 might be assigned to sales, and so on. Using this
method, a switch determines VLAN membership by taking note of the port used by
a particular packet. Figure 1 shows an example of a port-based VLAN.
Figure 1 Port-based VLAN configuration.
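The port-to-VLAN lookup a switch performs can be sketched in Python; the port ranges below mirror the hypothetical assignments from the example above:

```python
# Hypothetical port-to-VLAN map from the example above:
# ports 1-8 -> marketing, ports 9-18 -> sales.
PORT_VLAN_MAP = {}
for port in range(1, 9):
    PORT_VLAN_MAP[port] = "marketing"
for port in range(9, 19):
    PORT_VLAN_MAP[port] = "sales"

def vlan_for_port(port: int) -> str:
    """Return the VLAN assigned to a switch port."""
    return PORT_VLAN_MAP.get(port, "unassigned")

# A frame arriving on port 3 belongs to the marketing VLAN;
# a frame arriving on port 12 belongs to sales.
print(vlan_for_port(3))   # marketing
print(vlan_for_port(12))  # sales
```

The key design point of port-based membership shows up here: the lookup key is the physical port, so moving a workstation to a different port changes its VLAN.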
MAC Address-based VLANs
As you might have guessed, the Media Access Control (MAC) address
type of VLAN assigns membership according to the MAC address of the
workstation. To do this, the switch must keep track of the MAC addresses that
belong to each VLAN. The advantage of this method is that a workstation
computer can be moved anywhere in an office without needing to be reconfigured;
because the MAC address does not change, the workstation remains a member of a
particular VLAN. Table 2 provides examples of MAC address-based VLANs.
Table 2 MAC Address-based VLANs

| MAC Address | VLAN | Description |
| --- | --- | --- |
| 44-45-53-54-00-00 | 1 | Sales |
| 44-45-53-54-13-12 | 2 | Marketing |
| 44-45-53-54-D3-01 | 3 | Administration |
| 44-45-53-54-F5-17 | 1 | Sales |
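The switch-side bookkeeping described above amounts to a MAC-address-to-VLAN lookup. A minimal Python sketch using the addresses from Table 2:

```python
# MAC-to-VLAN table from Table 2 (VLAN 1 = Sales, 2 = Marketing, 3 = Administration).
MAC_VLAN_TABLE = {
    "44-45-53-54-00-00": 1,
    "44-45-53-54-13-12": 2,
    "44-45-53-54-D3-01": 3,
    "44-45-53-54-F5-17": 1,
}

def vlan_for_mac(mac: str):
    """Look up VLAN membership by MAC address (case-insensitive)."""
    return MAC_VLAN_TABLE.get(mac.upper())

# The workstation keeps its VLAN no matter which port it plugs into,
# because the lookup key is the MAC address, not the port.
print(vlan_for_mac("44-45-53-54-13-12"))  # 2
```

Contrast this with the port-based scheme: here the workstation can move anywhere in the office and its membership follows the hardware address.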
Although the acceptance and implementation of VLANs has been slow,
the ability to logically segment a LAN provides a new level of administrative
flexibility, organization, and security.
Viruses, Virus Solutions, and
Malicious Software
Viruses, spyware, worms, and other malicious code are an
unfortunate part of modern computing. In today's world, an unprotected computer
is at high risk of having some form of malicious software installed on the
system. A protected system is still at risk; the risk is just lower.
By definition, a virus is a program that is self-replicating and
operates on a computer system without the user's knowledge. These viruses will
either attach to or replace system files, system executables, and data files.
Once in, the virus can perform many different functions. It might
consume system resources, making the system too slow to use; it might
corrupt files or crash the computer entirely; or it might compromise data
integrity and availability.
In order to be considered a virus, the malicious code must meet
two criteria: It must be self-replicating, and it must be capable of executing
itself. Three common virus types are listed below:
· Boot sector virus: Boot sector viruses target the boot record of hard disks or floppy disks. In order to boot, floppy disks or hard drives contain an initial set of instructions that start the boot process. Boot sector viruses infect this program and activate when the system boots. This enables the virus to stay hidden in memory and operate in the background.
· File virus: File viruses are very common. They attack applications and program files, often targeting .exe, .com, and .bat files by destroying them, preventing applications from running, or modifying them and using them to propagate the virus.
· Macro virus: The actual data, such as documents and spreadsheets, represents the most important and irreplaceable element on a computer system. Macro viruses are designed to attack documents and files and therefore are particularly nasty.
Trojans, Worms, Spyware, and
Hoaxes
There are other forms of malicious programs, which by definition
are not a virus but still threaten our computer systems.
A Trojan horse is a program that appears harmless or even helpful
but, after being executed, performs an undesirable and malicious action. For
instance, a Trojan horse can be a program advertised as a patch, a harmless
application such as a calculator, or a product upgrade or enhancement. The trick
is to fool the user into downloading and installing the program. Once executed, the
Trojan horse can perform the function it was actually designed to do. This
might include crashing a system, stealing data, or corrupting data.
Worms are similar to viruses in that they replicate, but they do
not require a host file to spread from system to system. The difference between
viruses and worms is that a worm does not attach itself to an executable
program as do viruses: A worm is self-contained and does not need to be part of
another program to propagate itself. This makes a worm capable of replicating
at incredible speeds. This can cause significant network slowdowns as the worm
spreads.
A worm can do any number of malicious things, including deleting
files and sending documents via email without the user knowing. A worm can also
carry another program designed to open a backdoor in the system, which spam
senders can then use to relay junk mail and notices through the computer. Once
this backdoor is open, the system is vulnerable to data theft,
modification, or worse.
Spyware is a newer threat that is easy to contract and can be well hidden.
Spyware is designed to monitor activity on a computer, such as Web surfing
activity, and send that information to a remote source. It is commonly
installed along with a free program that might have been downloaded.
Spyware detection software is becoming increasingly popular and
given the information that can be stolen, should be considered an important
part of a secure system.
One final consideration is that of virus hoaxes. The threat of
virus activity is very real, and, as such, we are alerted to it. Some take
advantage of this to create elaborate virus hoaxes. Hoaxes will often pop up on
the computer screen or arrive in the email warning of a virus or claiming that
your system has contracted a virus. These are more annoying than dangerous but
serve to confuse and complicate the virus issue.
Protecting Computers from Viruses
The threat from malicious code is a very real concern. We need to
take the steps to protect our systems, and although it might not be possible to
eliminate the threat, it is possible to significantly reduce the threat.
One of the primary tools used in the fight against malicious
software is antivirus software. Antivirus software is available from a number
of companies, and each offers similar features and capabilities. The following
is a list of the common features and characteristics of antivirus software.
· Real-time protection: An installed antivirus program should continuously monitor the system, looking for viruses. If a program is downloaded, an application opened, or a suspicious email received, the real-time virus monitor detects and removes the threat. The antivirus application sits in the background, largely unnoticed by the user.
· Virus scanning: An antivirus program must be capable of scanning selected drives and disks, either locally or remotely. Scans can be run manually or scheduled to run at a particular time.
· Scheduling: It is a best practice to schedule virus scanning to occur automatically at a predetermined time. In a network environment, this would typically be off hours, when the overhead of the scanning process won't impact users.
· Live updates: New viruses and malicious software are released with alarming frequency. It is recommended that the antivirus software be configured to receive virus definition updates regularly.
· Email vetting: Email represents one of the primary delivery channels for viruses. It is essential to use antivirus software that provides scanning for both inbound and outbound email.
· Centralized management: If used in a network environment, it is a good idea to choose software that supports centralized management of the antivirus program from the server. Virus updates and configuration changes then need to be made only on the server, not on each individual client station.
Software is only part of a proactive virus-protection strategy. A
complete strategy requires many elements to help limit the risk of viruses,
including the following:
· Develop in-house policies and rules: In a corporate environment, or even a small office, it is important to establish what information can be placed onto a system. For example, should users be able to download programs from the Internet? Can users bring in their own floppy disks or other storage media?
· Monitor virus threats: With new viruses coming out all the time, it is important to check whether new viruses have been released and what they are designed to do.
· Educate users: One of the keys to a complete antivirus solution is to train users in virus prevention and recognition techniques. If users know what they are looking for, they can prevent a virus from entering the system or the network.
· Back up important documents: No solution is absolute, and care should be taken to ensure that data is backed up. In the event of a malicious attack, redundant information is then available in a secure location.
· Automate virus scanning and updates: Today's antivirus software can be configured to scan and update itself automatically. Because such tasks can be forgotten or overlooked, it is recommended to have these processes scheduled to run at predetermined times.
· Email vetting: Email is one of the most commonly used virus delivery mechanisms. Antivirus software can be used to check inbound and outbound emails for virus activity.
Fault Tolerance
As far as computers are concerned, fault tolerance refers to the
capability of the computer system or network to provide continued data
availability in the event of hardware failure. Every component within a server,
from CPU fan to power supply, has a chance of failure. Some components such as
processors rarely fail, whereas hard disk failures are well documented.
Almost every component has fault-tolerant measures. These measures
typically require redundant hardware components that can easily or
automatically take over when there is a hardware failure.
Of all the components inside computer systems, the ones that
require the most redundancy are the hard disks. Not only are hard disk
failures more common than any other component failure, but hard disks also
hold the data, without which there would be little need for a network.
Hard Disks Are Half the Problem
In fact, according to recent research, hard disks are responsible
for one of every two server hardware failures. This is an interesting statistic
to think about.
Disk-level Fault Tolerance
Making the decision to have hard disk fault tolerance on the server
is the first step; the second is deciding which fault-tolerant strategy to use.
Hard disk fault tolerance is implemented according to different RAID (redundant array of
inexpensive disks) levels. Each RAID level offers differing amounts of data
protection and performance.
The RAID level appropriate for a given situation depends on the
importance placed on the data, the difficulty of replacing that data, and the
associated costs of a respective RAID implementation. Oftentimes, the cost of
data loss and replacement outweighs the cost of implementing a
strong RAID fault-tolerant solution.
RAID 0: Stripe Set without Parity
Although it's given RAID status, RAID 0 does not actually provide
any fault tolerance; in fact, using RAID 0 might even be less fault tolerant
than storing all of your data on a single hard disk.
RAID 0 combines unused disk space on two or more hard drives into
a single logical volume with data being written to equally sized stripes across
all the disks. By using multiple disks, reads and writes are performed
simultaneously across all drives. This means that disk access is faster, making
the performance of RAID 0 better than other RAID solutions and significantly
better than a single hard disk. The downside of RAID 0 is that if any disk in
the array fails, the data is lost and must be restored from backup.
Because of its lack of fault tolerance, RAID 0 is rarely
implemented. Figure 2 shows an example of RAID 0 striping across three hard
disks.
Figure 2 RAID 0 striping
without parity.
RAID 1
One of the more common RAID implementations is RAID 1. RAID 1
requires two hard disks and uses disk mirroring to provide fault tolerance.
When information is written to the hard disk, it is automatically and
simultaneously written to the second hard disk. Both of the hard disks in the
mirrored configuration use the same hard disk controller; the partitions used
on the hard disk need to be approximately the same size to establish the
mirror. In the mirrored configuration, if the primary disk were to fail, the
second mirrored disk would contain all the required information and there would
be little disruption to data availability. RAID 1 ensures that the server will
continue operating in the case of the primary disk failure.
There are some key advantages to a RAID 1 solution. First, it is
cheap, as only two hard disks are required to provide fault tolerance. Second,
no additional software is required to establish RAID 1, as modern network
operating systems have built-in support for it. Third, unlike RAID levels that
use striping, RAID 1 can include the boot or system partition in the
fault-tolerant solution. Finally, RAID 1 offers load balancing over multiple
disks, which increases read performance over that of a single disk. Write
performance, however, is not improved.
Because of its advantages, RAID 1 is well suited as an entry-level
RAID solution, but it has a few significant shortcomings that exclude its use
in many environments. It has limited storage capacity: two 100GB hard drives
provide only 100GB of storage space. Organizations with large data storage
needs can exceed a mirrored solution's capacity in very short order. RAID 1 also
has a single point of failure, the hard disk controller. If it were to fail,
the data would be inaccessible on either drive. Figure 3 shows an example of
RAID 1 disk mirroring.
Figure 3 RAID 1 disk
mirroring.
An extension of RAID 1 is disk duplexing. Disk duplexing is the
same as mirroring with the exception of one key detail: It places the hard
disks on separate hard disk controllers, eliminating the single point of
failure.
RAID 5
RAID 5, also known as disk striping with parity,
uses distributed parity to write information across all disks in the array.
Unlike the striping used in RAID 0, RAID 5 includes parity information in the
striping, which provides fault tolerance. This parity information is used to
re-create the data in the event of a failure. RAID 5 requires a minimum of
three disks with the equivalent of a single disk being used for the parity
information. This means that if you have three 40GB hard disks, you have 80GB
of storage space with the other 40GB used for parity. To increase storage space
in a RAID 5 array, you need only add another disk to the array. Depending on
the sophistication of the RAID setup you are using, the RAID controller will be
able to incorporate the new drive into the array automatically, or you will
need to rebuild the array and restore the data from backup.
Many factors have made RAID 5 a very popular fault-tolerant
design. RAID 5 can continue to function in the event of a single drive failure.
If a hard disk were to fail in the array, the parity would re-create the
missing data and continue to function with the remaining drives. The read
performance of RAID 5 is improved over a single disk.
There are only a few drawbacks for the RAID 5 solution. These are
as follows:
- The costs of implementing RAID 5 are initially higher than those of
  other fault-tolerant measures, as it requires a minimum of three hard
  disks. Given the cost of hard disks today, this is a minor concern.
- RAID 5 suffers from poor write performance because the parity has to be
  calculated and then written across several disks. The performance lag is
  minimal, however, and won't make a noticeable difference on the network.
- When a new disk
is placed in a failed RAID 5 array, there is a regeneration time when the
data is being rebuilt on the new drive. This process requires extensive
resources from the server.
Figure 4 shows an example of RAID 5 striping with parity.
Figure 4 RAID 5 striping with parity.
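The parity re-creation RAID 5 relies on is just XOR arithmetic. A minimal Python sketch of losing one disk's stripe and rebuilding it (the byte values are made up for illustration):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR byte strings together; this is how RAID 5 computes parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three-disk array: two data stripes plus one parity stripe.
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
parity = xor_blocks([disk1, disk2])

# If disk2 fails, XORing the surviving stripe with the parity rebuilds it.
rebuilt = xor_blocks([disk1, parity])
print(rebuilt == disk2)  # True
```

Because XOR is its own inverse, any single missing stripe, data or parity, can be regenerated from the others, which is exactly why RAID 5 survives one drive failure but not two.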
RAID 10
Sometimes RAID levels are combined to take advantage of the best
of each. One such strategy is RAID 10, which combines RAID levels 1 and 0. In
this configuration, four disks are required. As you might expect, the configuration
consists of a mirrored stripe set. To some extent, RAID 10 takes advantage of
the performance capability of a stripe set while offering the fault tolerance
of a mirrored solution. As well as having the benefits of each though, RAID 10
also inherits the shortcomings of each strategy. In this case, the high
overhead and the decreased write performance are the disadvantages. Figure 5
shows an example of a RAID 10 configuration. Table 3 provides a summary of the
various RAID levels.
Figure 5 Disks in a RAID
10 configuration.
Table 3 Summary of RAID Levels

| RAID Level | Description | Advantages | Disadvantages | Required Disks |
| --- | --- | --- | --- | --- |
| RAID 0 | Disk striping | Increased read and write performance. RAID 0 can be implemented with only two disks. | Does not offer any fault tolerance. | Two or more |
| RAID 1 | Disk mirroring | Provides fault tolerance. Can also be used with separate disk controllers, reducing the single point of failure (called disk duplexing). | RAID 1 has a 50% overhead and suffers from poor write performance. | Two |
| RAID 5 | Disk striping with distributed parity | Can recover from a single disk failure; increased read performance over a single disk. Disks can be added to the array to increase storage capacity. | May suffer from poor write performance and can slow down the network during regeneration time. | Minimum of three |
| RAID 10 | Striping with mirrored volumes | Increased performance with striping; offers mirrored fault tolerance. | High overhead, as with mirroring. | Four |
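The overhead figures in Table 3 can be checked with a small usable-capacity calculation. This is only a sketch, and it assumes equal-sized disks and the disk counts discussed in this tutorial:

```python
def usable_capacity(raid_level: int, disk_count: int, disk_size_gb: int) -> int:
    """Usable space for equal-sized disks at common RAID levels."""
    if raid_level == 0:                  # striping: all space usable
        return disk_count * disk_size_gb
    if raid_level == 1:                  # mirroring: 50% overhead
        assert disk_count == 2
        return disk_size_gb
    if raid_level == 5:                  # one disk's worth of parity
        assert disk_count >= 3
        return (disk_count - 1) * disk_size_gb
    if raid_level == 10:                 # mirrored stripe set
        assert disk_count == 4
        return (disk_count // 2) * disk_size_gb
    raise ValueError("unsupported RAID level")

# Three 40GB disks in RAID 5 yield 80GB, matching the example in the text.
print(usable_capacity(5, 3, 40))  # 80
```

The same function reproduces the RAID 1 example: two 100GB drives yield only 100GB of usable space.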
Server and Services Fault
Tolerance
In addition to providing fault tolerance for individual hardware
components, some organizations go the extra mile to include the entire server
in the fault-tolerant design. Such a design keeps servers and the services they
provide up and running. When it comes to server fault tolerance, two key
strategies are commonly employed: stand-by servers and server clustering.
Stand-by Servers
Stand-by servers are a fault-tolerant measure in which a second
server is configured identically to the first one. The second server can be
stored remotely or locally and set up in a failover configuration. In a
failover configuration, the secondary server is connected to the primary and
ready to take over the server functions at a heartbeat's notice. If the
secondary server detects that the primary has failed, it will automatically cut
in. Network users will not notice the transition, as there will be little or no
disruption in data availability.
The primary server communicates with the secondary server by
issuing special notification messages referred to as heartbeats. If the
secondary server stops receiving the heartbeat messages, it assumes that the
primary has died and takes over the primary server's role.
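The heartbeat logic can be sketched as a simple timeout check. The class name and timeout value below are made-up illustrations; real failover products add election, fencing, and much more:

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before failover (made-up value)

class StandbyServer:
    """Minimal sketch of a secondary server watching for heartbeats."""

    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False

    def receive_heartbeat(self):
        # Called whenever a heartbeat message arrives from the primary.
        self.last_heartbeat = time.monotonic()

    def check_primary(self):
        # If the heartbeats stop, assume the primary has died and take over.
        if time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active = True
        return self.active

standby = StandbyServer()
standby.receive_heartbeat()
print(standby.check_primary())  # False: primary is still alive
```

The design trade-off is visible even in this sketch: too short a timeout causes failover on a momentary network hiccup, while too long a timeout extends the outage.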
Server Clustering
Those companies wanting maximum data availability that have the
funds to pay for it can choose to use server clustering. As the name suggests,
server clustering involves grouping servers together for the purposes of fault
tolerance and load balancing. In this configuration, other servers in the
cluster can compensate for the failure of a single server. The failed server
will have no impact on the network, and the end users will have no idea that a
server has failed.
The clear advantage of server clusters is that they offer the
highest level of fault tolerance and data availability. The disadvantage is
equally clear: cost. The cost of buying a single server can be a huge investment
for many organizations; having to buy duplicate servers is far too costly.
Link Redundancy
Although a failed network card might not actually stop the server
or a system, it might as well. A network server that cannot be used on the
network makes for server downtime. Although the chances of a failed network
card are relatively low, our attempts to reduce the occurrence of downtime have
led to the development of a strategy that provides fault tolerance for network
connections.
Through a process called adapter teaming, groups of network cards
are configured to act as a single unit. The teaming capability is achieved
through software, either as a function of the network card driver or through
specific application software. Adapter teaming is not yet widely
implemented; the benefits it offers are many, though, so it is likely to become a
more common sight. The result of adapter teaming is increased bandwidth, fault
tolerance, and the ability to manage network traffic more effectively. These
features break down into three categories:
· Adapter fault tolerance: The basic configuration enables one network card to be configured as the primary device and others as secondary. If the primary adapter fails, one of the other cards can take its place without the need for intervention. When the original card is replaced, it resumes the role of primary controller.
· Adapter load balancing: Because software controls the network adapters, workloads can be distributed evenly among the cards so that each link is used to a similar degree. This distribution allows for a more responsive server, because one card is not overworked while another is underworked.
· Link aggregation: This provides vastly improved performance by allowing more than one network card's bandwidth to be combined into a single logical connection. For example, through link aggregation, four 100Mbps network cards can provide a total of 400Mbps of bandwidth. Link aggregation requires that both the network adapters and the switch being used support it. In 1999, the IEEE ratified the 802.3ad standard for link aggregation, allowing compatible products to be produced.
Using Uninterruptible Power
Supplies
No discussion of fault tolerance can be complete without a look at
power-related issues and the mechanisms used to combat them. When you're
designing a fault-tolerant system, your planning should definitely include UPSs
(Uninterruptible Power Supplies). A UPS serves many functions and is a major
part of server consideration and implementation.
On a basic level, a UPS is a box that holds a battery and a
built-in charging circuit. During times of good power, the battery is
recharged; when the UPS is needed, it's ready to provide power to the server.
Most often, the UPS is required to provide enough power to give the
administrator time to shut down the server in an orderly fashion, preventing
any potential data loss from a dirty shutdown.
Why Use a UPS?
Organizations of all shapes and sizes need UPSs as part of their
fault-tolerance strategies. A UPS is as important as any other fault-tolerance
measure. Three key reasons make a UPS necessary:
· Data availability: The goal of any fault-tolerance measure is data availability. A UPS ensures access to the server in the event of a power failure, or at least for as long as it takes to save a file.
· Protection from data loss: Fluctuations in power or a sudden power-down can damage the data on the server system. In addition, many servers take full advantage of caching, and a sudden loss of power could cause the loss of all information held in cache.
· Protection from hardware damage: Constant power fluctuations or sudden power-downs can damage hardware components within a computer. Damaged hardware can lead to reduced data availability while the hardware is being repaired.
Power Threats
In addition to keeping a server functioning long enough to safely
shut it down, a UPS also safeguards a server from inconsistent power. This
inconsistent power can take many forms. A UPS protects a system from the
following power-related threats:
· Blackout: A blackout is a total failure of the power supplied to the server.
· Spike: A spike is a very short (usually less than a second) but very intense increase in voltage. Spikes can do irreparable damage to any kind of equipment, especially computers.
· Surge: Compared to a spike, a surge is a considerably longer (sometimes many seconds) but usually less intense increase in power. Surges can also damage your computer equipment.
· Sag: A sag is a short-term voltage drop (the opposite of a spike). This type of voltage drop can cause a server to reboot.
· Brownout: A brownout is a drop in voltage that usually lasts more than a few minutes.
Many of these power-related threats can occur without your
knowledge; if you don't have a UPS, you cannot prepare for them. For the cost,
it is worth buying a UPS, if for no other reason than to sleep better at night.
Disaster Recovery
Even the most fault-tolerant networks will fail, which is an unfortunate
fact. When those costly and carefully implemented fault-tolerant strategies do
fail, you are left with disaster recovery.
Disaster recovery can take on many forms. In addition to real
disaster, fire, flood, theft, and the like, many other potential business
disruptions can fall under the banner of disaster recovery. For example, the
failure of the electrical supply to your city block might interrupt the
business function. Such an event, although not a disaster per se, might invoke
the disaster recovery methods.
The cornerstone of every disaster recovery strategy is the
preservation and recoverability of data. When talking about preservation and
recoverability, we are talking about backups. When we are talking about
backups, we are likely talking about tape backups. Implementing a regular
backup schedule can save you a lot of grief when fault tolerance fails or when
you need to recover a file that has been accidentally deleted. When it comes
time to design a backup schedule, there are three key types of backup to
consider: full, differential, and incremental.
Full Backup
The preferred method of backup is the full backup method, which
copies all files and directories from the hard disk to the backup media. There
are a few reasons why doing a full backup is not always possible. First among
them is likely the time involved in performing a full backup.
Depending on the amount of data to be backed up, full backups can
take an extremely long time and can use extensive system resources. Depending
on the configuration of the backup hardware, this can slow down the network
considerably. In addition, some environments have more data than can fit on a
single tape. This makes taking a full backup awkward, as someone may need to be
there to manually change the tapes.
The main advantage of full backups is that a single tape or tape
set holds all the data you need backed up. In the event of a failure, a single
tape might be all that is needed to get all data and system information back.
The upshot of all this is that any disruption to the network is greatly
reduced.
Unfortunately, its strength can also be its weakness. A single
tape holding an organization's data can be a security risk. If the tape were to
fall into the wrong hands, all the data could be restored on another computer.
Using passwords on tape backups and using a secure offsite and onsite location
can minimize the security risk.
Differential Backup
For those companies that just don't quite have enough time to
complete a full backup daily, there is the differential backup. Differential
backups are faster than a full backup, as they back up only the data that has
changed since the last full backup. This means that if you do a full backup on
a Saturday and a differential backup on the following Wednesday, only the data
that has changed since Saturday is backed up. Restoring the differential backup
will require the last full backup and the latest differential backup.
Differential backups know what files have changed since the last
full backup by using a setting known as the archive bit. The archive bit flags
files that have changed or been created and identifies them as ones that need
to be backed up. Full backups do not concern themselves with the archive bit,
as all files are backed up regardless of date. A full backup, however, will
clear the archive bit after data has been backed up to avoid future confusion.
Differential backups take notice of the archive bit and use it to determine
which files have changed. The differential backup does not reset the archive
bit information.
Incremental Backup
Some companies have a very finite amount of time they can allocate
to backup procedures. Such organizations are likely to use incremental backups
in their backup strategy. Incremental backups save only the files that have
changed since the last full or incremental backup. Like differential backups,
incremental backups use the archive bit to determine the files that have
changed since the last full or incremental backup. Unlike differentials,
however, incremental backups clear the archive bit, so files that have not
changed are not backed up.
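The archive-bit behavior described above can be sketched in a few lines of code. This is a minimal illustration, not a real backup tool; the `File` class and file names are made up for the example.

```python
# Sketch of archive-bit semantics for full, differential, and
# incremental backups. The File class and names are illustrative.

class File:
    def __init__(self, name):
        self.name = name
        self.archive_bit = True  # set when a file is created or modified

def backup(files, kind):
    """Return the files a backup of the given kind would copy."""
    if kind == "full":
        selected = list(files)           # ignores the archive bit...
        for f in files:
            f.archive_bit = False        # ...but clears it afterward
    elif kind == "differential":
        selected = [f for f in files if f.archive_bit]  # bit left untouched
    elif kind == "incremental":
        selected = [f for f in files if f.archive_bit]
        for f in selected:
            f.archive_bit = False        # incremental clears the bit
    return selected

files = [File("payroll.db"), File("web.log")]
backup(files, "full")                    # everything copied, bits cleared
files[0].archive_bit = True              # payroll.db changes on Monday
print([f.name for f in backup(files, "differential")])  # ['payroll.db']
print([f.name for f in backup(files, "differential")])  # ['payroll.db'] again
print([f.name for f in backup(files, "incremental")])   # ['payroll.db'], bit cleared
print([f.name for f in backup(files, "incremental")])   # [] - nothing has changed
```

Note how the second differential picks up the same file again (the bit is never reset), while the second incremental finds nothing to do.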
The faster backup times of incremental backups come at a price:
the amount of time required to restore. Recovering from a failure with
incremental backups requires numerous tapes: all the incremental tapes and the
most recent full backup. For example, if you had a full backup from Sunday and
an incremental for Monday, Tuesday, and Wednesday, you would need four tapes to
restore the data. Keep in mind: each tape in the rotation is an additional step
in the restore process and an additional failure point. One damaged incremental
tape and you will be unable to restore the data. Table 4 summarizes the various
backup strategies.
Table 4 Backup Strategies

| Backup Type | Advantages | Disadvantages | Data Backed Up | Archive Bit |
|---|---|---|---|---|
| Full | Backs up all data to a single tape or tape set. Restoring data requires the fewest tapes. | Depending on the amount of data, full backups can take a long time. | All files and directories. | Does not use the archive bit, but resets it after data has been backed up. |
| Differential | Faster backups than a full backup. | Uses more tapes than a full backup. The restore process takes longer than with a full backup. | All files and directories that have changed since the last full backup. | Uses the archive bit to determine the files that have changed, but does not reset it. |
| Incremental | Fastest backup times. | Requires multiple tapes; restoring data takes more time than the other backup methods. | The files and directories that have changed since the last full or incremental backup. | Uses the archive bit to determine the files that have changed, and resets it. |
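The restore-side difference between the strategies can be made concrete with a small function that, given a backup history, lists the tapes needed for a full recovery. The record format here is an assumption for illustration.

```python
# Sketch: which tapes are needed to restore, given a backup history
# of (day, kind) records. Purely illustrative, not a real catalog.

def tapes_for_restore(history):
    """Return the tapes to load, oldest first, for a full recovery."""
    # Find the most recent full backup; the restore starts there.
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    needed = [history[last_full]]
    later = history[last_full + 1:]
    incs = [b for b in later if b[1] == "incremental"]
    diffs = [b for b in later if b[1] == "differential"]
    if incs:
        needed.extend(incs)        # every incremental since the full
    elif diffs:
        needed.append(diffs[-1])   # only the latest differential
    return needed

# Sunday full plus Mon-Wed incrementals -> four tapes, as in the text.
history = [("Sun", "full"), ("Mon", "incremental"),
           ("Tue", "incremental"), ("Wed", "incremental")]
print(len(tapes_for_restore(history)))  # 4
```

Swap the incrementals for differentials and the same function returns only two tapes (the full plus the latest differential), which is exactly the trade-off the table captures.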
Tape Rotations
After you have decided on the backup type you will use, you are
ready to choose a backup rotation. Several backup rotation strategies are in
use: some good, some bad, and some really bad. The most common, and perhaps the
best, is the Grandfather-Father-Son (GFS) rotation.
The GFS backup rotation is the most widely used and for good
reason. An example GFS rotation may require 12 tapes: four tapes for daily
backups (son), five tapes for weekly backups (father), and three tapes for
monthly backups (grandfather).
Using this rotation schedule, it is possible to recover data from
days, weeks, or months previous. Some network administrators choose to add
tapes to the monthly rotation to be able to retrieve data even further back,
sometimes up to a year. In most organizations, however, data that is a week old
is out of date, let alone six months or a year.
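The 12-tape pool above can be sketched as a small labeling function. The scheduling convention used here (Mon-Thu daily tapes, Friday weeklies, a month-end monthly on a three-month cycle) is an assumption for illustration; real GFS rotations vary in the details.

```python
# Illustrative GFS rotation: 4 daily "son" tapes, 5 weekly "father"
# tapes, 3 monthly "grandfather" tapes. Conventions are assumptions.

DAILY = ["Son-Mon", "Son-Tue", "Son-Wed", "Son-Thu"]

def gfs_tape(weekday, week_of_month, month, last_friday=False):
    """weekday: 0=Mon .. 6=Sun. Returns the tape label to use."""
    if weekday < 4:
        return DAILY[weekday]                       # reused every week
    if weekday == 4:                                # Friday
        if last_friday:
            return "Grandfather-%d" % (month % 3 + 1)  # 3-month cycle
        return "Father-%d" % week_of_month             # up to 5 Fridays
    return None                                        # no weekend backups

print(gfs_tape(0, 1, 1))                     # Son-Mon
print(gfs_tape(4, 2, 1))                     # Father-2
print(gfs_tape(4, 4, 3, last_friday=True))   # Grandfather-1
```

The daily tapes are overwritten weekly, the weekly tapes monthly, and the monthly tapes quarterly, which is what lets a 12-tape pool reach back roughly three months.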
Backup Best Practices
Many details go into making a backup strategy a success. The
following list contains issues to consider as part of your backup plan.
· Offsite storage: Consider having backup tapes stored offsite so
that in the event of a disaster in a building, a current set of tapes is still
available. The offsite tapes should be as current as any onsite and should be
kept secure.
· Label tapes: The goal is to restore the data as quickly as
possible, and finding the tape you need can be difficult if it is not marked.
Clear labeling also prevents you from recording over a tape you need.
· New tapes: Like old cassette tapes, the tape cartridges used for
backups wear out over time. One strategy to prevent this from becoming
a problem is to introduce new tapes periodically into the rotation schedule.
· Verify backups: Never assume that the backup was successful.
Seasoned administrators know that checking backup logs and performing periodic
test restores are part of the backup process.
· Cleaning: From time to time, it is necessary to clean the tape
drive. If the inside gets dirty, backups can fail.
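One simple way to act on the "verify backups" advice is to compare checksums of the source files against the copies read back from the backup. The sketch below uses in-memory byte strings and hypothetical file names to keep the idea self-contained.

```python
# Sketch of backup verification by checksum comparison. The file
# names and data here are hypothetical, for illustration only.

import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(source_files, backup_files):
    """Return the names whose backup copy does not match the source."""
    mismatched = []
    for name, data in source_files.items():
        if sha256_of(data) != sha256_of(backup_files.get(name, b"")):
            mismatched.append(name)
    return mismatched

source = {"payroll.db": b"records-v2", "web.log": b"hits"}
backup = {"payroll.db": b"records-v1", "web.log": b"hits"}  # stale copy
print(verify(source, backup))  # ['payroll.db']
```

A periodic test restore goes one step further than checksums: it proves not just that the data on tape is intact, but that the whole restore procedure actually works.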
Hot and Cold Spares
The impact that a failed component has on a system or network
depends largely on the pre-disaster preparation and on the recovery strategies
used. Hot and cold spares represent a strategy for recovering from failed
components.
Hot Spare and Hot Swapping
Hot spares give system administrators the ability to recover
quickly from component failure. In a common use, a hot spare enables a RAID
system to fail over automatically to a spare hard drive should one of the other
drives in the array fail. A hot spare does not require any manual intervention;
rather, a redundant drive resides in the system at all times, just waiting to
take over if another drive fails. The hot spare takes over automatically,
leaving the failed drive to be removed at a later time. Even though hot-spare technology
adds an extra level of protection to your system, after a drive has failed and
the hot spare has been used, the situation should be remedied as soon as
possible.
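The failover behavior can be pictured with a toy simulation. The classes and method names below are illustrative assumptions, not a real RAID driver or controller API.

```python
# Toy simulation of hot-spare failover in a RAID array.
# Purely illustrative; real controllers also rebuild parity/data.

class Array:
    def __init__(self, drives, spare):
        self.drives = drives          # active member drives
        self.spare = spare            # idle hot spare
        self.degraded = False

    def drive_failed(self, name):
        """The controller swaps in the spare; no manual intervention."""
        self.drives.remove(name)
        if self.spare:
            self.drives.append(self.spare)  # rebuild onto the spare
            self.spare = None
        else:
            self.degraded = True            # no spare left: fix it now

raid = Array(["d0", "d1", "d2"], spare="d3")
raid.drive_failed("d1")
print(raid.drives)   # ['d0', 'd2', 'd3'] - spare took over automatically
print(raid.spare)    # None - replace the failed drive soon
```

The last line is the point of the text above: once the spare has been consumed, the array is one failure away from trouble, so the failed drive should be replaced promptly.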
Hot swapping is the ability to replace a failed component while
the system is running. Perhaps the most commonly identified hot-swap component
is the hard drive. In certain RAID configurations, when a hard drive crashes,
hot swapping allows you simply to take the failed drive out of the server and
install a new one.
The benefits of hot swapping are very clear in that it allows a
failed component to be recognized and replaced without compromising system
availability. Depending on the system's configuration, the new hardware will
normally be recognized automatically by both the current hardware and the
operating system. Nowadays, most internal and external RAID subsystems support
the hot-swapping feature. Some hot-swappable components include power supplies
and hard disks.
Cold Spare and Cold Swapping
The term cold spare refers to a component, such as a hard disk, that resides within a
computer system but requires manual intervention in case of component failure.
A hot spare will engage automatically, but a cold spare might require
configuration settings or some other action to engage it. A cold spare
configuration will typically require a reboot of the system.
The term cold spare has also been used to refer to a redundant
component that is stored outside the actual system but is kept in case of
component failure. To replace the failed component with a cold spare, the system
would need to be powered down.
Cold swapping refers to replacing components only after the system
is completely powered off. This strategy is by far the least attractive for
servers because the services provided by the server will be unavailable for the
duration of the cold-swap procedure. Modern systems have come a long way to
ensure that cold swapping is a rare occurrence. For some situations and for
some components, however, cold swapping is the only method to replace a failed
component. The only real defense against having to shut down the server is to
have redundant components residing in the system.
Hot, Warm, and Cold Sites
A disaster recovery plan might include the provision for a
recovery site that can be brought quickly into play. These sites fall into
three categories: hot, warm, and cold. The need for each of these types of
sites depends largely on the business you are in and the funds available.
Disaster recovery sites represent the ultimate in precautions for organizations
that really need it. As a result, they don't come cheap.
The basic concept of a disaster recovery site is that it can
provide a base from which the company can be operated during a disaster. The
disaster recovery site is not normally intended to provide a desk for every employee,
but is intended more as a means to allow key personnel to continue the core
business function.
In general, a cold recovery site is a site that can be up and
operational in a relatively short time span, such as a day or two. Provision of
services, such as telephone lines and power, is taken care of, and the basic
office furniture might be in place, but there is unlikely to be any computer
equipment, even though the building might well have a network infrastructure
and a room ready to act as a server room. In most cases, cold sites provide the
physical location and basic services.
Cold sites are useful if there is some forewarning of a potential
problem. Generally speaking, cold sites are used by organizations that can
weather the storm for a day or two before they get back up and running. If you
are the regional office of a major company, it might be possible to have one of
the other divisions take care of business until you are ready to go; but if you
are the one and only office in the company, you might need something a little
hotter.
For organizations with the dollars and the desire, hot recovery
sites represent the ultimate in fault-tolerance strategies. Like cold recovery
sites, hot sites are designed to provide only enough facilities to continue the
core business function, but hot recovery sites are set up to be ready to go at
a moment's notice.
A hot recovery site will include phone systems with the phone
lines already connected. Data networks will also be in place, with any
necessary routers and switches plugged in and turned on. Desks will have
desktop PCs installed and waiting, and server areas will be replete with the
necessary hardware to support business-critical functions. In other words,
within a few hours, the hot site can become a fully functioning element of an
organization.
The issue that confronts potential hot-recovery site users is
simply that of cost. Office space is expensive at the best of times, but having
space sitting idle 99.9 percent of the time can seem like a tremendously poor
use of money. A very popular strategy to get around this problem is to use
space provided in a disaster recovery facility, which is basically a building,
maintained by a third-party company, in which various businesses rent space.
Space is usually apportioned according to how much each company pays.
Sitting between the hot and cold recovery sites is the warm
site. A warm site will typically have computers, but they will not be
configured and ready to go. This means that data might need to be updated or
other manual interventions might need to be performed before the network is
again operational. The time it takes to get a warm site operational lands right
in the middle of the other two options, as does the cost.
