Author’s note:
I’m interested in storage. There are fascinating advances in storage
technology that deserve at least a high-level overview. This Disaster
Recovery 101 series is my attempt to “think out loud” about these advances
and how they impact the storage landscape at large.
Let’s face it: tape is on its way out. Until recently, there really was no
reasonable alternative to robotic tape libraries stacked with rows upon rows
of neatly cleaned and serviced tape backup units. The cost per gigabyte of
storage was lower than that of competing technology from the disk and optical
manufacturers. It just worked, day in, day out. Sure, it had its quirks:
sequential writes could stretch mailbox recovery operations from days to
weeks to months, and having to take tapes offsite for disaster recovery?
Well, the global manager of IT was still carrying tapes home for the weekend,
so how safe and secure was the data? All told, however, tape was still king.
Disaster recovery (DR) has always been an important part of information
lifecycle management. In the post-9/11 haze, it became readily apparent that
DR needed to be fast, efficient and scalable. Not every company could pay for
an OC-48 pipe to a remote site. There needed to be a more cost-effective
means of data protection and recovery. As an example, American Express’
office in New York City was able to fail over in less than 5 minutes after
the towers collapsed, thanks to its DR planning and implementation. Without
that preparation, chances are millions of users and their data would have
been unavailable for days if not weeks. By implementing a comprehensive DR
plan coupled with secured storage and backups, American Express was able to
limit its liability. So, what does this have to do with tape?
We’ve seen the cost of disk storage drop dramatically. At current price
points, a 1TB drive can be had for $399.00, which works out to roughly $0.40
a gigabyte. A standard 400GB uncompressed LTO-3 tape at $40.00 works out to
$0.10 a gigabyte. Pricing parity hasn’t been reached, but the gap keeps
closing. To offset the price differential, the inherent performance
differences become powerful motivators for disk-based DR. Try doing a full
mailbox restore on tape versus disk. See my point? While a difference of 30
cents a gigabyte can add up over time, paying your IT staff overtime for
fixing the Global Manager of IT’s cock-up with his mailbox will quickly erase
that saving. Cutting restore times from weeks to minutes is no small feat.
The mechanism behind this? Disk libraries using Virtual Tape Library (VTL)
imaging.
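
As a quick sanity check on the media-cost comparison, here is a minimal Python sketch that derives cost per gigabyte from the prices and capacities quoted above. The function name cost_per_gb is made up for this illustration, and the sketch assumes decimal units (1TB = 1000GB) and ignores drives, libraries, power and compression.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Return raw media cost in US dollars per gigabyte."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return price_usd / capacity_gb


# Prices and capacities as quoted in the post; 1TB is taken as 1000GB.
disk = cost_per_gb(399.00, 1000)  # 1TB disk drive
tape = cost_per_gb(40.00, 400)    # 400GB native (uncompressed) LTO-3 tape

print(f"disk: ${disk:.3f}/GB")
print(f"tape: ${tape:.3f}/GB")
print(f"disk-to-tape cost ratio: {disk / tape:.1f}x")
```

Working per gigabyte rather than per megabyte keeps the figures readable; swap in current street prices to see how the ratio has moved.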
What exactly is VTL?
According to BitPipe, VTL is defined as:
Virtual tape is an archival
storage technology that makes it possible to save data as if it were
being stored on tape although it may actually be stored on hard disk
or on another storage medium.
A VTL is typically a hardware/software solution that combines a disk-based
target (EMC EDL 2xx, 4000, 6000 series, NetApp NearStor VTL, et al) with a
software initiator (such as EMC NetWorker, DataDirect Networks VTLS, et al).
Typical enterprise backup solutions (Veritas NetBackup, etc.) also support
most of the features that are rolled into the VTL package, and they are
manufacturer-agnostic. The actual disk library hardware emulates common tape
drives (Quantum, HP, IBM, for example) and formats, and can present thousands
of tape “images” and hundreds of virtual tape libraries to the backup
software.
All this is well and good, but how is the data protected by the array?
Within disk library (DL) systems, the drives in a given array are typically
organized into RAID groups. EMC’s EDL series, for example, uses RAID 3 within
its arrays to provide data protection. Additionally, most hardware
manufacturers provide fail-over of their front-end storage processors as well
as redundant back-end fibre loops. If no site-to-site replication is
implemented, further protection can be added by dumping data from disk to
tape for offsite storage (which will be the subject of another DR article).
Another factor to take into consideration is the ability of disk libraries to
multistream, that is, to split backup data into multiple backup sets and
write them to two or more devices concurrently without lengthening the backup
window.
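
To make the multistreaming idea concrete, here is a rough Python sketch that deals backup chunks out to several virtual drives and writes the streams concurrently. It is an illustration only, not any vendor's implementation: the device names, the write_stream and multistream_backup functions, and the round-robin split are all assumptions, and the "write" just counts bytes instead of talking to a real tape device.

```python
from concurrent.futures import ThreadPoolExecutor


def write_stream(device: str, chunks: list[bytes]) -> tuple[str, int]:
    """Stand-in for writing one backup stream to one virtual drive.

    A real backup application would send each chunk to the device;
    this placeholder only counts the bytes it was handed.
    """
    written = 0
    for chunk in chunks:
        written += len(chunk)
    return device, written


def multistream_backup(chunks: list[bytes], devices: list[str]) -> dict[str, int]:
    """Deal chunks round-robin across devices and write all streams at once."""
    if not devices:
        raise ValueError("at least one device is required")
    streams = {dev: chunks[i::len(devices)] for i, dev in enumerate(devices)}
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = [pool.submit(write_stream, dev, part)
                   for dev, part in streams.items()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    # Hypothetical virtual drive names and dummy data.
    data = [b"x" * 1024 for _ in range(8)]
    print(multistream_backup(data, ["vtl-drive-0", "vtl-drive-1"]))
```

The point of the sketch is the shape of the work: one backup job becomes several independent streams, so total elapsed time is governed by the slowest stream rather than the sum of all of them.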
Why VTL?
Now that we have a high-level overview of VTL technologies, the big question
remains: Why VTL? We’ve already seen that the price gap is narrowing and that
some of the larger performance issues (reads/writes, restore times, etc.)
have been addressed by disk libraries. More granular control over backup
windows and targets, along with much more efficient multistreaming than tape
allows, is a further value-add. Finally, the capacity to write to disk THEN
dump to tape keeps the backup window small, while the final off-site backup
to tape can be done during production hours without interruption.
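
A back-of-the-envelope Python sketch shows why staging to disk first shrinks the backup window. The 2000GB data set and the 400MB/s disk ingest rate are made-up numbers for illustration only; 80MB/s is the native LTO-3 transfer rate for a single drive. The function name transfer_hours is likewise an assumption.

```python
def transfer_hours(data_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move data_gb at a sustained rate (1GB = 1000MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return data_gb * 1000 / rate_mb_per_s / 3600


DATA_GB = 2000        # hypothetical nightly backup size
DISK_MB_PER_S = 400   # hypothetical disk library ingest rate
TAPE_MB_PER_S = 80    # native LTO-3 rate, one drive, no compression

# Direct to tape: production systems are tied up for the whole transfer.
direct_window = transfer_hours(DATA_GB, TAPE_MB_PER_S)

# Disk first: only the disk write sits inside the backup window; the copy
# to tape runs afterwards from the disk library, off the production path.
staged_window = transfer_hours(DATA_GB, DISK_MB_PER_S)
tape_copy = transfer_hours(DATA_GB, TAPE_MB_PER_S)

print(f"direct-to-tape window: {direct_window:.1f} h")
print(f"disk-staged window:    {staged_window:.1f} h")
print(f"background tape copy:  {tape_copy:.1f} h")
```

The tape copy still takes just as long, but it no longer has to fit inside the window when production systems are quiet.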
Infrastructure Considerations
Obviously, significant infrastructure changes can accompany a switch to a
disk library (DL) backup mechanism. By taking advantage of tools designed to
manage stale data, though, you can focus your energies and backups on data
that is constantly in flux. In a future post, I’ll examine some of these
concerns in more depth. For now, new hardware cost, re-allocation of
rack/server space, re-zoning, etc. are all important considerations in the
implementation of disk-based backup. Additional considerations need to center
on the subsequent dump to tape and off-site storage.
All told, VTL and its associated technologies promise quicker DR times and
provide a much-needed shot in the arm for backup technology.