



Author’s note: I’m interested in storage. There are fascinating advances in storage technology that deserve at least a high-level overview. This Disaster Recovery 101 series is my attempt to “think out loud” about these advances and how they affect the storage landscape at large.

Let’s face it: tape is on its way out. Until recently, there really was no reasonable alternative to robotic tape libraries stacked with rows upon rows of neatly cleaned and serviced tape backup units. The cost per gigabyte was lower than that of competing disk and optical technology, and it just worked, day in and day out. Sure, it had its quirks. Sequential access stretched mailbox recovery operations from days into weeks or even months. And offsiting tapes for disaster recovery? When the global manager of IT was still carrying tapes home for the weekend, how safe and secure was the data? All said and done, however, tape was still king.

Disaster recovery has always been an important part of information lifecycle management. In the post-9/11 haze, it became readily apparent that DR needed to be fast, efficient and scalable. Not every company could pay for an OC-48 pipe to a remote site; there needed to be more cost-effective means of data protection and recovery. As an example, thanks to its DR planning and implementation, American Express’ New York City office was able to fail over in less than 5 minutes after the towers collapsed. In a less prepared organization, chances are millions of users and their data would have been unavailable for days, if not weeks. By implementing a comprehensive DR plan coupled with secured storage and backups, American Express was able to limit its liability. So, what does this have to do with tape?

We’ve seen the cost of disk storage drop dramatically. At current price points, a 1TB drive can be had for $399.00, which works out to roughly $0.40 a gigabyte ($0.0004 a megabyte). A standard 400GB (uncompressed) LTO-3 tape at $40.00 works out to $0.10 a gigabyte ($0.0001 a megabyte). While pricing parity hasn’t been reached, it’s awfully close. Offsetting the remaining price differential, disk’s inherent performance advantages become powerful motivators for DR. Try doing a full mailbox restore from tape versus disk. See my point? While a 30-cent-per-gigabyte difference can add up over time, paying your IT staff overtime to fix the Global Manager of IT’s cock-up with his mailbox will quickly erase that. Cutting restore times from weeks to minutes is no small feat. The mechanism behind this? Disk libraries using Virtual Tape Library (VTL) imaging.
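Per-unit media arithmetic is easy to get wrong by a factor of 1,000 when hopping between terabytes, gigabytes and megabytes, so here is a small sketch that does it explicitly. It assumes decimal units (1 TB = 1,000 GB = 1,000,000 MB, as drive and tape vendors quote capacity) and uses only the street prices quoted above; compression, RAID overhead, enclosures and the tape drives themselves are deliberately left out.

```python
# Media cost per unit, using decimal capacities as vendors quote them:
# 1 TB = 1,000 GB and 1 GB = 1,000 MB.

MB_PER_GB = 1_000


def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per gigabyte of native (uncompressed) capacity."""
    return price_usd / capacity_gb


def cost_per_mb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per megabyte of native (uncompressed) capacity."""
    return cost_per_gb(price_usd, capacity_gb) / MB_PER_GB


# Street prices quoted in the article; enclosures, RAID overhead,
# tape drives and compression are deliberately left out.
MEDIA = {
    "1TB disk drive": (399.00, 1_000),
    "LTO-3 cartridge": (40.00, 400),
}

if __name__ == "__main__":
    for name, (price, capacity_gb) in MEDIA.items():
        print(f"{name}: ${cost_per_gb(price, capacity_gb):.2f}/GB, "
              f"${cost_per_mb(price, capacity_gb):.4f}/MB")
    disk = cost_per_gb(*MEDIA["1TB disk drive"])
    tape = cost_per_gb(*MEDIA["LTO-3 cartridge"])
    print(f"Disk costs {disk / tape:.1f}x as much as tape per gigabyte")
```

Run as a script, it reports about $0.40/GB for disk versus $0.10/GB for tape, a raw media ratio of roughly 4:1 before any operational costs are considered.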

What exactly is VTL?

According to BitPipe, VTL is defined as:

Virtual tape is an archival storage technology that makes it possible to save data as if it were being stored on tape although it may actually be stored on hard disk or on another storage medium.  

VTL is typically a combined hardware/software solution that pairs a disk-based target (EMC EDL 2xx, 4000 and 6000 series, NetApp NearStor VTL, etc.) with a software initiator (such as EMC NetWorker, DataDirect Networks VTLS, etc.). Typical enterprise backup solutions (Veritas NetBackup, etc.) also support most of the features rolled into the VTL package, though they are manufacturer-agnostic. The disk library hardware emulates common tape drives and formats (Quantum, HP and IBM, for example) and can present thousands of tape “images” and hundreds of virtual tape libraries to backup software. All this is well and good, but how does the array protect the data?
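To make the BitPipe definition concrete, here is a minimal Python sketch of the core idea: data is written and read back with tape-style sequential semantics (append, rewind, read the next record), but each “cartridge” is just an image file on disk. `VirtualTape`, `VirtualLibrary`, the barcode naming and the length-prefixed record format are hypothetical illustrations; real disk libraries emulate actual SCSI tape drives and changers, which this sketch does not attempt.

```python
import os
import struct
import tempfile
from typing import Dict, Optional


class VirtualTape:
    """Hypothetical disk-backed tape image with tape-like sequential access.

    Each record is stored as a 4-byte big-endian length followed by the data.
    """

    _HEADER = struct.Struct(">I")

    def __init__(self, path: str) -> None:
        self.path = path
        open(path, "ab").close()  # create an empty image if it doesn't exist
        self.position = 0         # byte offset of the emulated read head

    def write_record(self, data: bytes) -> None:
        """Append a record after the last one, as a tape drive would."""
        with open(self.path, "ab") as image:
            image.write(self._HEADER.pack(len(data)))
            image.write(data)

    def rewind(self) -> None:
        """Move the emulated head back to the beginning of the tape."""
        self.position = 0

    def read_record(self) -> Optional[bytes]:
        """Return the next record in sequence, or None at end of data."""
        with open(self.path, "rb") as image:
            image.seek(self.position)
            header = image.read(self._HEADER.size)
            if len(header) < self._HEADER.size:
                return None
            (length,) = self._HEADER.unpack(header)
            data = image.read(length)
            self.position = image.tell()
            return data


class VirtualLibrary:
    """Hypothetical library mapping barcode labels to tape images in one directory."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        self.cartridges: Dict[str, VirtualTape] = {}

    def load(self, barcode: str) -> VirtualTape:
        """Return the cartridge with this barcode, creating a blank one if needed."""
        if barcode not in self.cartridges:
            path = os.path.join(self.directory, f"{barcode}.img")
            self.cartridges[barcode] = VirtualTape(path)
        return self.cartridges[barcode]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as disk_pool:
        library = VirtualLibrary(disk_pool)
        tape = library.load("VT0001")
        tape.write_record(b"mailbox-export-part-1")
        tape.write_record(b"mailbox-export-part-2")
        tape.rewind()
        print(tape.read_record())  # b'mailbox-export-part-1'
```

The backup software still sees sequential "tape" behavior, while the bytes live on disk; protecting that disk is the array's job.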

Within disk library (DL) systems, the drives in each array are typically organized into RAID groups. EMC’s EDL series, for example, uses RAID 3 within its arrays to provide data protection. Most hardware manufacturers also provide failover of their front-end storage processors as well as redundant back-end fibre loops. If no site-to-site replication is implemented, further protection can be added by dumping data from disk to tape for offsite storage (which will be the subject of another DR article). Another factor to consider is the disk library’s ability to multistream, that is, to split backup data into multiple backup sets written to two or more devices concurrently, without slowing down or widening the backup window.
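RAID 3 stripes data at the byte level across the data disks and keeps XOR parity on one dedicated parity disk, so the array can survive the loss of any single member. The toy sketch below shows that parity math on in-memory byte arrays; the function names and zero-padding of the final stripe are assumptions for illustration, and this is not how EDL firmware is implemented.

```python
from functools import reduce
from typing import List, Optional


def xor_columns(members: List[bytearray]) -> bytearray:
    """XOR the members together byte by byte (all must be the same length)."""
    return bytearray(reduce(lambda a, b: a ^ b, column) for column in zip(*members))


def stripe_raid3(data: bytes, data_disks: int) -> List[bytearray]:
    """Byte-stripe data across data_disks members, then append one parity member."""
    # Zero-pad so the last stripe row is full (an assumption for this sketch).
    padded = data + b"\x00" * (-len(data) % data_disks)
    members = [bytearray(padded[i::data_disks]) for i in range(data_disks)]
    return members + [xor_columns(members)]


def rebuild(members: List[Optional[bytearray]]) -> List[bytearray]:
    """Recreate at most one failed member (marked None) from the survivors."""
    members = list(members)
    failed = [i for i, m in enumerate(members) if m is None]
    if len(failed) > 1:
        raise ValueError("RAID 3 can only rebuild a single failed member")
    if failed:
        survivors = [m for m in members if m is not None]
        members[failed[0]] = xor_columns(survivors)
    return members


def reassemble(members: List[bytearray], original_length: int) -> bytes:
    """Interleave the data members (parity is the last one) back into the original bytes."""
    out = bytearray()
    for row in zip(*members[:-1]):
        out.extend(row)
    return bytes(out[:original_length])


if __name__ == "__main__":
    payload = b"Disaster Recovery 101"
    array = stripe_raid3(payload, data_disks=4)  # 4 data members + 1 parity
    array[2] = None                              # simulate a failed drive
    print(reassemble(rebuild(array), len(payload)) == payload)  # True
```

Because every parity byte is the XOR of the corresponding data bytes, XOR-ing the survivors regenerates whichever single member is missing; a second concurrent failure is unrecoverable, which is why hot spares matter.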

Why VTL?

Now that we have a high-level overview of VTL technologies, the big question remains: why VTL? We’ve already seen that pricing parity has almost been reached and that some of the larger performance issues (reads/writes, restore times, etc.) have been assuaged by disk libraries. Disk libraries also offer more granular control over backup windows and targets, and they can multistream much more efficiently than tape. Finally, the ability to write to disk THEN dump to tape keeps the backup window small, while the final off-site backup to tape can be done during production hours without interruption.
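Multistreaming only shortens the backup window if the streams are reasonably balanced, because the window closes when the largest stream finishes. A minimal sketch, assuming a greedy largest-first assignment (a common scheduling heuristic, not any particular vendor’s algorithm), dataset sizes in MB, and a hypothetical per-stream throughput with no contention between streams; `write_dataset` is a hypothetical stand-in for whatever actually moves the data.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def assign_streams(datasets: Dict[str, int], streams: int) -> List[List[str]]:
    """Greedy largest-first balancing: each dataset goes to the least-loaded stream."""
    heap = [(0, index) for index in range(streams)]  # (MB assigned, stream index)
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(streams)]
    for name, size_mb in sorted(datasets.items(), key=lambda item: item[1], reverse=True):
        load_mb, index = heapq.heappop(heap)
        plan[index].append(name)
        heapq.heappush(heap, (load_mb + size_mb, index))
    return plan


def backup_window_hours(datasets: Dict[str, int], plan: List[List[str]],
                        mb_per_sec_per_stream: float) -> float:
    """The window closes when the largest stream finishes (no contention assumed)."""
    largest_mb = max(sum(datasets[name] for name in names) for names in plan)
    return largest_mb / mb_per_sec_per_stream / 3600


def run_streams(plan: List[List[str]], write_dataset: Callable[[str], None]) -> List[int]:
    """Run every stream concurrently; write_dataset is a hypothetical data mover."""
    def run(names: List[str]) -> int:
        for name in names:  # datasets within one stream go out sequentially
            write_dataset(name)
        return len(names)

    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        return list(pool.map(run, plan))


if __name__ == "__main__":
    # Hypothetical nightly backup set, sizes in MB.
    nightly = {"exchange": 400_000, "fileserver": 300_000, "sql": 250_000, "web": 50_000}
    plan = assign_streams(nightly, streams=2)
    print(plan)  # [['exchange', 'web'], ['fileserver', 'sql']]
    print(f"{backup_window_hours(nightly, plan, 80):.1f} h with 2 streams")            # 1.9 h
    print(f"{backup_window_hours(nightly, [list(nightly)], 80):.1f} h with 1 stream")  # 3.5 h
    run_streams(plan, lambda name: print("writing", name))
```

With these illustrative sizes and a hypothetical 80 MB/s per stream, two balanced streams finish in about 1.9 hours instead of roughly 3.5 hours for a single stream.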

Infrastructure Considerations

Obviously, a switch to a DL backup mechanism can bring significant infrastructure changes. By using some of the tools designed to manage stale data, however, you can focus your energies and backups on data that is constantly in flux. In a future post, I’ll examine some of these concerns in more depth. For now, new hardware costs, reallocation of rack/server space, re-zoning, etc. are all important considerations when implementing disk-based backup. Further considerations should center on the subsequent dump to tape and off-site storage.
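Stale-data management tools vary widely, but the basic idea can be sketched as splitting files by last-modified time so that backups focus on the active set. This sketch assumes modification time is a good-enough proxy for staleness and uses a hypothetical 90-day threshold; real ILM/HSM tools apply much richer policies (access time, ownership, file type).

```python
import os
import tempfile
import time
from typing import List, Optional, Tuple

SECONDS_PER_DAY = 86_400


def partition_by_age(root: str, stale_after_days: float,
                     now: Optional[float] = None) -> Tuple[List[str], List[str]]:
    """Split files under root into (active, stale) by last-modified time."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * SECONDS_PER_DAY
    active: List[str] = []
    stale: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                modified = os.stat(path).st_mtime
            except OSError:
                continue  # vanished or unreadable; leave it to the next run
            (stale if modified < cutoff else active).append(path)
    return active, stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as share:
        fresh = os.path.join(share, "current.pst")
        old = os.path.join(share, "2005-archive.pst")
        for path in (fresh, old):
            open(path, "w").close()
        long_ago = time.time() - 200 * SECONDS_PER_DAY
        os.utime(old, (long_ago, long_ago))  # backdate access and modification times
        active, stale = partition_by_age(share, stale_after_days=90)  # hypothetical policy
        print("back up nightly:", [os.path.basename(p) for p in active])
        print("candidates for archive:", [os.path.basename(p) for p in stale])
```

In this sketch only the active list would go through the nightly disk backup; the stale list is where an archive-to-tape policy could apply.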

All told, VTL and its associated technologies promise quicker DR times and a much-needed shot in the arm for backup technology.



Comments



No time soon
By masher2 (blog) on 6/28/2007 10:50:44 AM , Rating: 2
quote:
a 1TB drive can be had for $399.00 which represents $0.03 a megabyte. [LTO-3] at $40.00 represents $0.01 a megabyte. While pricing parity hasn’t been reached, it’s awfully close
A 1TB drive is 4 cents/MB, not 3. A 400% price differential isn't exactly close to price parity.

I remember when tape drives were announced as "dying" back in the mid 1980s. It'll probably happen someday, but I wouldn't hold my breath.




RE: No time soon
By davegraham (blog) on 6/28/2007 11:18:56 AM , Rating: 2
the argument can be made based on purchase cost (for example, i get those 1TB drives for less than $350.00 apiece) that 3 cents per MB is about right. Again, if I'm EMC, NetApp, 3par, Pillar, et al., I'm getting these drives for less than 50% of retail cost (if that). So, it does hold up at a certain level. As far as tape dying, i don't know if it will actually ever die out, but as you'll see in some of my future articles, there are "better" technologies coming that promise to be cheaper and easier to implement, etc. within the realm of DR.


RE: No time soon
By masher2 (blog) on 6/28/2007 4:08:34 PM , Rating: 1
> "Again, if I'm EMC, NetApp, 3par, Pillar, et al. I'm getting these drives for less than 50% of retail cost "

I don't think there's that much profit built into the retail cost of a hard drive any more. And still, if you're going to compare wholesale costs, why not look at the wholesale costs of LTO tapes?

Furthermore, there are other considerations besides cost, particularly for large enterprises. LTO tapes are rated a minimum of 30 years for archival storage. They're also a smaller form factor than an equivalent sized drive, and rated for far more insert/eject cycles. Also, tape has more redundancy built into the protocol. I wouldn't trust a single hard drive for archival storage...and if you move to Raid, that raises the prices much higher.

At some point they will probably reach price parity, I agree. But even then I don't think tape will die out entirely, due to the other factors I mentioned above.


RE: No time soon
By davegraham (blog) on 6/28/2007 4:25:20 PM , Rating: 2
unfortunately, i do know the real costs on those drives. :) the certification process alone results in a lot of "bad" drive culls. So, you'd have to factor man-hours into the price of the qualification for arrays, etc. Again, this is more of a high-level overview. I may or may not be able to rope some #'s into future articles but that's based on my access to that information. Suffice it to say, there actually IS a pretty hefty baseline margin to these drives.

cheers,

Dave


RE: No time soon
By davegraham (blog) on 6/28/2007 4:36:18 PM , Rating: 2
bah...posted too quickly.

quote:
Furthermore, there are other considerations besides cost, particularly for large enterprises. LTO tapes are rated a minimum of 30 years for archival storage.


sure. Optical has its own rating as well. Disk does too.

quote:
They're also a smaller form factor than an equivalent sized drive, and rated for far more insert/eject cycles.


they can be in a smaller form factor, but not always. Also, there are no insert/eject cycles and no robotics involved (as in high-end tape libraries), so the maintenance costs alone are mitigated by that. Definite advantage for disk there. All the disks are also in carriers that provide zero-touch for the actual physical drive interface. You should see the backplanes in these guys. Absolutely rigid. The only time you'd need to insert/remove disks is upon failure and typically, you'll have a maintenance contract that will engage professional services to do that. If a robotic arm or tape drive breaks, there's not a lot of protection offered.

quote:
Also, tape has more redundancy built into the protocol. I wouldn't trust a single hard drive for archival storage...and if you move to Raid, that raises the prices much higher.


it's actually quite the opposite. With pro-active hotsparing and various RAID schemas (RAID 3, for example) you minimize any double-down or multiple drive failures; data is constantly available and the array is constantly able to accept writes. The storage processors on these arrays can de-stage to vaults during environmental disruptions and literally pick up right where they left off (if the software/backup server allows it). Also, the single drive argument is not applicable. You'll never have a DL with a single drive. The smallest DL systems will have a minimum of 5 drives, again, with varying levels of protection. When all this is factored into a well-developed BURA (backup, recovery, archive) schema (as a subset of DR at large), it works and works well.


RE: No time soon
By davegraham (blog) on 6/28/2007 6:19:35 PM , Rating: 2
quote:
One thing about all tape backups. They all lie about capacity. No way no how do you get 800GB onto an LTO tape... Consider yourself lucky to get 600 GB on the tape...I recently replaced a DDS4 tape drive (I know pretty small 20/40 GB) because the backups were too large for 1 tape - the back up size? 22.6 GB (a lot of jpg, pst files don't compress well) To be safe plan on being able to use the native capacity of a tape, any compression you get is a bonus.


gotta love this quote from a customer. if this is the case, then....well, do the math. :)

dave


RE: No time soon
By masher2 (blog) on 6/28/2007 7:03:33 PM , Rating: 2
> "sure. Optical has its own rating as well. Disk does too."

The point is that the rating is higher for tape. For archival purposes, tape wins hands down.

> "Also, there are no insert/eject cycles..."

Not quite true. If you want an offsite copy you can vault, then you need to be using external hard drives, which are not only more expensive, but their interface connect/disconnect rating is equivalent to an eject cycle.

> "If a robotic arm or tape drive breaks, there's not a lot of protection offered"

If an arm or drive breaks, you don't lose data though. That only happens if the tape itself fails. A tape can fail, of course. But for archival purposes, a single tape is far more reliable than a single drive.

> "With pro-active hotsparing and various RAID schemas (RAID 3, for example) you completely minimize any double-down or multiple drive failures..."

But moving to RAID means a huge increase in the per-MB storage costs. In fact, there's no reason one couldn't "raid" tape drives, writing out two copies (RAID 1) or even a multi-tape parity scheme equivalent to any Raid level. The reason this isn't done? It's just not necessary. The failure rate is just too low. Plus, tapes lend themselves more naturally to versioned backups, which is your only insurance against application-level data corruption.

> "gotta love this quote from a customer. if this is the case, then....well, do the math. :)"
Since you're using the uncompressed capacity in both comparisons, this point is quite obviously moot.


RE: No time soon
By davegraham (blog) on 6/29/2007 9:35:26 AM , Rating: 2
another edit for clarification that didn't take:
Edit: There seems to be some confusion on a lot of different ends as to who this article is actually aimed at. When I'm thinking Disaster Recovery, I'm not thinking about Joe User with 5 computers in his basement. I'm thinking of commercial and enterprise install bases where multiple hundreds of gigabytes and even petabytes are being backed up. This includes databases, file servers, email storage groups, etc. For example, the idea that commercial/enterprise groups would literally unrack hard drive library solutions (as a sign of potential mechanical failure points) is not correct. Moving said items from data center to data center is definitely a pain point for mechanical failure, but no one, to the best of my knowledge, will off-site a DL library array in the same way that tapes are removed from libraries and stored off-site.


RE: No time soon
By masher2 (blog) on 6/29/2007 10:05:22 AM , Rating: 2
> "When I'm thinking Disaster Recovery, I'm not thinking about Joe User with 5 computers in his basement"

My point was that you can't compare tape to a single internal drive. Even for "Joe User", you'd have to compare to external drives which can be removed and vaulted.

At the enterprise level, yes, no one would be doing this. But instead of single drives, you'd have library arrays that, for archival purposes at least, need to be substantially more reliable than Raid 5. So "price parity" is a good bit further away than one would be led to believe by looking at the per GB cost of a naked drive.

Lest my focusing on this point misleads, I do want to say I agree with the majority of the points you made, and found the article to be interesting and informative.


Math
By oTAL (blog) on 6/29/2007 5:33:26 AM , Rating: 2
Can anyone please explain the maths involved here? I noticed yesterday that it appeared to make no sense, but no one here posted any correction so I'm beginning to doubt myself. Is 1TB supposed to be 100GBs now? And 1GB = 100MBs??

Let's see, 1000GBs (approximately what a 1TB drive contains) costing $400 would make each GB cost about $0.40. If you go further down to the price per MB it will be close to $0.0004 / MB. That's 2 orders of magnitude lower than posted on the article...

If I'm wrong please correct me cause I happen to have a state sponsored education, like others around here.




RE: Math
By davegraham (blog) on 6/29/2007 9:10:01 AM , Rating: 2
meh...just ignore that..I was rushed and I've yet to have time to go back and correct it.

cheers,

Dave


RE: Math
By davegraham (blog) on 6/29/2007 9:34:43 AM , Rating: 2
here was an edit i tried to toss in there specific to the pricing issue:
(Edit: as has been pointed out to me several times, either my math is a) a little challenged or b) I simply rounded down. At any given point I have 3 amounts sitting in front of me; distribution price points, volume price points, and [for lack of a better way of putting it] manufacturer OEM price points. My pricing above represents a fairly accurate synthesis of the price based on those 3 aforementioned areas)


RE: Math
By oTAL (blog) on 7/2/2007 11:08:30 AM , Rating: 2
I believe you may have misunderstood me.

I wasn't picking on your rounding error. That would give you an error of about 33% [400/300=1.(3)]. Not that important...

The error I'm pointing out is WAY larger than that! I'm talking about $0.0004 per MB (correct value) vs $0.03 per MB (stated in the article).
That's two orders of magnitude different (7400% off mark) and warrants a quick fix in my opinion.


Why 1TB drives?
By ninjit on 6/28/2007 2:36:31 AM , Rating: 2
For price parity, the use of 1TB drives is not the best comparison.

500GB drives are available for $100 now - a much much better deal - still 2x as expensive as tapes, but that's the point of having a tiered solution as you describe.




RE: Why 1TB drives?
By davegraham (blog) on 6/28/2007 8:49:16 AM , Rating: 2
I actually could have used 750GB drives because they meet the compressed value of LTO-3 (for example). I'm planning to explore some of the hardware compression, software "tweaking" (as it were), and other aspects of VTL/DL in another, more granular article. A lot of units, like the EMC DL series, have a 3:1 compression algorithm that they use. In any case, I'll be developing this out more and more.

thanks for the feedback,

Dave


Interesting...
By Trisped on 6/29/2007 1:38:25 AM , Rating: 2
A bit weird, but interesting.

Personally I prefer HDD backup, as they are much easier to work with. Still, if you can safely say that you will not need to pull the data off a tape backup, then the cost and space benefits of tape become very apparent.

As to the death of tape, I doubt it. Since tape and HDD are both based on the same tech I doubt one will ever substantially surpass the other. Not till Flash becomes a contender should the value of tape be brought into question.