
Study says failure rates 15 times that of what manufacturers indicate

A study released this week by Carnegie Mellon University suggests that hard drive manufacturers may be exaggerating the mean time between failures (MTBF) ratings on their drives. In fact, the Carnegie Mellon researchers found that, on average, real-world failure rates were as high as 15 times what the rated MTBFs would imply.

Gathering data on roughly 100,000 hard drives from a variety of manufacturers, the researchers examined drives operating under a variety of conditions in real-world deployments. Some drives were at Internet service providers, others at large data centers, and some at research labs. According to the results, the majority of the drives did not appear to be affected by their operating environment. In fact, the researchers found that drive operating temperature had little to no effect on failure rates -- a cool hard drive survived no longer than one running hot.

The drives in the study ranged from Serial ATA (SATA) to SCSI and even high-end Fibre Channel (FC) drives. Customers typically pay a much larger premium for SCSI and FC drives, which also tend to carry longer warranty periods and higher MTBF ratings.

Carnegie Mellon researchers found that these high-end drives did not outlast their mainstream counterparts:

"In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk-independent factors, such as operating conditions, usage and environmental factors, affect replacement rates more than component specific factors."
According to the study, the number one predictor of drive failure was simply age: the longer a drive has been in operation, the more likely it is to fail. Drives tended to start showing signs of wear-out after roughly five to seven years of service, after which there was a significant increase in annualized failure rates (AFR). Failure rates in the first year of service were also elevated, comparable to those seen after the seven-year mark.
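The age pattern described above can be sketched with a toy AFR calculation. The fleet sizes and failure counts below are invented for illustration; they are not the study's data, only the shape of the curve it describes:

```python
# Toy annualized-failure-rate (AFR) sketch by drive age.
# All numbers are invented for illustration only.

records = [
    # (age in years, drives in service, failures that year)
    (1, 10_000, 300),  # elevated infant mortality in year one
    (4,  9_400, 150),  # mid-life lull
    (7,  8_600, 650),  # wear-out after the five-to-seven-year mark
]

for age, in_service, failed in records:
    afr = failed / in_service * 100  # percent of the fleet failing per year
    print(f"year {age}: AFR = {afr:.1f}%")
```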

According to the Carnegie Mellon researchers, manufacturer MTBF ratings are highly overstated. Take, for example, the Seagate Cheetah X15 series, which carries an MTBF rating of 1.5 million hours. That equates to roughly 171 years of constant service before failure. The researchers said, however, that customers should expect a more realistic 9 to 11 years. Interestingly, the real-world data in the study showed a consistent average lifetime of about six years.
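The 171-year figure is just unit arithmetic on the rated MTBF; a quick sketch:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

mtbf_hours = 1_500_000  # rated MTBF of the Seagate Cheetah X15 series
years = mtbf_hours / HOURS_PER_YEAR
print(f"{years:.0f} years of constant service")  # -> 171 years
```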

The average annual replacement rate of drives ranged from 2 percent to a whopping 13 percent, indicating a need for manufacturers to reevaluate the way an MTBF rating is generated. Worst of all, these rates were for drives with MTBF ratings between 1 million and 1.5 million hours.
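For context, an MTBF rating implies an annualized failure rate under a constant-failure-rate (exponential) model, a common assumption in reliability math though not one the article spells out. A sketch comparing the implied AFRs against the observed replacement rates:

```python
import math

HOURS_PER_YEAR = 8760

def implied_afr(mtbf_hours):
    """AFR implied by an MTBF under an exponential failure model."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for mtbf in (1_000_000, 1_500_000):
    print(f"MTBF {mtbf:>9,} h -> implied AFR {implied_afr(mtbf) * 100:.2f}% per year")

# The rated MTBFs imply well under 1% per year -- far below the
# 2% to 13% annual replacement rates observed in the study.
```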

Garth Gibson, associate professor of computer science at Carnegie Mellon, indicated that the study is evidence that MTBFs are not a reliable way of measuring drive quality. "We had no evidence that SATA drives are less reliable than the SCSI or Fiber Channel drives," said Gibson.

Carnegie Mellon researchers concluded that backups are a necessity for critically important data, no matter what kind of hard drive is used. It is interesting to note that even Google's own data centers rely mainly on SATA and PATA drives. At this rate, it may be only a matter of time before SATA performs equal to or better than SCSI and FC drives, offering the same reliability for much less money.

Comments

By dallastx on 3/10/2007 12:45:08 PM , Rating: 2
If the calculation you guys mention for hard drive MTBF is correct (and it sounds like it is, given the high numbers from manufacturers), then this is pretty misleading. What consumers care about is "Hey, if I purchase this drive from you today, on average, how long will it be until it fails?" However, it seems the MTBF is far from this.

I'm curious though, I recently bought a water pump to cool my system, and the manufacturer listed 50,000 hours for MTBF. I took this to mean an average lifetime of ~5 years, which seems reasonable. So I wonder if their definition of MTBF is different. If so, they should really standardize the meaning of the term.

How the hard drive manufacturers calculate MTBF sort of reminds me of how some companies calculate their reliability. We offer (as does everyone in the industry) 99.999% reliability. However, it's an aggregated total (over all system components), and is pretty much meaningless if you care about how often you should expect a failure that incurs some form of data loss / data integrity issue. It's like saying "Hrmm... the mouse never fails, so let's include its uptime in our failure rate calculation for the entire system. That should minimize the effect of the processor board rebooting every week."
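The commenter's aggregation point can be illustrated with a toy availability calculation. The component figures below are invented; the point is that averaging over components flatters the system, while a serial system is only as available as the product of its parts:

```python
# Invented per-component availabilities for illustration only.
components = {
    "mouse": 0.9999999,        # almost never fails
    "keyboard": 0.9999999,
    "processor_board": 0.999,  # reboots weekly: far worse than five nines
}

# A naive average across components flatters the system...
average = sum(components.values()) / len(components)

# ...but if every component is required, availability is the product.
product = 1.0
for availability in components.values():
    product *= availability

print(f"averaged across components: {average:.7f}")
print(f"serial-system availability: {product:.7f}")
```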

RE: misleading
By bldckstark on 3/11/2007 2:36:14 PM , Rating: 2
I read a while back that the company that invented the blue LED was psyched because their 5 years of testing was up, and they were finally going to be able to sell it. An LED's standard accepted life is 5 years minimum, so they have to test them for 5 years in order to market them successfully. There was no statistical inference of the outcome. They lit a bunch of LEDs, then they left them on for 5 years. As far as I know some of them are still on. This was a big deal, because first comes the LED, then the laser. That is where the BR-DVD came from.

My point is that not all companies/markets use this form of highly misleading statistical analysis, although many do. You might note that lightbulbs last about as long as they say they will on the box.

Samuel Clemens (Mark Twain) said in the 1800s that there are three kinds of lies -
1. Lies
2. Damned Lies
3. Statistics

RE: misleading
By Oregonian2 on 3/12/2007 1:33:54 PM , Rating: 2
Could also just mean that they didn't understand the failure mechanisms sufficiently so that they could use mathematical methods to extrapolate failure rates based upon shorter term testing.

So they didn't sell ANY until after 5 years?

"We can't expect users to use common sense. That would eliminate the need for all sorts of legislation, committees, oversight and lawyers." -- Christopher Jennings


Copyright 2016 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki