Study says failure rates 15 times that of what manufacturers indicate

A study released this week by Carnegie Mellon University revealed that hard drive manufacturers may be exaggerating the mean time between failures (MTBF) ratings on their hard drives. In fact, the Carnegie Mellon researchers indicated that, on average, failure rates were as high as 15 times what the rated MTBFs would suggest.

Rounding up roughly 100,000 hard drives from a variety of manufacturers, researchers at Carnegie tested the drives in various operating conditions as well as real-world scenarios. Some drives were at Internet service providers, others at large data centers and some at research labs. According to test results, the majority of the drives did not appear to be affected by their operating environment. In fact, researchers indicated that drive operating temperatures had little to no effect on failure rates -- a cool hard drive survived no longer than one running hot.

The drives used in the study ranged from Serial ATA (SATA) to SCSI and even high-end Fibre Channel (FC) drives. Customers typically pay a much larger premium for SCSI and FC drives, which also usually carry longer warranty periods and higher MTBF ratings.

Carnegie researchers found that these high-end drives did not outlast their mainstream counterparts:

"In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk-independent factors, such as operating conditions, usage and environmental factors affect replacement rates more than component specific factors."

According to the study, the number one cause of drive failures was simply age: the longer a drive has been in operation, the more likely it is to fail. Drives tended to start showing signs of failure after roughly five to seven years of service, after which there was a significant increase in annualized failure rates (AFR). The failure rate of drives in their first year of service was just as high as that of drives past the seven-year mark.

According to Carnegie researchers, manufacturer MTBF ratings are highly overrated. Take, for example, the Seagate Cheetah X15 series, which has an MTBF rating of 1.5 million hours. This equates to roughly 171 years of constant service before problems. Carnegie's researchers said, however, that customers should expect a more reasonable 9 to 11 years. Interestingly, real-world tests in the study showed a consistent average time to failure of about six years.

The average replacement rate of drives ranged from 2 percent to a whopping 13 percent annually, indicating a need for manufacturers to reevaluate the way an MTBF rating is generated. Worst of all, these rates were for drives with MTBF ratings between 1 million and 1.5 million hours.
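To put those figures side by side, here is a minimal sketch of the arithmetic. It assumes a drive that runs around the clock and a constant failure rate, so the annual rate implied by an MTBF is approximated as hours per year divided by MTBF. That simplification is ours, not something the study or the manufacturers state, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_to_years(mtbf_hours):
    """Express an MTBF rating in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def nominal_annual_failure_rate(mtbf_hours):
    """Approximate fraction of always-on drives expected to fail per year.

    Assumption: constant failure rate, so the rate is simply
    hours-per-year divided by MTBF.
    """
    return HOURS_PER_YEAR / mtbf_hours


def ratio_to_nominal(observed_annual_rate, mtbf_hours):
    """How many times higher an observed annual rate is than the nominal one."""
    return observed_annual_rate / nominal_annual_failure_rate(mtbf_hours)


# MTBF ratings and replacement rates quoted in the article.
for mtbf in (1_000_000, 1_500_000):
    print(
        f"MTBF {mtbf:,} h = {mtbf_to_years(mtbf):.1f} years, "
        f"nominal annual failure rate {nominal_annual_failure_rate(mtbf):.2%}"
    )

for observed in (0.02, 0.13):
    print(
        f"Observed {observed:.0%} per year is "
        f"{ratio_to_nominal(observed, 1_000_000):.1f}x the nominal rate "
        f"for a 1,000,000 h MTBF"
    )
```

Under these assumptions, MTBF ratings of 1 million to 1.5 million hours correspond to nominal annual failure rates below 1 percent, well under the 2 to 13 percent replacement rates reported.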

Garth Gibson, associate professor of computer science at Carnegie, indicated that the study was proof that MTBFs are not a reliable way of measuring drive quality. "We had no evidence that SATA drives are less reliable than the SCSI or Fiber Channel drives," said Gibson.

Carnegie researchers concluded that backup measures are a necessity with critically important data, no matter what kind of hard drive is being used. It is interesting to note that even Google's own data centers use mainly SATA and PATA drives. At the current rate, it is only a matter of time before SATA performs as well as or better than SCSI and FC drives, offering the same reliability for much less money.

RE: MTBF numbers are a lie
By retrospooty on 3/9/2007 10:30:29 PM , Rating: 3
You don't understand what MTBF is... Zippercow explained it best above.

[short time period]*[number of pieces tested]/[number of pieces tested which failed within that time period]=MTBF

The MTBF rating of our DVR is 449,616.52 hours (~51 years). This means that if 51 DVRs were to be run for 1 year, 1 failure out of those 51 could be expected.

This does not mean the DVR is expected to last 51 years.
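The formula and the DVR example above can be checked with a few lines of code. This is only a sketch: the function names and the 1,000-unit test run are hypothetical, and it assumes every unit runs for the whole test period.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def estimate_mtbf(test_hours, units_tested, units_failed):
    """MTBF as in the formula above: total unit-hours divided by failures."""
    if units_failed == 0:
        raise ValueError("no failures observed; MTBF cannot be estimated this way")
    return test_hours * units_tested / units_failed


def expected_failures(units, hours, mtbf_hours):
    """Failures expected when `units` devices each run for `hours`."""
    return units * hours / mtbf_hours


# Hypothetical test run: 1,000 units for 1,000 hours, 2 failures.
print(estimate_mtbf(test_hours=1000, units_tested=1000, units_failed=2))

# The DVR example: 51 units running for one year at the quoted MTBF.
print(expected_failures(units=51, hours=HOURS_PER_YEAR, mtbf_hours=449_616.52))
```

The second call comes out very close to one failure, which matches the reading that the rating describes a failure rate across a population rather than the lifespan of a single unit.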

RE: MTBF numbers are a lie
By nothingtoseehere on 3/10/2007 4:55:13 PM , Rating: 2
The 'between' from the 'B' in MTBF implies that the number represents a time period between 'F' failures... Which two failures are those in your equation? There is no 'failure A time' minus 'failure B time' in your equation, because there is no failure A and B to compare the time between, so there is no number to take the mean of, so your equation cannot lead to the MTBF.

Maybe the manufacturers determine their MTBF's that way, sure, but then they shouldn't call it MTBF because that is misleading.

RE: MTBF numbers are a lie
By TomZ on 3/10/07, Rating: 0

RE: MTBF numbers are a lie
By Bladen on 3/11/2007 6:56:33 AM , Rating: 2
Actually Mega, Kilo, etc are metric, and metric works to the power of 10. So they are right about that.

Don't get me wrong though, MTBF is misleading at best, unless they want to change it to "MTBF of 1000 drives for 1000 test hours" - or whatever sample size and time they use.

RE: MTBF numbers are a lie
By TomZ on 3/12/2007 8:44:08 AM , Rating: 2
Actually, that's wrong. The prefixes are Greek prefixes, and they were not invented for the metric system.

In the computer industry, powers-of-two prefix definitions have been commonly used for about a half-century. You just need to understand that "megabyte" has two meanings - sometimes 1000^2 and sometimes 1024^2, depending on the context. For example, when I purchase a HDD, a megabyte = 1000^2; however, when I purchase DRAM a megabyte is 1024^2.

Engineers and computer scientists usually use powers-of-two definitions. It is mainly the marketing literature that is using powers-of-ten to inflate the apparent capacity of HDDs. Even with HDDs, the fundamental sector size is a power-of-two measure (e.g., 512 bytes), so the inherent design of the drive is powers-of-two, however it is marketed as powers-of-ten. This is a relatively recent development - in "olden days" (e.g., 10 years ago), most (all?) HDDs used powers-of-two measurements when they stated their capacity.
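The size of the gap between the two definitions is easy to compute. A minimal sketch, using a hypothetical drive marketed as 500 GB (the figure is an illustration, not one from this thread):

```python
DECIMAL_BASE = 1000  # powers-of-ten prefixes, as used on HDD packaging
BINARY_BASE = 1024   # powers-of-two prefixes, as used for DRAM


def marketed_gb_to_binary_gb(marketed_gb):
    """Convert a capacity in 1000^3-byte units to 1024^3-byte units."""
    total_bytes = marketed_gb * DECIMAL_BASE**3
    return total_bytes / BINARY_BASE**3


def prefix_gap_percent(power):
    """Percent by which the binary prefix exceeds the decimal one.

    power=1 is kilo, 2 is mega, 3 is giga, and so on.
    """
    return (BINARY_BASE**power / DECIMAL_BASE**power - 1) * 100


# Hypothetical 500 GB drive as reported by software that counts in 1024^3.
print(f"{marketed_gb_to_binary_gb(500):.2f}")

# The gap widens with each prefix step.
for name, power in (("kilo", 1), ("mega", 2), ("giga", 3)):
    print(f"{name}: {prefix_gap_percent(power):.2f}% larger in binary")
```

Because the gap compounds with each prefix step, the discrepancy becomes more noticeable as drive capacities grow.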

RE: MTBF numbers are a lie
By BikeDude on 3/12/2007 9:59:26 AM , Rating: 1
No, I remember drives from twenty years ago that were a tad optimistic in their size estimation. The only thing that changed ten years ago was a dramatic increase in total drive space (making the discrepancy even more apparent).

The communication industry also uses powers of ten. Basically they count the number of bits that are transferred and do not pay heed to whether a byte (or word) is 5, 6, 7, 8 or 9 bits long.

And of course:
k - kilo (1000)
K - Kilo based on power of two (1024)
m - milli (meaningless as far as 'we' are concerned)
M - Mega (1000k or 1024K depending on context)
b - bit (!)
B - Byte

I.e. if you see someone using "mb" (millibit) as a unit, please hit them hard on the head.


RE: MTBF numbers are a lie
By TomZ on 3/12/2007 3:18:22 PM , Rating: 2
1. HDD size discrepancies in years past were due to the misunderstanding of the difference between "raw" capacity and "formatted" capacity. That problem still exists today, but the word is out more on that one.

2. The communication industry uses powers-of-ten measures for <prefix>bytes/s because they implement communication standards that use powers-of-ten crystals and signaling frequencies. These crystals are obviously more prevalent than powers-of-two crystals.

3. I don't think that 'k' vs 'K' meaning 1000 vs 1024 is really widely accepted or really even a good idea. It is too subtle of a distinction, since humans are pretty used to ignoring case.

RE: MTBF numbers are a lie
By Hoser McMoose on 3/11/2007 12:53:08 PM , Rating: 2
Obviously it is a bit self-serving of them, but in this case at least they are 100% accurate, more so than the OS definition of a kilobyte being 1024 bytes, which is simply wrong.

Kilo, mega, giga, etc. are SI prefixes which are well defined. By the very definition of the prefix, "megabyte" = 1,000,000 bytes, and this definition predated computers by a LONG time. It only got subverted to mean 1,048,576 because in computers things usually come in powers of two and 2^20 is "close enough" to a million that we figured it was ok for us lazy folk.

RE: MTBF numbers are a lie
By TomZ on 3/11/2007 10:16:10 PM , Rating: 2
Kilobyte has never meant 1000 bytes - never.

RE: MTBF numbers are a lie
By Chernobyl68 on 3/15/2007 6:01:43 PM , Rating: 2
I'd rather they test 100 drives until all 100 failed - then tell me what the average failure time was.

