
Study says failure rates are 15 times what manufacturers indicate

A study released this week by Carnegie Mellon University revealed that hard drive manufacturers may be exaggerating the mean time between failures (MTBF) ratings of their hard drives. In fact, researchers at Carnegie Mellon indicated that, on average, observed failure rates were as much as 15 times higher than the rated MTBFs imply.

Rounding up roughly 100,000 hard drives across a variety of manufacturers, researchers at Carnegie Mellon tested the drives in various operating conditions as well as real-world scenarios. Some drives were at Internet service providers, others at large data centers and some were at research labs. According to test results, the majority of the drives did not appear to be affected by their operating environment. In fact, researchers indicated that drive operating temperatures had little to no effect on failure rates -- a cool hard drive survived no longer than one running hot.

The types of drives used in the study ranged from Serial ATA to SCSI and even high-end Fibre Channel (FC) drives. Typically, customers pay a much larger premium for SCSI and FC drives, which also usually carry longer warranty periods and higher MTBF ratings.

Carnegie Mellon researchers found that these high-end drives did not outlast their mainstream counterparts:

"In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks. This may indicate that disk-independent factors, such as operating conditions, usage and environmental factors affect replacement rates more than component specific factors."

According to the study, the number one cause of drive failures was simply age: the longer a drive has been in operation, the more likely it is to fail. Drives tended to start showing signs of failure after roughly five to seven years of service, after which there was a significant increase in annualized failure rates (AFR). The failure rate of drives in their first year of service or less was just as high as that of drives past the seven-year mark.

According to Carnegie Mellon researchers, manufacturer MTBF ratings are highly overrated. Take for example the Seagate Cheetah X15 series, which has an MTBF rating of 1.5 million hours. This equates to roughly 171 years of constant service before problems. Carnegie Mellon's researchers said, however, that customers should expect a more reasonable 9 to 11 years. Interestingly, real-world tests in the study showed a consistent average time to failure of about six years.

The average replacement rate of drives ranged from 2 percent to a whopping 13 percent annually, indicating that there is a need for manufacturers to reevaluate the way an MTBF rating is generated. Worst of all, these rates were for drives with MTBF ratings between 1 million and 1.5 million hours.
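
To put the figures above side by side, here is a minimal Python sketch of the arithmetic. It assumes a constant failure rate, under which the yearly failure fraction implied by an MTBF is approximately 8,760 hours divided by the MTBF. The function names are illustrative only and do not come from the study.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_to_years(mtbf_hours):
    """Express an MTBF rating in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def nominal_afr(mtbf_hours):
    """Approximate yearly failure fraction implied by an MTBF.

    Assumption: failures occur at a constant rate, so the fraction of a
    large fleet failing per year is roughly HOURS_PER_YEAR / MTBF.
    """
    return HOURS_PER_YEAR / mtbf_hours


# The two MTBF ratings quoted in the article.
for mtbf in (1_000_000, 1_500_000):
    print(f"{mtbf:>9,} h -> {mtbf_to_years(mtbf):6.1f} years, "
          f"nominal AFR {nominal_afr(mtbf):.2%}")

# Compare the highest observed replacement rate (13 percent per year)
# with the nominal rate implied by a 1,000,000-hour MTBF.
observed_high = 0.13
ratio = observed_high / nominal_afr(1_000_000)
print(f"13% observed is about {ratio:.0f}x the nominal rate")
```

Under these assumptions, a 1.5-million-hour MTBF works out to about 171 years but implies a yearly failure fraction of only about 0.58 percent, and the 13 percent figure is roughly 15 times the 0.88 percent implied by a 1-million-hour rating.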

Garth Gibson, associate professor of computer science at Carnegie Mellon, indicated that the study was proof that MTBFs are not a reliable way of measuring drive quality. "We had no evidence that SATA drives are less reliable than the SCSI or Fiber Channel drives," said Gibson.

Carnegie Mellon researchers concluded that backup measures are a necessity for critically important data, no matter what kind of hard drive is being used. It is interesting to note that even Google's own data centers use mainly SATA and PATA drives. At the current rate, it is only a matter of time before SATA performs as well as or better than SCSI and FC drives, offering the same reliability for much less money.


Comments



MTBF numbers are a lie
By Beenthere on 3/9/2007 9:01:35 PM , Rating: -1
MTBF numbers for hard drives have been a lie since the beginning of time. Anyone who builds or maintains PCs can tell you that HD's fail all the time and the MTBF numbers are pure fantasy.

When HD mfgs. realized the only way to sell new drives was to increase capacity, the quality dropped at almost the same rate as the capacity increased. Then the HD companies got into price wars that drove prices down to the point where no one makes much money so the quality dropped to the gutter.

The only HDs that seem to have any data or mechanical reliability these days are traditional SCSI drives. Unfortunately they are being ignored in order to sell serial SCSI drives which cost less to produce but are also more unreliable like S-ATA drives have proven to be.

As far as I am concerned MTBF numbers are outright lies.




RE: MTBF numbers are a lie
By PandaBear on 3/9/2007 9:15:53 PM , Rating: 2
Agree, MTBF is usually useful only for things that fail at random in infant mortality. It is a prediction of how many DOA drives you get if you order a large quantity. Once you power it on, it is not a prediction of how many of them die within 1, 3, 5, 7 years.

Every drive is designed differently and eventually fails for a different reason. So there is no universal formula to quantify it. Most large OEMs (i.e. Dell or HP) run their own qualification tests on each new design/model before they take a huge order of millions of drives, so they know and always get the best prime drives.

The rest that failed, like 250GB rather than 300GB, or 120GB rather than 160GB, go into retail for the average users. Big OEMs won't accept them with 1/4 of the heads clipped or the outer 1/4 ring disabled.

The ones that do very badly? They go to Fry's as white-box drives. I once saw an IBM Deskstar with a hand-soldered resistor on the PCB, clearly a reject.


RE: MTBF numbers are a lie
By retrospooty on 3/9/2007 10:30:29 PM , Rating: 3
You don't understand what MTBF is... Zippercow explained it best above.

[short time period]*[number of pieces tested]/[number of pieces tested which failed within that time period]=MTBF

The MTBF rating of our DVR is 449,616.52 hours (~51 years). This means that if 51 DVRs were to be run for 1 year, 1 failure out of those 51 could be expected.

This does not mean the DVR is expected to last 51 years.
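
As a rough Python sketch of the formula above (not a claim about how any particular manufacturer computes its rating), the function names and the 1,000-unit test figures below are hypothetical and chosen purely for illustration. Only the DVR rating comes from the comment itself.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def estimate_mtbf(test_hours, units_tested, units_failed):
    """MTBF = test period * units tested / units that failed in that period."""
    if units_failed == 0:
        raise ValueError("no failures observed; this formula gives no estimate")
    return test_hours * units_tested / units_failed


def expected_failures(mtbf_hours, units, hours):
    """Expected number of failures when `units` devices each run for `hours`."""
    return units * hours / mtbf_hours


# Hypothetical test run: 1,000 units for 1,000 hours, 2 failures.
print(estimate_mtbf(1000, 1000, 2))  # 500,000 hours

# The DVR rating quoted above.
dvr_mtbf = 449_616.52
print(f"{dvr_mtbf / HOURS_PER_YEAR:.1f} years")
print(f"{expected_failures(dvr_mtbf, 51, HOURS_PER_YEAR):.2f} failures "
      f"expected among 51 units in one year")
```

The last line comes out very close to one failure, which matches the reading given above: the rating describes failures across a population over a short period, not the lifetime of a single unit.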


RE: MTBF numbers are a lie
By nothingtoseehere on 3/10/2007 4:55:13 PM , Rating: 2
The 'between' from the 'B' in MTBF implies that the number represents a time period between 'F' failures... Which two failures are those in your equation? There is no 'failure A time' minus 'failure B time' in your equation, because there is no failure A and B to compare the time between, so there is no number to take the mean of, so your equation cannot lead to the MTBF.

Maybe the manufacturers determine their MTBF's that way, sure, but then they shouldn't call it MTBF because that is misleading.


RE: MTBF numbers are a lie
By TomZ on 3/10/07, Rating: 0
RE: MTBF numbers are a lie
By Bladen on 3/11/2007 6:56:33 AM , Rating: 2
Actually Mega, Kilo, etc are metric, and metric works to the power of 10. So they are right about that.

Don't get me wrong though, MTBF is misleading at best, unless they want to change it to "MTBF of 1000 drives for 1000 test hours" - or whatever sample size and time they use.


RE: MTBF numbers are a lie
By TomZ on 3/12/2007 8:44:08 AM , Rating: 2
Actually, that's wrong. The prefixes are Greek prefixes, and they were not invented for the metric system.

In the computer industry, powers-of-two prefix definitions have been commonly used for about a half-century. You just need to understand that "megabyte" has two meanings - sometimes 1000^2 and sometimes 1024^2, depending on the context. For example, when I purchase a HDD, a megabyte = 1000^2; however, when I purchase DRAM a megabyte is 1024^2.

Engineers and computer scientists usually use powers-of-two definitions. It is mainly the marketing literature that is using powers-of-ten to inflate the apparent capacity of HDDs. Even with HDDs, the fundamental sector size is a power-of-two measure (e.g., 512 bytes), so the inherent design of the drive is powers-of-two, however it is marketed as powers-of-ten. This is a relatively recent development - in "olden days" (e.g., 10 years ago), most (all?) HDDs used powers-of-two measurements when they stated their capacity.
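
To make the two definitions concrete, here is a small Python sketch of the conversion. The function name and the sample drive sizes are illustrative and not taken from any vendor's documentation.

```python
DECIMAL_GB = 1000 ** 3  # bytes per gigabyte, powers-of-ten definition
BINARY_GB = 1024 ** 3   # bytes per gigabyte, powers-of-two definition


def advertised_to_binary_gb(advertised_gb):
    """Convert a capacity stated in powers-of-ten GB to powers-of-two GB."""
    return advertised_gb * DECIMAL_GB / BINARY_GB


# Difference between the two meanings of "megabyte", in bytes.
print(1024 ** 2 - 1000 ** 2)  # 48576

# Illustrative drive sizes.
for size in (250, 500):
    print(f"{size} GB advertised = "
          f"{advertised_to_binary_gb(size):.2f} GB in powers of two")
```

The gap is about 4.9 percent at the megabyte level and about 6.9 percent at the gigabyte level, so it widens as the prefixes get larger.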


RE: MTBF numbers are a lie
By BikeDude on 3/12/2007 9:59:26 AM , Rating: 1
No, I remember drives from twenty years ago that were a tad optimistic in their size estimation. The only thing that changed ten years ago was a dramatic increase in total drive space (making the discrepancy even more apparent).

The communication industry also uses powers of ten. Basically they count the number of bits that are transferred and do not pay heed to whether a byte (or word) is 5, 6, 7, 8 or 9 bits long.

And of course:
k - kilo (1000)
K - Kilo based on power of two (1024)
m - milli (meaningless as far as 'we' are concerned)
M - Mega (1000k or 1024K depending on context)
b - bit (!)
B - Byte

I.e. if you see someone using "mb" (millibit) as a unit, please hit them hard on the head.

--
Rune


RE: MTBF numbers are a lie
By TomZ on 3/12/2007 3:18:22 PM , Rating: 2
1. HDD size discrepancies in years past were due to the misunderstanding of the difference between "raw" capacity and "formatted" capacity. That problem still exists today, but the word is out more on that one.

2. The communication industry uses powers-of-ten measures for <prefix>bytes/s because they implement communication standards that use powers-of-ten crystals and signaling frequencies. These crystals are obviously more prevalent than powers-of-two crystals.

3. I don't think that 'k' vs 'K' meaning 1000 vs 1024 is really widely accepted or really even a good idea. It is too subtle of a distinction, since humans are pretty used to ignoring case.


RE: MTBF numbers are a lie
By Hoser McMoose on 3/11/2007 12:53:08 PM , Rating: 2
Obviously it is a bit self-serving of them, but in this case at least they are 100% accurate, more so than the OS definition of a kilobyte as 1024 bytes, which is simply wrong.

Kilo, mega, giga, etc. are SI prefixes which are well defined. By the very definition of the prefix, "megabyte" = 1,000,000 bytes, and this definition predated computers by a LONG time. It only got subverted to mean 1,048,576 because in computers things usually come in powers of two and 2^20 is "close enough" to a million that we figured it was ok for us lazy folk.


RE: MTBF numbers are a lie
By TomZ on 3/11/2007 10:16:10 PM , Rating: 2
Kilobyte has never meant 1000 bytes - never.


RE: MTBF numbers are a lie
By Chernobyl68 on 3/15/2007 6:01:43 PM , Rating: 2
I'd rather they test 100 drives until all 100 failed - then tell me what the average failure time was.


RE: MTBF numbers are a lie
By mjcutri on 3/10/2007 8:29:52 AM , Rating: 3
Did you even READ the article, or just the headline? They tested all kinds of drives in all kinds of different environments and found that all of them performed about the same, regardless of their type.
"In our data sets, the replacement rates of SATA disks are not worse than the replacement rates of SCSI or FC disks."


RE: MTBF numbers are a lie
By goku on 3/12/2007 12:20:35 AM , Rating: 2
yeah I noticed that, but what about PATA drives? About 5 years ago, weren't they more likely to fail?


RE: MTBF numbers are a lie
By gsellis on 3/12/2007 7:59:10 AM , Rating: 2
I would assume that any PATA drive would perform as a similar SATA drive. The connector type means less than the moving parts.


RE: MTBF numbers are a lie
By PandaBear on 3/12/2007 2:47:30 PM , Rating: 2
SATA and PATA are just interfaces, it is the design and components of the HD that makes them good or bad. Example:

Raptors are much more reliable than Maxtor's SATA drives.


RE: MTBF numbers are a lie
By TomZ on 3/12/2007 4:43:10 PM , Rating: 2
Probably all you can say is that the PATA connection/connector is more likely to fail than a SATA connection/connector. As the others have pointed out, the interface shouldn't have much bearing on the reliability of the drive itself.













Copyright 2014 DailyTech LLC.