Much ado about nothing?

My first reaction to the stories about an alleged bug in the Windows 7 CHKDSK utility that might "derail" the launch of Windows 7 was that it is probably just another mountain being made out of a molehill. After doing a little reading and research, I am beginning to doubt whether the bug is even worth calling a molehill. That's because the "research" that had supposedly replicated the bug didn't actually replicate it; all it did was verify CHKDSK's normal behavior.

Randall C. Kennedy, who supposedly replicated the bug, admitted that he wasn't actually able to get any of his test systems to crash, yet he still called for a halt to the launch of Windows 7 in his InfoWorld blog. Others like Jason Mick, who accepted Kennedy's analysis as gospel, concluded that Microsoft was trying to pass the buck and that this "underlying file system issue" would likely delay Windows 7. But it is clear that Kennedy and those citing him haven't really thought it through, nor are they qualified to determine what constitutes a bug.

To get to the bottom of this, we first need to understand what CHKDSK is and what role it plays. CHKDSK is a Windows disk checking utility that repairs hard drive errors. Even if the tool has some incompatibilities with a small percentage of hardware, that should hardly derail the launch of the much-awaited Windows 7 operating system. When run with the /r switch, CHKDSK looks for bad hard drive sectors and tries to salvage any good data that it can. Most people don't run CHKDSK at all, much less with the /r switch, because they simply don't get hard drive errors. Even when they do have hard drive errors, they probably don't notice unless it is something severe. And even if there is a bug in the way the CHKDSK utility works, it is not a flaw in the underlying file system.

As Steven Sinofsky, the president of the Microsoft Windows Division, pointed out, the mere fact that people are replicating the heavy memory consumption of CHKDSK when using the /r switch doesn't prove a thing. That's because CHKDSK is supposed to use maximum resources to repair a corrupted hard drive as quickly as it can, and users shouldn't be doing anything else on the system while they wait for it to complete. This makes a lot of sense because you certainly wouldn't expect to drive your car while someone is changing the oil. The priority here is to complete the repairs as soon as possible, and this is precisely what CHKDSK does: it consumes all but 50 megabytes of available memory while it works. Then when it completes, it releases the memory so that the user gets the system resources back. Since no crash was replicated, it was silly for Randall Kennedy and everyone else to call this a bug, much less a critical bug that would halt the launch of Windows 7.
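To put the "all but 50 megabytes" figure in perspective, here is a rough back-of-the-envelope sketch. The function name and the sample memory sizes are made up for illustration, and this is not how CHKDSK is actually implemented; it only shows the arithmetic implied by the description above.

```python
# Illustrative arithmetic only: the 50 MB figure comes from the article,
# everything else (names, sample sizes) is a hypothetical example.
RESERVED_MB = 50


def chkdsk_memory_budget(available_mb: int, reserved_mb: int = RESERVED_MB) -> int:
    """Return how many MB a scan could claim if it leaves reserved_mb free."""
    return max(available_mb - reserved_mb, 0)


for available in (512, 2048, 4096):
    used = chkdsk_memory_budget(available)
    share = used / available
    print(f"{available} MB available -> {used} MB used ({share:.1%})")
```

On any machine with a few gigabytes of free memory, leaving 50 MB behind means the tool will appear to use nearly everything, which is exactly the behavior that was being reported as a bug.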

Now for the very few people who actually get their Windows 7 machines to crash, there is a very likely possibility that the underlying firmware, drivers, or hardware aren't completely stable. I know this firsthand because one of my computers and a friend's computer that used to run fine on Windows XP refused to run on Windows Vista due to memory problems. Because the bad memory was near the end of the addressable memory space and Windows XP never used that much memory, the problem never materialized until we installed an OS that consumed more resources. I had to download MemTest86+ and burn a bootable CD using ISO Recorder 3.1, which I then booted to inspect my memory. In both cases, my friend and I had to get Corsair and Kingston to send us new memory at no cost. Anyone who owns a computer should be running this test anyway just to validate their own hardware. MemTest86+ also reported failures when my friend had a faulty CPU, so it indirectly detects some CPU problems as well.
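The reason a bad memory module can hide for years is easy to show with a toy model. The sketch below is purely illustrative: the `FaultyMemory` class, the addresses, and the single test pattern are all invented for this example, and a real tester such as MemTest86+ runs many more patterns and passes than this.

```python
class FaultyMemory:
    """Toy RAM model: one cell silently flips a bit on every write."""

    def __init__(self, size, bad_address):
        self.cells = [0] * size
        self.bad_address = bad_address

    def write(self, address, value):
        if address == self.bad_address:
            value ^= 0x01  # simulate a flipped bit in the faulty cell
        self.cells[address] = value

    def read(self, address):
        return self.cells[address]


def find_bad_cells(memory, limit, pattern=0xAA):
    """Write a pattern to addresses [0, limit) and list any mismatches."""
    bad = []
    for address in range(limit):
        memory.write(address, pattern)
        if memory.read(address) != pattern:
            bad.append(address)
    return bad


ram = FaultyMemory(size=1024, bad_address=1000)
light_os = find_bad_cells(ram, limit=512)   # workload that only touches low memory
heavy_os = find_bad_cells(ram, limit=1024)  # workload that touches all of memory
print("light workload found:", light_os)
print("heavy workload found:", heavy_os)
```

The light workload never reaches the faulty cell and so sees nothing wrong, while the heavy workload trips over it. The hardware was broken the whole time; only the usage pattern changed.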

Another lesson I've learned in the past is that it is always a good idea to update motherboard firmware when you want to install a new operating system. It is simply a fact of life that older motherboard firmware may not handle newer CPUs or newer operating systems very well. Even if you're not going to install a new operating system, it's a good idea to inspect your hardware and update the firmware to make sure your hardware is completely stable, so that there is less chance of silently corrupting data.

So can we conclude that there is no bug in CHKDSK? We can't say for sure, but we should definitely not conclude that there is a bug. Microsoft has been testing 40 machines overnight since yesterday and hasn't replicated the problem yet, so it's starting to look like a hardware, firmware, or driver issue in some rare configurations. What we can conclude for certain is that this issue, if there even is an issue, will not derail the Windows 7 launch.

Comments

RE: Inherent mistrust of Microsoft
By erple2 on 8/10/2009 1:01:14 PM, Rating: 2
I disagree. It is clear to me that you don't work in the software business, or if you do, you don't develop any kind of complicated software that any appreciable number of users use.

Any time we get a trouble ticket in from a customer, the first thing we do is to try and replicate the situation. If we are not able to replicate the situation, we don't (yet) confirm the ticket as a "bug". We then go into the software (source code) and start looking for the types of behaviors that are seen out in the field. As you can guess, this can be very time consuming and difficult. If we do finally confirm the behavior that the customer is seeing, we open a bug and work it normally. If we are never able to replicate the problem, then we're left in a weird state - we can't replicate the bug, so we can't effectively debug it. That's not to say that we don't "try" a couple of things to see if we can avoid that situation in the future, however.

I can imagine that with hardware, that's even more difficult to deal with, if only because there's a much larger lag time. Plus, you can't just "make another build" quickly or cheaply.

Until we're able to actually see the problem, we can't mark the problem as a "bug" and start tracking it internally through the software life cycle.

I suspect that a lot of the "bad blood" that people have with any software company is that their problems are not reproducible. And having a bunch of people on blog and forum posts saying that there's a problem doesn't mean that there's any real problem. That's the inherent problem with blogs and forums: people post because they're angry or upset at something, not because "there's nothing to see here, things are working fine". The intertubes needs a vetting process.

Copyright 2014 DailyTech LLC.