


Much of AMD's bad luck over the last three months revolves around a nasty bug it just can't shake

Erratum, to those in the hardware or software industry, is a nice way of saying "we missed a test case" during development and design. 

Yesterday, The Tech Report confirmed AMD's iteration of Intel's F00F bug.  The bug, which has been documented since at least early November, can cause a deadlock during recursive or nested cache writes. 

How does the TLB erratum occur?  All AMD quad-core processors utilize a shared L3 cache.  In instances where the software uses nested memory pages, these processors will experience a race condition. 

AMD's desktop product marketing manager Michael Saucier describes a race condition as a series of events "where the other guy wins who isn't supposed to win." 

In the software world, a typical memory race condition occurs when the memory arbiter is instructed to overwrite an older block of memory, but write the old block of memory to somewhere else in cache.  In the instance where two arbiters follow this same rule set, it's easy to see how a race condition can occur: both arbiters attempt to overwrite the same blocks of information, resulting in a deadlock.

From what AMD engineers would tell DailyTech, this example is very similar to what occurs with nested memory pages in virtualized machines on these K10 processors. 
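The general two-arbiter scenario can be sketched in software. The Python example below is only an analogy, not a model of the K10 hardware: the `run_arbiters` function, the two "blocks" (plain locks), and the timeout value are all hypothetical names chosen for illustration. Each arbiter grabs one block and then waits for the block the other arbiter already holds, so neither can finish. Taking the blocks in one agreed order avoids the standoff.

```python
import threading


def run_arbiters(ordered: bool, timeout: float = 0.2) -> list:
    """Run two hypothetical arbiters that each need both memory blocks.

    Returns one flag per arbiter: True if it obtained both blocks.
    """
    block_a = threading.Lock()
    block_b = threading.Lock()
    holding_first = threading.Barrier(2)  # both arbiters hold their first block
    attempt_done = threading.Barrier(2)   # both arbiters finished their attempt
    results = [False, False]

    if ordered:
        # Fix: both arbiters take the blocks in the same global order.
        plans = [(block_a, block_b), (block_a, block_b)]
    else:
        # Bug: the arbiters take the blocks in opposite orders.
        plans = [(block_a, block_b), (block_b, block_a)]

    def arbiter(index, first, second):
        if ordered:
            with first:
                with second:
                    results[index] = True
            return
        with first:
            # Wait until the other arbiter also holds its first block.
            holding_first.wait()
            # A timeout stands in for "hangs forever" so the demo terminates.
            got_second = second.acquire(timeout=timeout)
            results[index] = got_second
            # Keep holding the first block until both attempts are over.
            attempt_done.wait()
            if got_second:
                second.release()

    threads = [
        threading.Thread(target=arbiter, args=(i, plans[i][0], plans[i][1]))
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_arbiters(ordered=False))  # neither arbiter gets both blocks
    print(run_arbiters(ordered=True))   # both arbiters finish
```

In the unordered run, the timeout is the only reason the program returns at all. Without it, both threads would wait on each other indefinitely, which is the software equivalent of the lockup described above.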

AMD has since released a new BIOS patch for all K10 motherboards, including the often cited but rarely seen MSI K9A2 Platinum.  This patch, confirmed by DailyTech, will result in at least a 10% reduction in general computing speed. 

AMD partners tell DailyTech that all bulk Barcelona shipments have been halted pending application screening based on the customer.  Cray, for example, was allowed its latest allocation for machines that will not use these nested virtualization techniques.  Other AMD corporate customers were told to use Revision F3 (K8) processors in the meantime. 

The TLB erratum will be fixed in the B3 stepping of all AMD quad-core processors, including Phenom and Barcelona.  However, AMD considers the B3 stepping a "March" item on its 2008 roadmap.  Processors shipped between then and now will still carry the TLB bug, though with the BIOS workaround these machines will not experience a lockup. 

The delayed Phenom 9700 is affected by the TLB bug, though AMD insiders tell DailyTech the upcoming 2.6 GHz Phenom 9900 is not affected.  This indicates Phenom 9900 will carry the B3-stepping designation.

AMD's latest roadmap hints that its tri-core processors are merely quad-core processors with one core disabled. The company also indicated that it will introduce some of these tri-core processors with the L3 cache disabled.  Removing the shared-L3 cache from the chip design eliminates the TLB bug.

In a likely-related event, AMD's newest corporate roadmap scheduled three Phenom processors for the first half of 2008; one of which is the Phenom 9700.  The company will launch eleven new 65nm K8 processors in the same time period.


Comments



Not good
By FITCamaro on 12/5/2007 12:51:49 PM , Rating: 2
Not good news for AMD. However, for consumers, it 99.9% won't affect you since most consumers do not use any virtualization software. But yes for the big volume customers building servers that may, it definitely is an issue.




RE: Not good
By TomZ on 12/5/2007 1:12:15 PM , Rating: 2
From what I understand, it would be bad for consumers, since probably most motherboards will have the microcode workaround applied in the BIOS, which has the nasty side effect of a 10-20% performance hit.

Also, from what I understand, the bug doesn't just occur when running virtualization software. It just happens to be that running virtualization software is the most likely trigger.


RE: Not good
By KristopherKubicki (blog) on 12/5/2007 1:15:29 PM , Rating: 2
From my understanding there has to be a nested memory page and high utilization. There are a lot of occasions when this can happen, but I think the only way you could really duplicate persistent nested memory pages is with virtualization.


RE: Not good
By 16nm on 12/5/2007 2:41:53 PM , Rating: 3
The silver lining in all of this is, assuming that the AMD board and management are not a bunch of monkeys and buffoons, that they will learn from this mistake and make sure it never happens again. I am reminded of the Pentium 586 math bug that caught a lot of headlines in the nineties. Intel learned a lot from that mistake. Mistakes reveal our weaknesses and how to fix them.

Intel made a mistake in Netburst, now look how they've improved.

However, you must see the irony in a quad core server CPU failing at virtualization. One of the most compelling reasons for business to buy quadcore machines is for server consolidation through virtualization. LOL. That sucks big time!


RE: Not good
By KristopherKubicki (blog) on 12/5/2007 3:50:50 PM , Rating: 2
That does seem like a pretty big oversight..


RE: Not good
By Master Kenobi (blog) on 12/5/2007 6:20:15 PM , Rating: 2
Yea, on the desktop or workstation end, quad core is nice but not critical. Quad core servers on the other hand offer virtualization on multiple servers without the heavy licensing and hardware costs normally associated with it.


RE: Not good
By erikejw on 12/6/2007 4:54:33 AM , Rating: 2
Well, were the released Phenoms with the BIOS that made them work 100% albeit somewhat slower, or were they with a faulty BIOS that is faster?

If it was fully working they will do better against the Core2 processors but probably a little slower anyway.


RE: Not good
By qwertyz on 12/6/2007 10:44:25 AM , Rating: 1
the BIOS fix will most probably disable the 2 MB L3 cache


RE: Not good
By clnee55 on 12/24/2007 2:33:08 AM , Rating: 2
quote:
That does seem like a pretty big oversight..


The validation engineer who missed the floating point bug at Intel probably got fired. I wonder if he works at AMD now.


RE: Not good
By kalak on 12/7/2007 1:30:02 PM , Rating: 2
quote:
Mistakes reveal our weaknesses and how to fix them.


I agree, but AMD/ATI public image is becoming very bad... that's a shame.... Please react AMD !


RE: Not good
By Andypro on 12/5/2007 2:54:46 PM , Rating: 2
Kris: I noticed that you use the word "utilize" excessively in your articles. That's a waste of bytes. Utilize doesn't have any added meaning beyond "use." Try to utilize the better word from now on :p


RE: Not good
By TomZ on 12/5/07, Rating: 0
RE: Not good
By KristopherKubicki (blog) on 12/5/2007 3:42:25 PM , Rating: 2
Nah, it's a bad habit. When you write about the same thing over and over again it gets easy to use the same words repeatedly. "slate" is another word I'm trying not to use.


RE: Not good
By Egger on 12/5/2007 8:34:21 PM , Rating: 2
Not to be a grammar police or anything, but in all honesty there is a slight difference between 'use' and 'utilize', as 'utilize' implies the meaning of 'to make use of', as in 'to use despite being of other uses as well [as the referred use]'. So in most cases, the word 'utilize' that Kristopher uses is correct.

Sorry to be a troll and post once every ten years despite lurking constantly, just trying to provide occasional positive input really.


RE: Not good
By qwertyz on 12/5/07, Rating: -1
RE: Not good
By Oregonian2 on 12/5/2007 2:18:27 PM , Rating: 2
ROTFL


RE: Not good
By Oregonian2 on 12/5/2007 3:10:38 PM , Rating: 3
Somebody appears to not have liked my laughing at the obvious tongue-in-cheek joke that someone posted (*ALL* processors ever produced in the history of semiconductors have had bugs in every version produced -- even the comparatively trivial 8080/Z80 processors had bugs in nice errata sheets (I design embedded processor systems and have since the 8008 processor (as the HW circuit designer))).


RE: Not good
By Myrandex on 12/5/2007 2:41:37 PM , Rating: 2
[sarcasm] Hell Yeah! Especially since no other CPU manufacturer ever has bugs in their designs! Also Software should be held to these standards, so no more Software can be sold again since none is bug free! [/sarcasm]


RE: Not good
By Clauzii on 12/5/2007 5:46:42 PM , Rating: 2
Talking about flicking the big red switch :o


RE: Not good
By michal1980 on 12/5/2007 1:17:16 PM , Rating: 2
but won't the 'fix' slow down everyone's pc? even if they are not using vt?


RE: Not good
By KristopherKubicki (blog) on 12/5/2007 1:21:51 PM , Rating: 2
Yes since there are corner cases with nested memory pages outside of virtualization.


RE: Not good
By masher2 (blog) on 12/5/2007 2:54:32 PM , Rating: 2
I'm not seeing the benefits of Barcelona's nested paging outside of virtualization. Do you happen to know what those corner cases might be?


RE: Not good
By KristopherKubicki (blog) on 12/5/2007 4:16:38 PM , Rating: 2
Well, it wouldn't be the processor that enables nested memory pages -- that's the software. There's nothing that says you can't have memory sitting on top of other memory -- although when we're talking system memory instead of L3 cache that's the potential for a buffer overflow.

I don't pretend to understand exactly how the memory arbiters for K10 work, though knowing the problem affects nested memory and knowing you have multiple cores accessing/writing that L3 cache, it seems like that is a hot spot for race conditions.


RE: Not good
By masher2 (blog) on 12/5/2007 4:47:41 PM , Rating: 2
> "Well, it wouldn't be the processor that enables nested memory pages -- that's the software"

By "nested memory pages", I assumed you meant Barcelona's nested paging feature, which I think AMD is now actually calling "Rapid Virtualization Indexing". Am I wrong on this?


RE: Not good
By KristopherKubicki (blog) on 12/5/2007 4:56:23 PM , Rating: 2
I was not directly referring to AMD's technology. My colleagues at Tech Report just published the Linux patch notes that detail what went wrong. It's not too different from the general example I used:

http://www.techreport.com/discussions.x/13742


RE: Not good
By wetwareinterface on 12/6/2007 4:35:02 AM , Rating: 2
sql database
certain multithreaded apps that rely on cpu to cpu cache lookups and use nested paging to achieve this without specifically addressing either l1 l2 or l3 cache memory addresses but use instead placeholder locations and let the cpu/compiler try to decide where the cache address is. i.e. most c++ only optimized for multithreading not address lookup stability compiler code written using the c language default of pointers instead of actual memory location addressing. basically any half assed custom written in c++ app that you may find in a corporate environment like most individual crm client server based products.

etc...


RE: Not good
By Moishe on 12/5/2007 2:09:29 PM , Rating: 5
I really feel bad for them... and for us. We need competition. I'm not sure how, or why but AMD really has dropped the ball completely this past year.

I hope that behind the scenes they are working on a future CPU that is innovative and not just evolutionary.


RE: Not good
By SavagePotato on 12/5/2007 3:51:48 PM , Rating: 3
All the anti AMD detractors that like to put them down for the recent hardships will no doubt stop yapping if Intel CPUs hit the 1500 or 2000 dollar mark in the high end out of a lack of competition.


RE: Not good
By Eris23007 on 12/5/2007 6:33:48 PM , Rating: 2
The way I remember it, nobody wept for Intel back in the bad old Prescott days. Much the opposite, it was barely restrained glee that the big dog had gotten knocked off the top spot.

That's all fine and good, but why should those of us who care about CPU performance give AMD a break for their "hardships", which I would personally term "piss-poor execution"? They weren't unlucky - they made their bed, and now they have to lie in it. They chose to spend their money on the ATI acquisition instead of increased R&D budgets, and now they're living with that decision.

As Dr. Evil would say: "Boo frickin' hoo"


RE: Not good
By Moishe on 12/6/2007 7:53:35 AM , Rating: 2
Intel has never been the underdog and they DID take advantage of their lead to rest on their laurels. Nobody weeps for the guy who had everything and lost it due to his own complacency.

I think it's similar but honestly I think AMD tried (and failed). I truly don't believe this was a case of laziness on AMD's part. I could be wrong though.

Intel is the best for a reason. They have the money, the market, and the people to make the best processors and they've got nothing but themselves in their way. AMD has themselves, money, market and everything else in their way. For AMD winning is an up hill battle, it's simply not that way for Intel.


RE: Not good
By 1078feba on 12/6/2007 11:08:22 AM , Rating: 3
I agree with you on principle, but, really...it sure seems like AMD has gone out of their way to make all sorts of just stupid mistakes.

The ATI acquisition, though a forward-looking and fundamentally sound decision, could not have possibly been more ill-timed. They had a really strong toe-hold on gaining market share at the expense of their biggest competitor and then inexplicably changed a winning game plan. Even with the knowledge that Larrabee is upcoming, they could have waited a few years to ensure that Barcelona would at least keep the market share they had gained, vice ending up in a price war that led to a new round of debt sales. Does anyone really think we would be discussing this L3 errata if all the ATI money had gone into Barcelona R&D?

I also have to wonder how much better ATI discrete cards would have been if that division hadn't been saddled with being the bread-winner in the family.

Why not just buy controlling interest in ATI, and then gradually keep buying more ATI stock until you get to the point that full acquisition doesn't require such a huge one time outlay of cash? That way, they could have had enough influence on ATI's chipset division to ensure that Spider would still become a reality, yet retain enough cash to properly fund K10 development. There was a wide variety of differing paths to take to get ATI, yet they chose the one, that while may have provided instant gratification, was also probably the most myopic.

I sure hope they get their act together. I so miss actually having "choice" when it comes to enthusiast procs. Let's face it, right now, the only difficult decision to make WRT buying a new proc is WHICH Intel fits your price/performance range.

...and this comes from a guy loving his 939 FX-60 (OC'd to 3.0, ;) )


RE: Not good
By DeepBlue1975 on 12/5/2007 2:24:04 PM , Rating: 3
They will be affected by the 10% performance penalty, which, added to the Phenom's existing 10-20% lag behind the current Intel Q6600, becomes too much; they even lose any possibility of competing on price / performance ratio.

Not good at all.


RE: Not good
By yost007 on 12/5/2007 5:23:10 PM , Rating: 2
I agree with your point. A "fix" that imparts a 10% penalty is not much of a fix.


RE: Not good
By Clauzii on 12/5/2007 5:51:27 PM , Rating: 2
I would say that "A fully slower but working CPU ALWAYS is better than NO CPU." Think about it ;)


RE: Not good
By kalak on 12/7/2007 1:45:09 PM , Rating: 2
quote:
A fully slower but working CPU ALWAYS is better than NO CPU


Err. NO. Not in this case. A fix that causes another problem ? That's not good. And in the world of CPU competition, 10% less speed is a doom... AMD needs a better solution to this (though I think they don't care anymore...)


RE: Not good
By DeepBlue1975 on 12/8/2007 1:54:50 PM , Rating: 2
I'll answer your post with a question:

Why get a CPU that needs a 10% performance cap to work well, when you can go and buy a CPU that works right away with no problem at all, and even has a better price, and more performance out of the box (even when the Phenom is "unfixed")?

Intel didn't even need the penryn to compete in these conditions. Problem is, they are going to find that out really soon and slow their cycles and start making their CPUs more expensive.


RE: Not good
By Master Kenobi (blog) on 12/5/2007 6:17:25 PM , Rating: 2
In the industry this would be classified as a "workaround" since the "problem" still exists and has not yet been corrected.

The caveat to this "workaround" by AMD is that they are going to get further thrashed in benchmarks.....


Copyright 2014 DailyTech LLC.