


Engineers blame simulation for quad-core "showstopper"

More than a few people noticed Intel's roadmap originally slated 45nm Penryn desktop quad-core processors for January, only to have the company change the hard launch date to a not-so-firm "Q1 2008." So what happened?  In a series of interviews, the tale of quad-core Penryn began to unfold. 

Processor engineers, speaking on background, detailed the problem. "Intel is very sensitive to mean time to failures.  During a simulation, at high clock frequencies, engineers noticed an increase of potential failures after a designated amount of time."

One engineer continues, "This is not acceptable for desktop customers that require long-term stability. It's a showstopper."

Previous reports of errata degrading the L2 and L3 cache performance were described as "false" -- desktop Penryn processors do not even have L3 cache. Microcode and BIOS updates issued by Intel since November do not fix or address the "showstopper" bug affecting the launch of the quad-core Q9300, Q9450 and Q9550 processors.

The condition does not affect Xeon quad-core processors.  Xeon uses a different stepping than the quad-core processors, which fixes this simulated condition.  The quad-core 45nm Extreme Edition processor launched in November is also unaffected.

The company would not detail when the processors, originally scheduled for a January 20 launch but announced at CES last week, will see the light of day. Conservative estimates from ASUS and Gigabyte put the re-launch sometime in February.  Intel completely removed its January 20 launch from its December 2007 roadmap and has not issued a new roadmap since. 

Intel spokesman Dan Snyder says more. "We publicly claimed we will launch our 45nm mainstream processors in Q1 2008, and that's exactly what we did."  In fact, the company announced 16 new 45nm processors last week, most of which have already shipped to manufacturers -- with the exception of the quad-core desktop variants affected by the showstopper simulation bug.

Taiwanese media was quick to pin the simulated problem on complacency and lack of competition from AMD.  Intel employees quickly denied the allegation, with the additional claim that the report was "humorous." 

At CES last week, Snyder elaborated.  "The tick-tock model prevents Intel from missing its launch dates.  If the 'tock' team misses a target date, it doesn't affect the 'tick' team."

Tick-tock, the strategy of alternating cycles of architecture change and process shrink, became official company policy on January 1, 2006.

As for why the new MacBook Air still uses 65nm Core 2 Duo processors, even after Foxconn hinted that the new notebooks would get the 45nm treatment, another Intel spokesman declined to respond, stating only, "Our partners are free to choose any of Intel's currently supported processors."  Anand Shimpi explores this more.



I call BS
By retrospooty on 1/16/2008 10:54:00 AM , Rating: 2
So there is a bug that affects the lower binned Quad core CPUs (Q9300 Q9450 Q9550) at high speed tests, but the high end QX9650 is fine?

That's a load of crap... It's about no competition and maximizing profits. It's perfectly within Intel's right to do that, and they can afford to because AMD stumbled again, but let's call it what it is.




RE: I call BS
By TomZ on 1/16/2008 10:59:06 AM , Rating: 5
I agree, the story sounds suspicious to me as well. My understanding is that these parts all share the same die but are binned based on speed. Therefore, how would only the top-speed device avoid the MTBF issues with high clock speeds, as Intel says?


RE: I call BS
By KristopherKubicki (blog) on 1/16/2008 11:05:36 AM , Rating: 5
The Extreme Edition chip does not have the same MTTF parameters as the mainstream desktop chip, as it was explained to me. Probably because people who end up with EE chips (which to my understanding is pretty much reviewers) don't run the chip for more than a few weeks.


RE: I call BS
By TomZ on 1/16/2008 11:14:19 AM , Rating: 2
I'm not sure I understand that. I've been running a last-gen EE processor for 2 years now, and I would expect the current generation EE processors to last more than a few weeks, since they are being sold with a 3-year warranty IIRC.


RE: I call BS
By KristopherKubicki (blog) on 1/16/2008 11:25:04 AM , Rating: 3
Right, but as it was explained to me the tolerances aren't as strict for EE processors.


RE: I call BS
By cochy on 1/16/2008 11:31:25 AM , Rating: 3
That seems a little counter-intuitive. If I pay over $1000 for a CPU I expect it to perform and survive longer than cheaper models. I am expecting the best chip of the lot.

No?

It all seems rather fishy. Maybe this little theoretical "bug" does exist but they are making a bigger deal of it than they would ever have because of AMD, and using it as an excuse.

I'm waiting to upgrade to one of these chips, so I hope I don't have to wait long.


RE: I call BS
By Adonlude on 1/16/2008 2:20:42 PM , Rating: 5
You guys should note that in just over a month Intel's stock price fell from $28 to $20. They also just reported disappointing earnings and guidance. It is highly unlikely that they would be dragging their feet under these circumstances regardless of AMD's situation. Intel has unhappy stockholders to answer to.

You guys have no idea what this simulation issue is about, yet you instantly assume that Intel has bad intentions and is being the quintessential consumer-screwing evil giant.


RE: I call BS
By chsh1ca on 1/17/2008 11:48:27 AM , Rating: 2
Not necessarily. It's not like Intel hasn't dragged its feet due to a lack of competition in the past. The slacking off wouldn't have to be malicious, assuming you believe that's actually what went on.


RE: I call BS
By Clauzii on 2/6/2008 1:52:08 PM , Rating: 2
So in the middle of a deep fall in stock value, you'd suggest Intel would do something like this on purpose?

WOW!?!


RE: I call BS
By TomZ on 1/16/2008 11:47:12 AM , Rating: 2
quote:
Right, but as it was explained to me the tolerances aren't as strict for EE processors.

What tolerances do you mean? Do you mean MTTF tolerances? I'm not sure I understand what that would mean in this case. Are you saying that the specified MTTF for an EE is less than for a mainstream processor?
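For readers wondering what a lower specified MTTF would mean in practice, here is a minimal sketch. It assumes a constant failure rate (the exponential model), which is a textbook simplification and not something Intel has stated, and the MTTF figures are hypothetical values chosen only to show the shape of the comparison.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_probability(hours: float, mttf_hours: float) -> float:
    """Chance of at least one failure within `hours`, assuming a constant
    failure rate (exponential model) with the given mean time to failure."""
    return 1.0 - math.exp(-hours / mttf_hours)


# Hypothetical MTTF values, purely illustrative (not Intel specifications).
for label, mttf_years in [("longer MTTF part", 50), ("shorter MTTF part", 10)]:
    p = failure_probability(3 * HOURS_PER_YEAR, mttf_years * HOURS_PER_YEAR)
    print(f"{label} ({mttf_years} years): {p:.1%} chance of failure within 3 years")
```

Under this model, a part with a shorter MTTF is simply more likely to fail inside any given window, such as a warranty period, even though most individual chips would still outlive it.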


RE: I call BS
By killerroach on 1/16/2008 11:56:57 AM , Rating: 2
In short, yes.

Personally, I think Intel's strategy here is "it's overclockers who buy the EE, so, if it suddenly fails, they don't know what to blame it on."

Either way, it's fishy.


RE: I call BS
By ImSpartacus on 1/16/2008 12:45:43 PM , Rating: 3
I have seen a surprising number of gamers who run those EEs at stock in Dell boxes, not understanding overclocking at all (it's a pity really, they could save hundreds of dollars).

However that does make a little more sense.


RE: I call BS
By ImSpartacus on 1/16/2008 12:42:57 PM , Rating: 2
I hate to shoot the messenger boy, but regardless of what 'tolerances' are lessened, it should be the opposite.

The EE should be cherry picked, not the mid range. It makes no sense.


RE: I call BS
By halcyon on 1/16/2008 1:41:04 PM , Rating: 5
Right. This is good to know.

Memo to myself:

Intel will sell the worst MTBF parts to the highest paying customers. Do not pay a lot for Intel products, in order to ensure lower failure rates.

I'm sure Intel PR would love to get this piece of information all over the newswire :)

PS I appreciate you reporting things as you heard them. Still, I agree with the other posters. This is utter and complete bollocks on Intel's part. They postponed it, because they want to make more money, but they downplayed it as a "we want to ensure highest quality for our customers". Just a pity they didn't think their white lie all the way through...


RE: I call BS
By Amiga500 on 1/16/2008 11:31:48 AM , Rating: 2
You've been running for 2 years 24 hrs a day, 7 days a week?

What Kris means is mission critical stuff, like servers and workstations.

I can't afford to have a CPU throw a wobbly and wreck the prior week or 2 of processing work (jobs can take much longer than that) - if it started happening in other engineering companies Intel would not be flavour of the month!


RE: I call BS
By Amiga500 on 1/16/2008 11:32:41 AM , Rating: 2
Or have I read that wrong.

Is failures in this sense a crash, or a physical breakage of the CPU?


RE: I call BS
By KristopherKubicki (blog) on 1/16/2008 11:38:04 AM , Rating: 2
Intel would not elaborate on what the symptoms would be for this bug. This is probably because they haven't actually replicated it outside of a simulation.

I'm guessing just one day you try to POST and it won't.

But you're right on about mission critical systems and stuff. You can't buy 10k EEs at a time for all the machines at your work -- and you wouldn't want to. The tolerances are tighter and the clocks more conservative on the mainstream chips.


RE: I call BS
By Mitch101 on 1/16/2008 1:34:44 PM , Rating: 5
LOL. Sounds like you are running into the circular discussions I ran into when we posted that yorkfield had a bug.

We know there is a bug that caused a delay. It's not the end of the world and will be corrected. Just have to wait a little longer for the chip.

The question is
Does the bug really matter if the chips containing it aren't released?

This isn't like the Phenom that is out in the wild that requires a bios update to ensure it doesn't run into the issue. Errata happens.

Had Intel released the chips into the wild and the bug happened, then it's big news. Till then it's just a minor delay.


RE: I call BS
By halcyon on 1/16/2008 1:43:53 PM , Rating: 2
You do realize, that people buy Xeons for 24-7 mission critical servers, not desktop/workstation CPUs, do you not?

The whole excuse sounds like a really quickly made-up lie to me.


RE: I call BS
By Amiga500 on 1/16/2008 6:13:51 PM , Rating: 3
To swing that right around.

You do realize not everyone buys Xeons for 24-7 mission critical workstation CPUs don't you? (servers I will accept are usually Xeon CPUs)

Intel do not want crashes occurring in this type of heavy workload - simply because of the demographic it will be affecting. 1 crash is 1 too many - same for AMD and phenom, which is why they took the hit on performance to get reliability.


RE: I call BS
By Khato on 1/16/2008 12:05:05 PM , Rating: 3
Eh, I prefer the explanation from PC watch a week or so ago - there are stability problems for 45 nm quad core processors on 4 layer motherboards. Specifically, that the FSB gets a tad bit too noisy (quads have to share the FSB between two die after all.)

It lines up purrfectly with this article no less - just without the silly notion that the EE has lesser MTTF parameters than the mainstream. The EE simply has stricter motherboard requirements as far as Intel's concerned, which makes this issue disappear.

Do note that this is an actual issue, either with the IO buffers or something with packaging. But it's only an issue because it's a good idea to make certain these mainstream processors will run in the cheapo motherboards that some are likely to put them in. (aka, OEM's would really complain if they had to step up to 6 layer motherboards.)


RE: I call BS
By James Holden on 1/16/2008 12:52:08 PM , Rating: 2
Even with Intel on record you prefer PC Watch's explanation?

Maybe PC Watch is right. But it seems pretty cut and dry when the whole company goes on record.

As for EE, did anyone stop and consider that maybe EE was a different stepping than the desktop chips? After all, it's practically a Xeon anyway, and the Xeons aren't affected.


RE: I call BS
By TomZ on 1/16/2008 12:59:21 PM , Rating: 2
The problem is that Intel's so-called "explanation" doesn't really make any sense, as has been pointed out by a few other posters here.

Like, for example, why is Intel still running simulations against a processor which was supposed to release on the 20th? And a processor whose EE variant released a couple of months ago?

No, I think it makes more sense that there may be signal integrity problems on motherboards, and that the simulations are on the motherboards and not the processor itself. Just my guess with the limited information at this point.


RE: I call BS
By James Holden on 1/16/2008 1:44:14 PM , Rating: 2
What makes you think Intel is running these simulations right now, as opposed to months ago? Or that Intel even stops running simulations on its processors? Last I heard the Intel simulation cluster was about 5,000 servers -- they might as well be doing something?

Well, you can believe this signal integrity thing all you want. Intel denies it, a few random blogs speculate it, and this article pins it on another problem.

In the end, it pretty much doesn't matter, which makes me wonder why Intel would lie about it. And not just clever wording -- this would be a bald-faced lie.

My personal opinion? The EE is probably affected too. I didn't see Intel state anything about it in the article, and that was just Kris's speculation added in at the end.


RE: I call BS
By Khato on 1/17/2008 2:59:59 AM , Rating: 2
Well first, maybe 5,000 servers at one site =P Second, those are primarily for simulation of the design well before you actually have silicon to play around with. (That and processing the design for tapeout...) As you can probably guess from those statements, those servers are kept plentifully busy playing around with what's coming up next.

Sure, technically all the 45nm quad core processors would be affected by the issue. It's not something peculiar to the mainstream desktop ones, it's just that's the only place where it ends up happening when run at spec because the spec includes cheaper mainstream motherboards. It's a simple fact of high speed transmission lines that both the quality of the line -and- the termination play a role in signal integrity. The better quality of the line on 6 layer motherboards simply makes up for the slight slip on termination on the quad cores.

All of the above is speculation based upon playing around a tad with the other end of the link. The only people at Intel that really know what the issue would be is likely one silicon validation team, the corresponding IO/design team, and management going up. As the frequent posting of things from Circuit to The Inquirer goes to show, giving employees information that they needn't know just tends to get it leaked. Kinda humorous to hear more about project status from the various computer websites than from any internal information =P
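The point above about line quality and termination both affecting signal integrity can be sketched with the standard reflection coefficient for a transmission line, Gamma = (Z_load - Z_line) / (Z_load + Z_line). The impedance values below are hypothetical and are not taken from any Intel or motherboard specification; the sketch only shows how a small termination mismatch turns into reflected signal.

```python
def reflection_coefficient(z_load: float, z_line: float) -> float:
    """Fraction of the incident voltage wave reflected at the end of a line
    with characteristic impedance `z_line` terminated in `z_load` (ohms)."""
    return (z_load - z_line) / (z_load + z_line)


# Hypothetical values: a 50-ohm trace with ideal and slightly-off termination.
Z_LINE = 50.0
for z_term in (50.0, 55.0, 60.0):
    gamma = reflection_coefficient(z_term, Z_LINE)
    print(f"termination {z_term:.0f} ohm: {gamma:.1%} of the wave reflected")
```

A perfectly matched termination reflects nothing; the further the termination drifts from the line impedance, the more of each edge bounces back as noise that the rest of the bus has to tolerate.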


RE: I call BS
By mindless1 on 1/16/2008 9:48:56 PM , Rating: 2
It's quite simple: a simulation of long-term effects would seek, as much as possible, to be run over a long term. The idea that one can just make it run hot, or overvolt, isn't the same as just keeping it running and seeing what happens (possibly with either or both of the former as preconditions).

It does not seem likely to be a motherboard signal integrity problem at all. The EE would then be affected, and rather than lose face about a, err, flaw, they'd issue a warning about updated motherboard guidelines.


RE: I call BS
By Khato on 1/17/2008 2:42:56 AM , Rating: 3
First, Intel isn't "on record" anywhere in the article really, it's not an official statement by any means. That, and what is said isn't mutually exclusive with the PC Watch article.

See, engineers do so love the little technicalities. Like that the Xeon is technically a different stepping than the core 2 versions, despite the actual silicon being the same (it's a packaging difference.) And then the technicality that motherboard support for core 2 duo is different than for a core 2 extreme... So on, so forth.

I by no means am trying to say that this isn't a circuits bug. It is. It's not Intel sandbagging. It's not an indication of something gone horribly wrong. It's just more noise than is tolerable on the FSB with 4-layer motherboards. (Dual core has one termination point, quad cores have two termination points, and 4-layer motherboards are inherently more noisy to begin with.)


RE: I call BS
By lindejos on 1/21/2008 1:09:49 PM , Rating: 3
Wait a minute, who went on record? According to the article some faceless process engineer, who allegedly works at Intel? Don't you think that if Intel was going to "go on record" there would be a press release?

The only thing that the company actually said on record is "We publicly claimed we will launch our 45nm mainstream processors in Q1 2008, and that's exactly what we did." That was said by Dan Snyder who is a company spokesman. Here's a link to an actual sourced quote: http://www.techreport.com/discussions.x/13756.

"On Record" has to be sourced to an actual person you can prove works for the company. Let's call this MTBF or Errata issue "speculation."


RE: I call BS
By mindless1 on 1/16/2008 10:00:19 PM , Rating: 2
Back up and consider what you're saying. While they share the same die, there are indeed potential differences that cause some to be binned lower, capable only of lower speeds (at least at the same or similar voltage). Whether these differences also affect the longevity of the core if run under certain parameters may be an issue. It was stated "During a simulation, at high clock frequencies," so why would they need to test these high clock frequencies with a lower binned part anyway?

Maybe they're overvolting them, doing what they know a fair percentage of the enthusiast community will do and finding they don't hold up as well to that as former models did. A lower bin that needs vcore increase to reach speeds moreso than an upper binned part would then be more susceptible to damage merely because they're taking into account what the industry has expected, a certain margin and robustness in the design. It wouldn't look good at all if all the reviewers who got ahold of these had them go up in smoke, typically you see clear thermal issues or instability before something like that happens.


RE: I call BS
By defter on 1/16/2008 11:05:38 AM , Rating: 2
According to some rumours, this bug affected compatibility with some chipsets.

QX9650 and 45nm quad core Xeons are officially supported only by Intel's chipsets and thus they aren't affected.


RE: I call BS
By KristopherKubicki (blog) on 1/16/2008 11:08:49 AM , Rating: 2
I've seen this bug blamed on everything from L2 design to chipsets to ambient radiation. I grilled Intel on pretty much every angle I could come up with, and the simulated MTTF is pretty much the answer they kept coming back to.


RE: I call BS
By Oregonian2 on 1/16/2008 1:39:01 PM , Rating: 2
Is it possible that maybe that really is the problem?


RE: I call BS
By Mitch101 on 1/16/2008 2:10:35 PM , Rating: 3
Kristopher,

We were hearing it's related to the 1600MHz FSB.

The best reasoning we heard was that Intel wants the 1600MHz FSB to prevent overclocking as much as possible. If they use the 1333MHz FSB, they obviously have to raise the allowed multiplier, making all the chips great overclockers: just drop the multiplier and push the FSB up, which is easy enough since motherboard chipsets have a good amount of overhead. If they are able to sell the chips at a 1600MHz FSB, this will limit overclocking because the chips are running near top FSB speed and Intel can lock the multiplier lower. But not all the chips run 100% stable over time at this speed. If you look at the 45nm chips, even the low end is a very good overclocker because of the FSB overhead. If Intel can use the higher FSB and lock the multiplier lower, then more people would be required to purchase the higher-cost chips to really get higher speed. It also allows Intel to sell 1600MHz-certified mobos with their new chipsets. It's a two-sale approach: you buy a new chip, and to take advantage of it you need to buy a new mobo. We know the FSB can go higher, but it may not be 100% stable enough for server use with lousy cooling for them to release the chips.

Like you, we've heard numerous reasons. One is that hot spotting is causing the chips to go poof.

But who knows what the truth is.
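The FSB and multiplier arithmetic behind that theory can be sketched as follows. It assumes the usual relationship for quad-pumped front-side buses of this era, where the base clock is the FSB rating divided by four and the core clock is base clock times multiplier. The FSB ratings are nominal and the multipliers are illustrative; none of this confirms what Intel actually planned.

```python
def core_clock_mhz(fsb_rating: float, multiplier: float) -> float:
    """Core clock for a quad-pumped FSB: (rating / 4) * multiplier."""
    base_clock = fsb_rating / 4  # e.g. a nominal 1600 rating -> 400 MHz base clock
    return base_clock * multiplier


def required_multiplier(target_mhz: float, fsb_rating: float) -> float:
    """Multiplier needed to reach `target_mhz` on a given FSB rating."""
    return target_mhz / (fsb_rating / 4)


# Illustrative only: the same ~3 GHz target on two nominal FSB ratings.
for fsb in (1333, 1600):
    mult = required_multiplier(3000, fsb)
    print(f"FSB {fsb}: multiplier of about {mult:.2f} for a 3000 MHz core clock")
```

The higher FSB rating reaches the same core clock with a lower multiplier, which is the mechanism the comment describes: less multiplier on the chip and less headroom left in the bus.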


RE: I call BS
By eye smite on 1/16/2008 1:41:05 PM , Rating: 2
I'm certainly glad to see everyone raising as much hell about this as they did about the errata in Phenom. Will it snowball and get worse like the reviews and reports did with Phenom? I doubt it. Everyone believes intel can do no wrong and amd is the red headed step child. The real stumbling block starts with everyone's expectations and perceptions. So what if Intel had some errata? So what if AMD had some errata? Companies hit stumbling blocks; just wait for them to resolve it and mature the product. Ya know, like Ford did with the tip over easy Explorer.........


RE: I call BS
By TomZ on 1/16/2008 1:56:45 PM , Rating: 1
quote:
Everyone believes intel can do no wrong and amd is the red headed step child.

I disagree. While Intel has been performing very well for the past couple of years, there are enough well-informed people analyzing each step they make, to where if they make even any small mistakes, they're going to get called on it.

Also remember, web sites like DT exist at least in part to capitalize on the interest and publicity surrounding missteps by big tech companies. So you can be sure they'll be published here and elsewhere.


RE: I call BS
By eye smite on 1/16/2008 2:08:19 PM , Rating: 2
That still doesn't change the problem of everyone's expectations and perceptions.....


RE: I call BS
By DigitalFreak on 1/16/08, Rating: -1
RE: I call BS
By edborden on 1/17/2008 6:29:58 PM , Rating: 1
How can you draw a comparison when AMD shipped everyone product that HAD the errata, while Intel is claiming to HOLD BACK product for the same reason?


RE: I call BS
By mindless1 on 1/21/2008 11:29:11 PM , Rating: 2
We can draw a comparison because Intel has in the past also shipped product with errata, it's all a matter of whether (either) company catches the problem in time to stop shipments or not. Don't think it won't ever happen again from either camp.


"If you mod me down, I will become more insightful than you can possibly imagine." -- Slashdot

Copyright 2014 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki