


Two whitepapers published by AMD raise questions about its new thermal rating; an inconsistency AMD justifies with margin calculations

AMD's Average CPU Power rating scale faced its first real scrutiny yesterday, when two nearly identical whitepapers discussing ACP caught the attention of DailyTech and other technology forums.

The inconsistencies between these two whitepapers included a table where the Thermal Design Power of AMD's new quad-core Opterons increased, without an increase in the Average CPU Power rating.

Several processor architectures ago, AMD and rival Intel used the same methods for calculating the Thermal Design Power of microprocessors. From an engineering standpoint, TDP represents the amount of power the CPU's cooling solution must be able to dissipate to keep the processor within its thermal limits.

AMD and Intel now differ in how they calculate TDP, and for different reasons. Intel's current architecture, for example, allows the CPU to exceed its TDP rating for a short period before the processor throttles its clock frequency to bring its temperature down. AMD's current-generation processors do not work this way, so AMD intentionally publishes conservative TDP ratings.

AMD's 2008 Phenom roadmap clearly illustrates the increase in TDP for upcoming K10 processors. Even though Phenom processors are desktop units, equivalently clocked Barcelona Opteron processors share almost all the same attributes and specifications.

AMD's Brent Kerby, author of both whitepapers, attributes the inconsistency to timing and to the nature of ACP itself. "The measured value of ACP already included the changed TDP values," he explains.

So are the ACP measurements in the two documents wrong, or mutually exclusive? "No," says Kerby. Even though the maximum thermal envelope increased by as much as 21% between the two whitepapers, the algorithm for calculating ACP is not entirely affected by this upper bound.

"When we published the first whitepaper, we had to anticipate TDP changes," adds Kerby.  The newest whitepaper, Kerby states, is relevant to both published TDPs.  The TDP references in the first document can be replaced with the TDP changes from the second, and in fact should have been.

Kerby could not comment on the specific margins built into the ACP rating, but says they are an integral part of the ACP process. His whitepaper describes the margin as follows:
The results across the suite of workloads are used to derive the ACP number. The ACP value for each processor power band is representative of the geometric mean for the entire suite of benchmark applications plus a margin based on AMD historical manufacturing experience.
However, Kerby did confirm that if the TDP were to increase again, the company's ACP values would need to be recalculated.
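The whitepaper's description of ACP (a geometric mean over a suite of workloads, plus a margin) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AMD's actual procedure: the function names are hypothetical, the margin is modeled as a fixed number of watts added to the mean because AMD does not disclose how its margin is derived or applied, and the sample wattages are placeholders rather than AMD measurements.

```python
import math


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log x)); every value must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def acp_estimate(workload_avg_power_w, margin_w):
    """Hypothetical ACP-style rating (illustration only).

    workload_avg_power_w: time-averaged power in watts measured for each
        workload in the benchmark suite.
    margin_w: ASSUMED additive margin in watts; AMD's real margin and the way
        it is applied are not published.
    """
    return geometric_mean(workload_avg_power_w) + margin_w


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not AMD data.
    suite = [62.0, 71.5, 58.0, 66.0, 74.0]
    print(f"ACP-style estimate: {acp_estimate(suite, margin_w=5.0):.1f} W")
```

In this sketch the TDP never appears as an input, so raising the TDP alone would not move the result; whether AMD's real margin depends on the TDP is not stated in the whitepaper excerpt.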

Will AMD's thermal envelope increase again?  AMD won't say, but at least for now AMD's ACP/TDP mystery appears solved.


Comments



Change title to "WE LIED ABOUT AMD"
By TheJian on 12/10/2007 10:55:47 PM , Rating: 5
With Intel doing the same thing for ages this story shouldn't even have been printed and your LIE shouldn't have been told. Talk about irresponsible reporting. As long as we know BOTH TDP and ACP numbers who cares? Intel has been lying about theirs with avg numbers for years (I'm reminded of the stink of Dell Optiplex SFF machines with P4's above 2.4ghz...LOL. Presshots anyone?). Why is it a story when AMD does the same?

http://techreport.com/articles.x/13176
Read "The nuts and bolts of the quad-core Opterons" section.

As you can see they sound like they're saying it's about time AMD did this. They even say they like it but don't care because they'd test the chips themselves anyway. Which is exactly what all review sites should do. They call it "justifiable": "This move may be controversial, but personally, I think it's probably justifiable given the power draw profiles we've seen from Opterons." Is there a problem with them wanting to be compared accurately to their rival? Most websites totally leave out the differences between Intel's version of TDP and AMD's. What you get is "AMD's are higher and must suck." I'm not saying they're better than Intel (I just bought a Core 2 myself a few months back thanks to 1.8s running at 3GHz on air so easily - and mine runs cool even without my Koolance...LOL) but review sites have been doing this for ages. I don't blame them for wanting to fix this problem, especially with their profits tanked. They should have done this 4 years ago. Instead their stupid marketing dept decided it was best to tell people the max TDP of any CPU that they might not ever produce in a particular product family. When they said 89W for A64s a few revs back it was for the whole family. Don't tell me you think an X2 3800+ is the same as an X2 5400+. Yet all of these were quoted at 89W. Look at Anand's chart here:
http://www.anandtech.com/cpuchipsets/showdoc.aspx?...

All AM2 chips show 89W. I've always thought they were stupid for allowing this. Review sites didn't help by making people believe they put off 89W, usually forgetting to say it's so far off it's ridiculous. AMD shot themselves in the foot by starting this MAX crap, NOT by fixing it now. Why is it a LIE to tell the truth about their CPUs? It's quite clear they LIED before by saying they were all 89W when they were really more like 40-89W. My old X2 3800+ was cold with the retail heatsink at 2.4GHz (400MHz OC per core, no heat). Their marketing dept should be fired. YES (and Ruiz too...Bring back Jerry!). But AMD correcting their marketing mistakes should not be attacked. Your previous story was the equivalent of me saying I saw you murder someone and getting you convicted with no proof. The following day (after the damage is done for anyone who doesn't come back to read a correction story) DNA proves you innocent and you get off. Don't I owe someone an apology here? You called them outright liars in the title. You even stated in the article that it may be a mistake in their docs. Isn't "liar" a bit conclusive knowing you might be completely off base? Is tomorrow's article going to be called "The LIES START WITH DAILYTECH!"??? BTW Core 2 rocks and AMD is getting KILLED. Fanboys from both sides can just jump off a cliff :) That doesn't change the fact that AMD is having their chops busted for nothing here.




RE: Change title to "WE LIED ABOUT AMD"
By TheJian on 12/11/07, Rating: -1
RE: Change title to "WE LIED ABOUT AMD"
By JumpingJack on 12/11/2007 12:46:19 AM , Rating: 2
You're ranting.... they did not lie .. AMD published two specs in conflict, and the explanation still does not make sense.


RE: Change title to "WE LIED ABOUT AMD"
By bangmal on 12/11/07, Rating: -1
RE: Change title to "WE LIED ABOUT AMD"
By JumpingJack on 12/11/2007 1:52:18 AM , Rating: 3
I didn't call him a liar... I said he was ranting. What I pointed out was that DT's original article was not a lie (he was calling DT liars)... DT pointed to a discrepancy in AMD's ACP whitepaper: one version paired up one set of numbers, another version paired up a second set. TDP went up but ACP stayed the same... this appears weird.

The follow-up article, this one, is a less than satisfactory explanation from AMD -- it is like that L2 latency argument for Brisbane.

Either a) their original spec was wrong or b) the ACP is not using loads representative of the worst case, as TDP should encompass the max (AMD's responder calls this the upper bound).

5 apps with a geometric mean is not sufficient to capture all the variation, so it appears they are wiggling up a higher thermal envelope but keeping their marketing ACP number the same.

Look, a population of applications will drive a variety of thermal profiles, on top of process variation from processor to processor -- condensing this down into a 5-app geometric mean (which is not an average as you would typically understand average) is not sufficient. And if AMD adjusts the upper bound for whatever reason, then it is necessary to re-evaluate the ACP... especially if they are adjusting the upper bound to accommodate higher operating voltages and consequently currents (which is how they arrive at Pmax/TDP per their own argument) in order to expand the clocking potential.

In short, I am not buying AMD's explanation above.


RE: Change title to "WE LIED ABOUT AMD"
By Viditor on 12/11/2007 2:31:10 AM , Rating: 3
quote:
the ACP is not using loads representative of the worst case

I think that's where you went wrong... it says in the whitepaper that ACP represents typical usage, not worst case...


RE: Change title to "WE LIED ABOUT AMD"
By Viditor on 12/11/2007 2:34:33 AM , Rating: 3
It states under Test Conditions:
"Given the goal of representing typical power usage in real world conditions, environmental test conditions were chosen to reflect that aspect (room temp of 70ºF, server's fan heat sink used, closed case, etc.) The power for the cores, memory controller, and HyperTransport™ links was logged multiple times per second throughout the entire duration of the workload tested, and the time-averaged power consumption for that workload was calculated."
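The procedure quoted above (sampling power several times per second and reporting a time-averaged figure per workload) amounts to integrating power over time and dividing by the length of the run. Below is a minimal Python sketch of that step. The function name is hypothetical, the trapezoidal rule is an assumed choice of integration method (the whitepaper does not say how AMD averaged its samples), and each input sample is assumed to be the already-summed power of the cores, memory controller, and HyperTransport links.

```python
def time_averaged_power(timestamps_s, power_w):
    """Return the average power (W) over a logged run.

    Integrates the samples with the trapezoidal rule, so unevenly spaced
    samples are weighted by the time they cover, then divides the resulting
    energy (J) by the total duration (s).
    """
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (timestamp, power) samples")
    energy_j = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j / (timestamps_s[-1] - timestamps_s[0])


if __name__ == "__main__":
    # Hypothetical samples taken every 0.25 s during a short workload.
    t = [0.0, 0.25, 0.5, 0.75, 1.0]
    p = [60.0, 72.0, 75.0, 70.0, 62.0]
    print(f"time-averaged power: {time_averaged_power(t, p):.2f} W")
```

Weighting by elapsed time rather than taking a plain mean of the samples keeps the result honest if the logging interval jitters.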


RE: Change title to "WE LIED ABOUT AMD"
By JumpingJack on 12/11/2007 2:44:20 AM , Rating: 2
They actually only used 5 workloads, quoted from the ACP WP:

quote:
These workloads were Transaction Processing Performance Council (TPC-C), SPECcpu2006, SPECjbb2005, and STREAM. The geometric mean of measurements, taken during these workloads, is the ACP.


From these 5 apps, they measure the power consumed for each one as you described to get a single number... they then calculate a geometric mean, which will be less than or equal to the arithmetic mean (equal only if every value is identical, which is unlikely).
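The claim that the geometric mean sits at or below the arithmetic mean is the AM-GM inequality, and it is easy to check numerically. The Python sketch below uses the standard library's `statistics.fmean` and `statistics.geometric_mean` (available since Python 3.8); the function name and the sample wattages are hypothetical placeholders, not values from AMD's whitepaper.

```python
import statistics


def arithmetic_and_geometric_mean(values):
    """Return (arithmetic mean, geometric mean) for positive values.

    By the AM-GM inequality the geometric mean never exceeds the arithmetic
    mean, and the two are equal only when all values are identical.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return statistics.fmean(values), statistics.geometric_mean(values)


if __name__ == "__main__":
    # Hypothetical per-workload average power figures (watts).
    uniform = [60.0] * 5
    spread = [40.0, 50.0, 60.0, 70.0, 80.0]
    for label, data in (("uniform", uniform), ("spread", spread)):
        am, gm = arithmetic_and_geometric_mean(data)
        print(f"{label:8s} arithmetic={am:.2f} W  geometric={gm:.2f} W")
```

This only shows how the two averages relate; it does not settle whether five workloads are enough to represent the worst case, which is the point actually in dispute.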


By tmouse on 12/11/2007 8:07:43 AM , Rating: 1
Is the distribution normal? (I do not know.) If it is not, then a geometric mean is the proper one to use, as the arithmetic mean is meaningless outside of a normal distribution.


RE: Change title to "WE LIED ABOUT AMD"
By JumpingJack on 12/11/2007 2:48:55 AM , Rating: 3
No, I am not wrong. If the upper bound -- dictated by worst case -- forced AMD to change this spec and ACP did not change with it, then the apps used to stress the processor in deriving ACP are not representative of the entire spread or distribution of available workloads.

AMD, by matter of choice since this is their metric and not a standard, can choose whatever they like to create this value. So they chose 5 apps, all server-related synthetics... this is fine, but all things being equal, if the upper bound moved one would expect the geometric mean to shift, unless they did not choose an appropriate subset of loads to capture the worst case.


RE: Change title to "WE LIED ABOUT AMD"
By Viditor on 12/11/2007 2:55:20 AM , Rating: 1
quote:
if the upper bound -- dictated by worst case -- forced AMD to change this spec and ACP did not change with it, then the apps used to stress the processor in deriving ACP are not representative of the entire spread or distribution of available workloads


You're comparing an upper bound theoretical limit (TDP) to a real world measurement (ACP). Movement of one does not necessitate movement of the other.
Again, you are assuming that ACP is meant to capture worst case instead of typical usage...


RE: Change title to "WE LIED ABOUT AMD"
By Viditor on 12/11/2007 3:00:33 AM , Rating: 2
Let me be clearer...

You're assuming that the theoretical geometric mean will be the same as the tested geometric mean...and that ain't always the case.


RE: Change title to "WE LIED ABOUT AMD"
By JumpingJack on 12/11/2007 3:29:29 AM , Rating: 3
Now this does not make any sense :) ... sorry, maybe you could be more clear :) ...

You seem fixated on the word theoretical... TDP is not theoretical; it is a spec intended to ensure that the cooling solution is sufficient. You choose it based on what you know or measure about the processor to ensure the processor will work.... AMD MEASURES to arrive at TDP, there is nothing theoretical about it. It's in their specs.


RE: Change title to "WE LIED ABOUT AMD"
By Viditor on 12/11/2007 4:57:37 AM , Rating: 2
quote:
AMD MEASURES to arrive at TDP, there is nothing theoretical about it

AMD doesn't run apps at that rate...I doubt there is any app short of a thermal virus that could do so.
As in most good theories, the premise is based on measured facts.
However, since the practice never actually takes place, it must remain theoretical.

Intel actually runs their apps and measures the outcome, so it is a measurement...

ACP is also actually run, so again it is a measurement...


RE: Change title to "WE LIED ABOUT AMD"
By JumpingJack on 12/11/2007 11:56:56 AM , Rating: 2
Well, I quoted AMD's spec... which states TDP is measured at these conditions... how they load it to arrive at that measure (prodding the CPU to max conditions) is ambiguous at best.


RE: Change title to "WE LIED ABOUT AMD"
By alanore on 12/11/2007 12:54:11 PM , Rating: 2
Since Rev. E, AMD has placed each individual chip's TDP on the chip. If you own a Rev. E or later chip, you can use TCaseMax to read your individual chip's TDP; you'll find that it's less than the TDP your chip was designed for, and in some cases well below it. I.e., if you have chips ranging from 2.0GHz to 2.8GHz all in the 89W envelope, it is not very logical to think that the 2.8GHz chip will consume the same amount of power as the 2.0GHz chip.

All the TDP rating tells you is the envelope the processor is in. Say you sample 50 Phenom processors for the ACP and they all fall in the 95W envelope, but you're going to scale up the clock speed, knowing the upscaled parts will fall outside that envelope, so you raise the envelope to 120W. The 50 processors you sampled aren't using any more power; it's just that you could now make chips that do.

Second, AMD is at least calling theirs something else so it isn't confused with TDP; Intel just gives its rating as the TDP even though it's not. At this point it's worth noting that because AMD chips have elements of the north bridge on them, their TDP is increased, but the system's TDP isn't.

The theoretical limit to the use of your brain is 100%, but the body just can't produce enough energy and oxygen to allow this to happen, so it only operates at a fraction. Realistically the brain would only operate at higher percentages when something was hugely wrong. But say the energy it consumed at 100% was called the TDP; if you listened to the TDP, you would have to consume 8000 calories a day to provide it with energy. In reality we don't use all of our brains, so that intake of calories is overkill. Same with actual TDP: the more cores you have, the further from the TDP you get, so there's no point in the overkill.


By JumpingJack on 12/11/2007 10:57:43 PM , Rating: 2
I understand what you are saying, but this is off point... you are talking about binning and where a chip falls on the power curve; this is different from measuring what the TDP should be to define the cutoff for a processor operating at nominal within that bin.