Power consumption and performance explored

DailyTech has managed to snag an Intel Kentsfield Core 2 Quadro Q6600 for some in-house testing. The Kentsfield Core 2 Quadro Q6600 is clocked at 2.4 GHz with a 1066 MHz front-side bus. It’s equipped with 8MB of total L2 cache. Unlike Intel’s Conroe Core 2 Duo processors, the cache configuration of the Core 2 Quadro Q6600 is 2x4MB with each set of dual-cores sharing a single 4MB pool of L2 cache. This is because Kentsfield processors are essentially two Conroe dies fused together to form a single processor—similar to how the original Smithfield Pentium D 800 series was.

Strangely the Kentsfield Core 2 Quadro Q6600 did not support Intel’s Enhanced Speedstep Technology. Whether or not this is a result of an early engineering sample is unknown at the moment. The Core 2 Quadro Q6600 does support a C1E Halt state for decreased power consumption. Speaking of power consumption, Intel has done an excellent job optimizing power consumption for its quad-core Kentsfield.

Our power consumption measurements were conducted using a Kill-A-Watt power meter that measures the power draw of the complete system from the wall outlet. The test system consisted of:
ASUS P5W DH Deluxe
Kingston HyperX DDR2-800 2x1GB
ATI Radeon X1900XT 512MB
Creative Labs Sound Blaster X-Fi Xtreme Music
Silverstone ST60F 600 watt power supply
Seagate 7200.8 300GB
Windows XP Professional

Intel Kentsfield Power Consumption (watts, complete system at the wall)

        Core 2 Extreme X6800    Core 2 Quadro Q6600
Idle    154                     198
Load    202                     223

Power consumption compared to Intel’s current flagship Core 2 Extreme X6800 isn’t too bad at idle with the Core 2 Quadro Q6600 consuming 44 more watts. The higher power consumption is due to the Core 2 Quadro Q6600 lacking Intel’s Enhanced Speedstep Technology that lowers the clock speed of the processor during idle.

Power consumption under load, with 3D Studio Max 8 rendering a complex model on all four cores, is quite good. The Core 2 Quadro Q6600 system drew a total of 223 watts from the wall. That is only 21 watts more than the Core 2 Extreme X6800 system, which draws around 202 watts under the same conditions. We were quite surprised that Intel has managed to keep power consumption relatively low with four cores.
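The gaps between the two systems can be checked directly from the wall readings in the table above. The following is a minimal sketch of that arithmetic; the dictionary layout and function names are illustrative choices, not part of the original test procedure.

```python
# Whole-system wall power in watts, copied from the power consumption table.
power = {
    "X6800": {"idle": 154, "load": 202},
    "Q6600": {"idle": 198, "load": 223},
}

def delta(state):
    """Extra watts drawn by the Q6600 system in the given state."""
    return power["Q6600"][state] - power["X6800"][state]

def idle_to_load_swing(cpu):
    """How far system power rises between idle and full load."""
    return power[cpu]["load"] - power[cpu]["idle"]

idle_delta = delta("idle")
load_delta = delta("load")
# Load gap expressed relative to the X6800 system's load reading.
load_pct = 100 * load_delta / power["X6800"]["load"]

print(f"Idle gap: {idle_delta} W, load gap: {load_delta} W ({load_pct:.1f}%)")
for cpu in power:
    print(f"{cpu} idle-to-load swing: {idle_to_load_swing(cpu)} W")
```

The smaller idle-to-load swing on the Q6600 system is consistent with its higher idle draw, since the readings are for the complete system rather than the processor alone.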

We were able to run a few quick benchmarks with the Core 2 Quadro Q6600 as well. For comparison, we have benchmarks from a Core 2 Extreme X6800 and a Core 2 Duo E6600 as references. A real Core 2 Duo E6600 wasn’t readily available for testing, so we dropped the multiplier of the Core 2 Extreme X6800 down to 9x. Performance of a real Core 2 Duo E6600 and our Core 2 Extreme X6800 clocked down to E6600 speeds should be identical.

SiSoft Sandra 2007 CPU-Arithmetic (MIPS)

       Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
ALU    27025                   22261               44522
FPU    18641                   15257               30513


SiSoft Sandra 2007 CPU Multimedia (MIPS)

       Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
ALU    161548                  132140              263973
FPU    87561                   71619               143084


Performance is as expected with SiSoft Sandra. Clock for clock the Core 2 Quadro Q6600 has nearly twice the raw performance of the Core 2 Duo E6600. This isn’t too surprising as the Core 2 Quadro Q6600 has twice as many cores.
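To put a number on "nearly twice", the Q6600 scores can be divided by the E6600 scores, since both run at 2.4 GHz. This is a small sketch using the Sandra figures reported above; the variable names are ours, and the "efficiency" figure simply assumes that ideal scaling from two to four cores is exactly 2x.

```python
# (E6600 score, Q6600 score) pairs from the two Sandra tables.
scores = {
    "Arithmetic ALU": (22261, 44522),
    "Arithmetic FPU": (15257, 30513),
    "Multimedia ALU": (132140, 263973),
    "Multimedia FPU": (71619, 143084),
}

# Ratio of quad-core to dual-core score at the same clock speed.
scaling = {name: quad / dual for name, (dual, quad) in scores.items()}

for name, ratio in scaling.items():
    # Efficiency relative to a perfect 2x gain from doubling the cores.
    efficiency = 100 * ratio / 2
    print(f"{name}: {ratio:.3f}x ({efficiency:.1f}% of ideal)")
```

Every ratio lands within about a quarter of a percent of 2x, which is what one would expect from a synthetic test that runs independent work on each core.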

3D Studio Max 8 Performance (time, lower is better)

           Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
Min:Sec    16:45                   20:20               11:00


Cinebench 9.5 Performance (time, lower is better)

           Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
Seconds    24                      30                  17

3D Studio Max 8 scales very well with four cores, as expected, and Cinebench 9.5 shows similar gains. The Core 2 Quadro Q6600 finishes roughly 1.8x faster than the Core 2 Duo E6600 in both tests. This is expected, as 3D rendering applications will use all available processing power.
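The rendering speedups follow from the timings above once the min:sec values are converted to seconds. A short sketch of that conversion is shown here; the helper name `to_seconds` is hypothetical and used only for illustration.

```python
def to_seconds(stamp):
    """Convert a 'min:sec' string such as '20:20' into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

# 3D Studio Max 8 render times for the E6600 and Q6600 (both at 2.4 GHz).
max_dual = to_seconds("20:20")
max_quad = to_seconds("11:00")
max_speedup = max_dual / max_quad

# Cinebench 9.5 times are already reported in seconds.
cine_speedup = 30 / 17

print(f"3D Studio Max 8: {max_speedup:.2f}x faster on four cores")
print(f"Cinebench 9.5:   {cine_speedup:.2f}x faster on four cores")
```

Both results fall short of a perfect 2x, which is typical once any serial setup work and shared resources are included in the measured time.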

Windows Media Encoder 9 (time, lower is better)

           Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
Seconds    59                      72                  45


TMPG Encoder (time, lower is better)

           Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
Seconds    407                     486                 289

Multimedia encoding performance shows more modest gains considering the number of cores has doubled. This is most likely due to the lack of multi-core optimizations in Windows Media Encoder 9 and TMPG Enc. TMPG Enc still detects multi-core processors as a single Hyper-Threading equipped processor and does not use all four processor cores. Hopefully Microsoft and Pegasys Inc. will release updated multi-core aware versions of their applications in the future.
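One rough way to read the encoding numbers is an Amdahl-style estimate: if a fraction f of the dual-core runtime halves perfectly on four cores and the rest does not speed up at all, then quad time = dual time x (1 - f/2), so f = 2 x (1 - quad/dual). The sketch below applies that formula to the timings above. The model is our simplifying assumption, not something the encoders report, and it ignores effects such as the shared front-side bus.

```python
def scaling_fraction(t_dual, t_quad):
    """Share of the dual-core runtime that must halve perfectly on four
    cores to explain the measured quad-core time (Amdahl-style model)."""
    return 2 * (1 - t_quad / t_dual)

# (E6600 seconds, Q6600 seconds) from the two encoding tables.
encoders = {
    "Windows Media Encoder 9": (72, 45),
    "TMPG Enc": (486, 289),
}

results = {}
for name, (t_dual, t_quad) in encoders.items():
    speedup = t_dual / t_quad
    fraction = scaling_fraction(t_dual, t_quad)
    results[name] = (speedup, fraction)
    print(f"{name}: {speedup:.2f}x faster, about {fraction:.0%} of the work scales")
```

Under this model roughly three quarters of each encode benefits from the extra cores, which fits the picture of software that is only partly multi-core aware.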

Quake 4 (frames per second, higher is better)

       Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
FPS    86.37                   73.5                75.9


Serious Sam II (frames per second, higher is better)

       Core 2 Extreme X6800    Core 2 Duo E6600    Core 2 Quadro Q6600
FPS    203.83                  176.2               174

Gaming performance is as expected of quad-core: it offers nearly the same performance clock for clock as its dual-core counterparts. Quake 4 with the latest patch, which is supposedly optimized for multi-threading, shows minimal performance gains, and the higher-clocked Core 2 Extreme X6800 still manages to beat out the Core 2 Quadro Q6600. Games that aren’t multi-threaded, such as Serious Sam II, show little to no difference in performance.

Overall, Intel’s Kentsfield performs as expected. It will scale very well in multi-threaded applications such as 3D Studio Max, Cinebench and other 3D modeling or encoding applications. Unfortunately, unless the application is multi-core aware or optimized for multi-threading, the performance gains are minimal if not absent. While the move to quad-core hardware may be exciting, software support is still trailing behind. Although Intel positions its quad-core Kentsfield Core 2 processors as high-end parts, the soon-to-be-released Kentsfield Core 2 Extreme QX6700 and Core 2 Quadro Q6600 appear better suited to mid-range workstations than to enthusiast gaming rigs, especially since there’s very little overlap with the Intel Xeon 3200 series.


Comments

Nothing unexpected
By dcalfine on 9/25/2006 7:59:37 PM , Rating: 2
These results are very good in multithreaded applications (though lacking in games), which was to be expected. I look forward to clovertown benchmarks, with the 1333MHz bus.

Overall, this is good news and I think Intel has really done a good job and has redeemed itself for the crappy processors it's had over the years.




RE: Nothing unexpected
By chuck232 on 9/25/2006 9:25:34 PM , Rating: 2
I'm sort of surprised that the results are pretty decent. There was a heck of a lot of talk about the FSB severely limiting the Intel quad core procs.

Then again, this is a pretty limited set of applications so we may yet see some issues when these are more 'widely available' for testers at least.


RE: Nothing unexpected
By JeffDM on 9/25/2006 11:46:38 PM , Rating: 3
FSB seems to be a poorly understood factor. These chips have a very large cache, so even if the FSB is "full", it doesn't really hurt computation as much as you might think it would.


RE: Nothing unexpected
By JumpingJack on 9/26/2006 2:37:52 AM , Rating: 2
quote:
FSB seems to be a poorly understood factor. These chips have a very large cache, so even if the FSB is "full", it doesn't really hurt computation as much as you might think it would.


Correct. Large cache and good prefetchers lessen the demand on the bus. It is not a 'misunderstanding'; it is simply a false assumption. Many assume the FSB to be the Achilles' heel simply because it is old technology with a periodic increase in clock speed. This is an invalid assumption: architecturally, the CPU design determines the demands on the bus and, if the FSB is good enough, then it is good enough. Performance will flow.

If you really want to know how far the FSB has to go, grab a C2D and a copy of Intel's Vtune and measure the demand on the FSB --- I did -- FEAR requires about 15% BW, Quake 4 SMP on requires about 21 % BW, 3DS Max about 8% BW based on a 1067 MHz bus. Run the experiment yourself, it is easy. Setup Vtune to monitor the amount of time the Bus-Busy line goes high.


RE: Nothing unexpected
By JumpingJack on 9/26/2006 2:40:06 AM , Rating: 2
Oh BTW --- be careful with Vtune, I found it quirky -- I was able to get about 5 or 6 runs with it on different apps before it hosed up my system files and I had to rebuild.


RE: Nothing unexpected
By thomasxstewart on 9/26/2006 12:06:33 PM , Rating: 2
Faster FSB is better, your home calculations are true, however, much time is spent in rest states, that is, running yet not for any particular purpose. So when you see 20% less lag time moving from 1066 to 1333 MHz FSB, each string is really getting to operations state much faster, not that 80% of time the FSB is idling without tasks to sort. Basically, it's how fast each kernel can be deciphered, dispatched & dispensed with, not waiting time, which always is most in any system. Faster linking in itself brings on more lag time as each job is completed sooner.
Signed:PHYSICIAN THOMAS STEWART VON DRASHEK M.D.


RE: Nothing unexpected
By JumpingJack on 9/27/2006 1:14:31 AM , Rating: 2
In order to avoid a forum type post, I will not go into any more after this. However, what you describe above is not true. Each clock tick is a transaction, the width of such a transaction is measured in 'seconds', and the speed at which data traverses from point to point is actually irrelevant. If the bus signal lines are too long, excess latency is introduced -- you will often read of actual 'physical separation is far', this is just simple physics. Your view of how the FSB works holds only so long as you include both transactions/sec and latency in your argument.

Bandwidth, simply stated, is the total amount of data that can be pushed from point A to point B in X amount of time. If a 1067 bus has a theoretical BW of 8.5 GB/sec, but any given application demands only 2 GB/sec of data at the core speed of the processor, then there is room to spare.

Of course more bus speed is desirable in workloads that demand large chunks of data to be moved across the bus often. In servers this is a big deal, in desktop there are very few apps that actually stress the bus to these levels.

In short, some apps would run fine using only a 400 MHz bus, others may bog down but run fine with an 800 MHz bus.

Your argument is that at 400 MHz electrons move slower through the signal lines than at 800 MHz or 1067 MHz, and this is not true, as the 'speed' at which electrons travel is not determined by frequency, but by voltage.

To illustrate my point, go read Tom's Hardware article on the recent Kentsfield quad core benchmarks. They did a 2.67 GHz Quad at 1067 MHz FSB and again a 2.67 GHz Quad at 1333 MHz FSB --- guess what, performance was identical.
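The 8.5 GB/sec figure quoted in this comment can be reproduced with simple arithmetic, assuming the front-side bus moves 8 bytes (64 bits) per transfer. The sketch below works out the peak for the 1067 and 1333 buses and the share of the bus a 2 GB/sec workload would occupy; the function name is ours, and real sustained throughput will be lower than these theoretical peaks.

```python
# Assumption: a 64-bit front-side bus data path, i.e. 8 bytes per transfer.
BUS_WIDTH_BYTES = 8

def peak_bandwidth_gbs(mega_transfers_per_sec):
    """Theoretical peak bandwidth in GB/s (decimal) for a given bus rate."""
    return mega_transfers_per_sec * 1e6 * BUS_WIDTH_BYTES / 1e9

peak_1067 = peak_bandwidth_gbs(1067)
peak_1333 = peak_bandwidth_gbs(1333)

# Share of the 1067 bus consumed by the 2 GB/s demand used as an example.
utilisation = 2.0 / peak_1067

print(f"1067 bus peak: {peak_1067:.1f} GB/s")
print(f"1333 bus peak: {peak_1333:.1f} GB/s")
print(f"A 2 GB/s workload uses about {utilisation:.0%} of the 1067 bus")
```

With less than a quarter of the slower bus in use, a workload like this would not be expected to gain from the faster bus, which matches the argument made in the comment.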


RE: Nothing unexpected