

64 comment(s), last by Parhel.. on Aug 27 at 3:52 PM


CPU ID screen shot of "Yorkfield" at 2.33 GHz  (Source: DailyTech)

Intel "Yorkfield" ScienceMark 2 L2 cache performance  (Source: DailyTech)
DailyTech managed to snag a quad-core "Yorkfield" for a few quick benchmarks

Benchmarks of Intel’s Penryn-based dual-core Wolfdale have appeared a couple of times in the past month. The early benchmarks tested engineering-sample processors and showed Wolfdale performing, on average, 5 percent faster than Conroe clock for clock. Benchmarks of the quad-core Yorkfield, however, have been virtually unavailable to the public.

Intel’s Yorkfield is not a native quad-core design. As with Kentsfield, Yorkfield places two dual-core dies in a single package, so each pair of cores has its own pool of shared L2 cache. Because Penryn has more cache, each pair of cores has access to 6MB of L2, for a total of 12MB, up from 4MB per pair and 8MB total on Kentsfield.

In addition to the larger cache, Penryn features a faster, 24-way set-associative L2 cache, which shaves off a few clock cycles. Kentsfield has a 16-way set-associative L2 cache.
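As a rough illustration of what the associativity change means, the sketch below computes how many sets each L2 pool has. It assumes 64-byte cache lines (not stated in the article) and treats each 4MB or 6MB pool as one set-associative array; it is a back-of-the-envelope model, not Intel's documented cache geometry, and the `l2_sets` helper is hypothetical.

```python
def l2_sets(capacity_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache: capacity / (ways * line size).

    line_bytes=64 is an assumption made for illustration.
    """
    if capacity_bytes % (ways * line_bytes):
        raise ValueError("capacity must divide evenly into ways * line size")
    return capacity_bytes // (ways * line_bytes)

MB = 1024 * 1024
kentsfield_sets = l2_sets(4 * MB, ways=16)  # 4MB per core pair, 16-way
yorkfield_sets = l2_sets(6 * MB, ways=24)   # 6MB per core pair, 24-way

print(kentsfield_sets, yorkfield_sets)  # 4096 4096
```

Under these assumptions both caches have the same number of sets, so the extra 2MB per core pair comes entirely from the additional ways: each address maps to a set with 24 candidate lines instead of 16, which tends to reduce conflict misses.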

Penryn also introduces new SSE4 instructions geared toward multimedia tasks. SSE4 adds 47 new instructions to improve the performance of video accelerators, graphics building blocks and streaming loads, and Intel claims a 2x performance gain in video acceleration tasks. Fourteen of the new instructions target video accelerator performance, and 32 improve compiler auto-vectorization.
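To make the video-acceleration category more concrete, here is a scalar Python model of MPSADBW, an SSE4.1 instruction that computes eight sums of absolute differences (SADs) at once, the core operation in motion estimation for video encoders. It is a sketch written from the instruction's published semantics for illustration only; the function name and byte-oriented interface are our own, and real code would use the hardware instruction through a compiler intrinsic.

```python
def mpsadbw(dst: bytes, src: bytes, imm8: int) -> list[int]:
    """Scalar model of SSE4.1 MPSADBW on 128-bit (16-byte) operands.

    Returns eight 16-bit results; result i is the sum of absolute
    differences between 4 bytes of `dst` starting at (blk1 + i)
    and a fixed 4-byte block of `src`.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("operands must be exactly 16 bytes")
    blk1 = ((imm8 >> 2) & 1) * 4  # imm8 bit 2 picks dst byte offset 0 or 4
    blk2 = (imm8 & 0b11) * 4      # imm8 bits 1:0 pick src block at 0, 4, 8 or 12
    ref = src[blk2:blk2 + 4]
    return [
        sum(abs(dst[blk1 + i + j] - ref[j]) for j in range(4))
        for i in range(8)
    ]

# Search a row of 16 pixels for the 4-pixel reference block [2, 3, 4, 5].
row = bytes(range(16))
reference = bytes([2, 3, 4, 5] + [0] * 12)
sads = mpsadbw(row, reference, imm8=0)
print(sads)                   # [8, 4, 0, 4, 8, 12, 16, 20]
print(sads.index(min(sads)))  # 2: the best match starts at byte 2
```

A scalar loop needs 32 subtract-and-accumulate steps to produce these eight SADs; the point of the instruction is that the hardware does them in one operation.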

Intel expects SSE4 optimizations to improve performance in video authoring, imaging, graphics, video search, off-chip accelerators, gaming and physics applications. In early benchmarks, an SSE4-optimized version of DivX 6.6 Alpha delivered a 116 percent performance improvement.

Also new to Penryn is the Super Shuffle Engine, which handles shuffling, unpacking, packing, aligning concatenated sources, wide shifts, insertion and extraction, and setup for horizontal arithmetic functions. According to earlier briefing documents, Intel claims “2x faster SSE shuffle instruction execution.”
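"Aligning concatenated sources" refers to operations such as PALIGNR (introduced with SSSE3), which joins two 16-byte registers into a 32-byte value and extracts a 16-byte window at a byte offset. The scalar Python model below shows the kind of data movement a shuffle unit must perform in a single pass. It follows the instruction's documented behavior but is only an illustration; the helper name is ours, and nothing here reflects how Intel implements the Super Shuffle Engine internally.

```python
def palignr(dst: bytes, src: bytes, imm8: int) -> bytes:
    """Scalar model of SSSE3 PALIGNR on 16-byte operands.

    Treats dst:src as one 32-byte value (src in the low bytes), shifts it
    right by imm8 bytes, and keeps the low 16 bytes. Shifts past the end
    fill with zeros.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("operands must be exactly 16 bytes")
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must fit in 8 bits")
    concat = src + dst               # little-endian: byte 0 is src[0]
    window = concat[imm8:imm8 + 16]  # byte-granular right shift
    return window + bytes(16 - len(window))

high = bytes(range(16, 32))
low = bytes(range(16))
print(list(palignr(high, low, 4)))  # bytes 4 through 19 of the concatenation
```

This pattern is common when a data stream is not aligned to 16-byte boundaries: two aligned loads plus one alignment shuffle recover the unaligned block.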

Although Yorkfield uses a 45nm fab process and consumes less power, Intel plans to stick to its existing 95 Watt and 130 Watt thermal design power ratings.

DailyTech previously presented quick and dirty benchmarks of AMD’s 1.6 GHz Barcelona processor last June. Today, DailyTech has a few quick and dirty benchmarks of Intel’s quad-core Yorkfield Core 2 processor, in an LGA775 package.

The testing configuration is as follows:
  • Intel Core 2 Extreme QX6700 @ 2.33 GHz, 1333 MHz front-side bus
  • Intel Yorkfield 2.33 GHz, 1333 MHz front-side bus
  • Intel P35 Express based motherboard
  • 2x1GB DDR3-1333 memory
  • AMD ATI Radeon HD 2600 XT
Because Intel does not offer a 2.33 GHz Kentsfield processor, we used a Core 2 Extreme QX6700 instead. Its unlocked multiplier allowed us to run it at 2.33 GHz on a 1333 MHz front-side bus.
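The 2.33 GHz figure follows from the bus arithmetic. A "1333 MHz" front-side bus is quad-pumped (four transfers per clock) from a base clock of about 333 MHz, so a multiplier of 7 gives about 2.33 GHz. The QX6700's stock 2.66 GHz comes from a multiplier of 10 on a 1066 MHz (about 266 MHz base) bus. The sketch below does this math with exact fractions; the lookup table and helper names are our own illustrative choices.

```python
from fractions import Fraction

# Marketing names round the transfer rate; the exact rates are thirds.
FSB_TRANSFER_RATE_MTS = {
    1066: Fraction(3200, 3),  # 1066.67 MT/s
    1333: Fraction(4000, 3),  # 1333.33 MT/s
}

def core_clock_mhz(fsb_name: int, multiplier: int) -> float:
    """Core clock = base bus clock * multiplier; the FSB makes 4 transfers per base clock."""
    base_clock = FSB_TRANSFER_RATE_MTS[fsb_name] / 4
    return float(base_clock * multiplier)

def multiplier_for(target_mhz: float, fsb_name: int) -> int:
    """Nearest whole multiplier for a target core clock on a given bus."""
    base_clock = FSB_TRANSFER_RATE_MTS[fsb_name] / 4
    return round(Fraction(target_mhz) / base_clock)

print(round(core_clock_mhz(1066, 10)))  # stock QX6700: 2667 MHz
print(round(core_clock_mhz(1333, 7)))   # test configuration: 2333 MHz
print(multiplier_for(2333, 1333))       # 7
```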

SiSoft Sandra XII CPU Arithmetic (MIPS)
Test | Kentsfield 2.33 GHz | Yorkfield 2.33 GHz
ALU | 43003 | 43299
FPU | 29981 | 34693

SiSoft Sandra XII CPU Multimedia (MIPS)
Test | Kentsfield 2.33 GHz | Yorkfield 2.33 GHz
ALU | 257295 | 256216
FPU | 140055 | 140301

SiSoft Sandra XII Memory Bandwidth (MB/s)
Test | Kentsfield 2.33 GHz | Yorkfield 2.33 GHz
ALU | 6639 | 7124
FPU | 6639 | 7121

Outside the FPU arithmetic and memory bandwidth tests, the synthetic benchmarks reveal little performance difference between Kentsfield and Yorkfield. Note, however, that SiSoft Sandra XII does not yet contain SSE4 optimizations.
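The size of each difference is easier to judge as a percentage. The sketch below recomputes Yorkfield's change relative to Kentsfield from the three Sandra tables; the scores are DailyTech's, and the dictionary layout is just a convenient encoding.

```python
# (Kentsfield, Yorkfield) results at 2.33 GHz; higher is better in every test.
SANDRA_RESULTS = {
    "CPU Arithmetic ALU (MIPS)": (43003, 43299),
    "CPU Arithmetic FPU (MIPS)": (29981, 34693),
    "CPU Multimedia ALU (MIPS)": (257295, 256216),
    "CPU Multimedia FPU (MIPS)": (140055, 140301),
    "Memory Bandwidth ALU (MB/s)": (6639, 7124),
    "Memory Bandwidth FPU (MB/s)": (6639, 7121),
}

def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

for test, (kentsfield, yorkfield) in SANDRA_RESULTS.items():
    print(f"{test:30s} {percent_change(kentsfield, yorkfield):+6.2f}%")
```

Only FPU arithmetic (about +15.7 percent) and memory bandwidth (about +7.3 percent) move noticeably; the other four results change by less than one percent, and Yorkfield's multimedia ALU score is actually slightly lower.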

Unlike AMD, Intel relies on an off-chip memory controller. AMD achieves low latencies with its integrated memory controller; Intel achieves comparable results with a northbridge-based controller by offsetting the latency of going off-die with large L2 caches. Yorkfield’s larger, faster 24-way set-associative L2 cache yields a memory bandwidth boost of approximately 7 percent.
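The trade-off described here is often summarized by the average memory access time (AMAT) formula: AMAT = hit time + miss rate × miss penalty. The sketch below applies a simplified single-level version to an L2 cache. Every number in it is hypothetical, chosen only to show how a lower miss rate from a larger cache can offset a long trip to an off-die memory controller; none of them are measurements of Kentsfield or Yorkfield.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same units as hit_time and miss_penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty

# Hypothetical values in CPU cycles, for illustration only.
L2_HIT_CYCLES = 15
MEMORY_PENALTY_CYCLES = 200  # assumed cost of going through the northbridge to DRAM

smaller_cache = amat(L2_HIT_CYCLES, miss_rate=0.05, miss_penalty=MEMORY_PENALTY_CYCLES)
larger_cache = amat(L2_HIT_CYCLES, miss_rate=0.04, miss_penalty=MEMORY_PENALTY_CYCLES)
print(f"{smaller_cache:.1f} vs {larger_cache:.1f} cycles")  # 25.0 vs 23.0 cycles
```

In this toy model, cutting the miss rate from 5 to 4 percent saves as much time as shortening the memory penalty from 200 to 160 cycles, which is the sense in which cache capacity can stand in for a faster memory path.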

Cinebench 10 Performance (CB-CPU score, higher is better)
Test | Kentsfield 2.33 GHz | Yorkfield 2.33 GHz
Single-threaded | 2400 | 2582
Multithreaded | 8518 | 9206

DivX 6.6 Encoding Time (seconds, lower is better)
Test | Kentsfield 2.33 GHz | Yorkfield 2.33 GHz
Encode time | 12.90 | 11.80

Cinebench 10 shows an approximate 8 percent boost in both single-threaded and multithreaded rendering, and encoding a video file to DivX takes about 8 percent less time.
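Scores and times need slightly different arithmetic. Cinebench reports a score (higher is better), while the DivX result is a time (lower is better), so "time saved" and "speedup" are not the same number. The sketch below recomputes both from DailyTech's figures; the helper names are our own.

```python
def score_gain(old_score: float, new_score: float) -> float:
    """Percent gain for a higher-is-better score."""
    return (new_score / old_score - 1) * 100

def time_saved(old_seconds: float, new_seconds: float) -> float:
    """Percent of the original run time eliminated."""
    return (1 - new_seconds / old_seconds) * 100

def speedup_from_times(old_seconds: float, new_seconds: float) -> float:
    """Percent increase in throughput implied by the shorter run time."""
    return (old_seconds / new_seconds - 1) * 100

print(f"Cinebench single: {score_gain(2400, 2582):.1f}%")              # 7.6%
print(f"Cinebench multi:  {score_gain(8518, 9206):.1f}%")              # 8.1%
print(f"DivX time saved:  {time_saved(12.90, 11.80):.1f}%")            # 8.5%
print(f"DivX speedup:     {speedup_from_times(12.90, 11.80):.1f}%")    # 9.3%
```

The roughly 8 percent DivX figure is the time saved; expressed as throughput, the same result is about a 9 percent speedup.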

In our limited time with Yorkfield, the quad-core processor performed roughly 8 percent faster than Kentsfield clock for clock. That is in line with expectations, as Yorkfield is essentially a 45nm die shrink of Kentsfield with a few tweaks here and there.

Expect Intel to begin shipping Yorkfield in mass quantities in Q1 2008. Quad-core Xeon X5400 Harpertown processors, which are somewhat similar to Yorkfield, will ship in November.


Comments



Not bad
By pattycake0147 on 8/23/2007 10:59:01 PM , Rating: 2
It's nice to see the increase in performance, but what kind of price tag are these going to have compared to Kentsfield?




RE: Not bad
By Anh Huynh on 8/23/2007 11:07:45 PM , Rating: 2
My guess is, if the Xeon Harpertown prices are a hint, probably $50 more, clock for clock.


RE: Not bad
By xsilver on 8/24/2007 11:15:52 AM , Rating: 2
what does that mean exactly? what will be the predicted cheapest yorkfield? anything in the range of the q6600 now?


RE: Not bad
By East17 on 8/24/07, Rating: 0
RE: Not bad
By ZmaxDP on 8/24/2007 6:53:03 PM , Rating: 3
I understand your frustration, but if DT or any review site used your reasoning they'd still be using version 1 of every single benchmarking suite they used, and we'd be doing gaming performance comparisons with Duke Nukem 3D or the original Doom. The point of this article was a comparison of Kentsfield to Yorktown not to compare either to anything else. That's what you wanted the review to do. Look on Anandtech and I bet you can find a little more detail so you could make a better comparison. Or, if you've got a kentsfield, download the cinebench benchmark and run both versions on your processor. There's your comparison...


RE: Not bad
By ZmaxDP on 8/24/2007 6:53:32 PM , Rating: 2
Oops, Yorkfield...


RE: Not bad
By Alexstarfire on 8/24/2007 8:36:16 PM , Rating: 2
Well, they could always benchmark the CPU on both versions. I understand that we want to compare differences clock for clock, but it's not like you should cut out older CPUs from the benchmark. All they'd have to do is run the new CPUs on the old benchmark and that'd be it. Any new CPUs that come out after that can just be benched on the new benchmarks.


RE: Not bad
By weskurtz0081 on 8/23/2007 11:13:02 PM , Rating: 5
Any performance increase is better than nothing, I was just expecting more. I have been hearing so many fanboys saying that no way Barcy can compete but, if it is only 5-10% better than Conroe.... it seems very possible that Barcy could compete or beat it. Obviously, that will only be the case if Barcy performs like AMD says it will, so no one really knows for sure about that.


RE: Not bad
By MrDiSante on 8/23/2007 11:49:00 PM , Rating: 4
What you have to remember, and what Intel's said all along, is that Penryn is a die shrink of Conroe with a few new tricks. It is not a new architecture, so there's no reason why Barcelona wouldn't be able to compete, assuming AMD's claims reflect the truth. Personally, I'm happy that Penryn isn't as much of a jump. Much as advances are nice, if AMD weren't able to compete we would go back to the PIII days, and then likely back to the PIV days when Intel could release crap and not care about competition.


RE: Not bad
By retrospooty on 8/24/2007 12:11:07 AM , Rating: 2
Since the cache has improvements and is also larger, we may well see it scale better. In other words, it's an avg of 5% faster at a meager 2.33 GHz, but what about 3.33 GHz or higher, which is on tap for later in 2008? I am sure that could be achieved now by raising voltage. It may be quite a bit faster.


RE: Not bad
By mars777 on 8/24/2007 5:02:15 AM , Rating: 4
4 facts:

1. Cache is there to compensate memory access latency
2. More cache is needed to compensate inter-cpu communication
3. Higher clocked is the CPU, it makes more inter-cpu and memory requests.
4. Clocking higher the CPU retains the original quantity of cache

Well now try to rethink and post your conclusions :)

The bottleneck is in the cpu-to-cpu and cpu-to-memory bus...
Overclocking needs a higher FSB, and that is exactly what you do when you overclock: raise the FSB and raise the memory clock. On Intel systems this also means clocking the CPU higher, because the front-side bus, which derives from the northbridge, dictates the memory AND CPU-to-CPU bus speed. So when a (current) Intel CPU goes higher in clocks it needs more data through its FSB, but the cache stays the same, and so does the front-side bus (since faster chips only have a higher multiplier). When you overclock the CPU you overclock the cache too, but this means nothing since it's merely a requirement. If the cache didn't scale with the processor you would see SEVERE performance drops.
Conclusion: the current amount of cache is what Intel calculated as the right amount for Penryn's bandwidth needs and for improving applications (encoding is one that will benefit greatly). The amount of cache is generally computed based on the architecture, and scaling is mainly dictated by the architecture too. I've read somewhere that C2D doesn't need more than 3MB of cache, as more wouldn't help anything but applications that rely mainly on cache (i.e. encoding).

On AMD systems there is another bus in the way. The HyperTransport bus handles CPU-CPU and CPU-peripheral communications; the FSB there is something fictional. The memory clock is derived by dividing the CPU clock, because of the integrated memory controller. And this is the reason why AMD CPUs scale better with speed (with higher multipliers): you clock the memory controller higher too, so even at the same memory speed you still get lower latencies!


RE: Not bad
By Targon on 8/24/2007 5:15:39 AM , Rating: 3
But if you plan to put these things into a server, you do NOT overclock, because stability is more important. It is also possible that at this point, Intel is having problems ramping the speed of these things up in the way that pre-B2 stepping Barcelona processors could not be clocked up to 3GHz.

Anything new, from a new design to a new process technology, will take some time to get right, which is why the new chips are yet to be released. The interesting thing is that so many people were saying Penryn would eliminate any chance of AMD catching up or regaining the performance lead.


RE: Not bad
By mrdelldude on 8/24/2007 10:25:26 PM , Rating: 2
quote:
But if you plan to put these things into a server, you do NOT overclock, because stability is more important.


True

quote:
It is also possible that at this point, Intel is having problems ramping the speed of these things up in the way that pre-B2 stepping Barcelona processors could not be clocked up to 3GHz.


However, in the preview of the dual-core chip, Wolfdale, they were able to take a 2.33GHz chip to 3.22GHz with little problem. They used the stock voltage in doing so and just upped the bus speed.

Granted a quad-core puts out more heat than a dual-core, so likely this sample wouldn't scale as easily.

So I think they would be able to ramp up the speed rather quickly if they feel they need to.


RE: Not bad
By retrospooty on 8/24/2007 9:35:04 AM , Rating: 3
"1. Cache is there to compensate memory access latency
2. More cache is needed to compensate inter-cpu communication
3. Higher clocked is the CPU, it makes more inter-cpu and memory requests.
4. Clocking higher the CPU retains the original quantity of cache"


What you aren't thinking about is that the cache runs at full speed, meaning a 2.33GHz Penryn's cache is running at 2.33GHz, and a 3.33GHz one's is running at 3.33GHz. All I am saying is that I believe the higher speed bins will break away a bit more than 5%. It's not likely going to be a huge difference, but it will be faster. Previous reports here had a 3.33GHz Penryn running an average of 20% faster than a 3GHz Conroe. 10% of that was of course due to the direct speed increase.

Future benchies will tell for sure though.


RE: Not bad
By EarthsDM on 8/24/2007 12:11:54 AM , Rating: 2
We may reenter a time where Intel and AMD are better in different applications. Yorkfield doesn't seem much stronger than Conroe or Kentsfield in floating-point calculations, which is supposedly Barcelona's strong point. Also, AMD's chips seem to get a larger performance boost from running a 64-bit OS, as shown by X-bit labs in their comparison of AMD64 and EM64T.

http://www.xbitlabs.com/articles/cpu/display/core2...

If this is true (and seeing as Barcelona has enhancements for 128-bit FP) AMD may beat out Intel in certain apps. We could see a time where AMD is better for scientific calculations.


RE: Not bad
By JumpingJack on 8/24/2007 1:43:54 AM , Rating: 5
Each of xbit's graphs has the raw numbers; go through bench by bench, calculate the % improvement, then average... you will find that they miscalculated AMD's 64-bit speedup...

Finally, if you throw out the skewed data (ScienceMark, where it has been optimized for K8, and the Sandra SSE results due to the wide SSE on Conroe), the actual performance average is even steven. Heck, here I will do it for you:

http://img513.imageshack.us/img513/9022/core264bit...

It takes about 10 minutes to type these numbers up in excel, run the calculation, and show that 64-bit is not an advantage for AMD.


RE: Not bad
By EarthsDM on 8/24/2007 10:03:27 AM , Rating: 2
Wow. Ok, you're right. I don't know what else to say except that I should do more of my own math...


RE: Not bad
By JumpingJack on 8/24/2007 8:40:15 PM , Rating: 2
Well, after rereading my post to you I should apologize; it came across as a bit curt...

I would not have noticed it, except that it had generated some debate when Xbit posted the article. When I was looking at the data I had originally said 6%, not bad... but when eyeballing their chart it did not look like a 6% delta to me...

Truth is, if you dig around in some reviews you find these types of mistakes often, not routinely but often... some are honest mistakes (such as this one), others are subtly hidden for whatever reason.

In this particular case, I see this linked now and again providing a somewhat misleading conclusion...


RE: Not bad
By nineball9 on 8/25/2007 11:17:01 AM , Rating: 2
Good analysis and spreadsheet design. Labeling the last 2 columns "64 bit speedup" was a bit misleading (at least to me). I interpreted these columns to mean the percentage improvement of 64-bit operation over 32-bit operation when the numbers are actually just (64-bit value / 32-bit value) * 100. At first, seeing "64 bit speedup" values of around 100% led me to believe 64-bit operation was around twice as fast which didn't make much sense.
Nice work though!


RE: Not bad
By JumpingJack on 8/24/2007 1:24:20 AM , Rating: 3
Wes -- it is not clock for clock that matters exclusively, what matters is both clock for clock and clock itself.

Yorkfield will scale up to higher frequency, and it is a huge question mark whether AMD can make up the IPC gap, exceed Intel's IPC and reach a clock that yields a faster part... if not, then Intel sets the price and AMD is no better off than they were with K8.

It is a big question mark, which will have its answer within the next few months... the suspense is amazing.


RE: Not bad
By erikejw on 8/24/2007 8:22:58 AM , Rating: 1
Those who thought this die update would crush Conroe probably do not know anything about processors.

5-10% is quite good, and this is with an engineering sample and a motherboard whose BIOS is not tuned for the new dies. It will probably gain a few more percent, but not much more than that.


RE: Not bad
By encryptkeeper on 8/24/2007 10:45:44 AM , Rating: 2
Intel's performance gain from 1066 to 1333 FSB was more of a marketing ploy to refresh the line. Bench tests put 1333 only one to two percent above 1066. Looks like this is basically the same thing.

It'll take years, but if AMD survives, I have a feeling Intel will regret letting 10,000 employees go last year. I'm sure several of those were people who poured many long hours into developing the Core 2 architecture. Once they knew they had a hit on their hands, all those people were essentially dead weight.


RE: Not bad
By Master Kenobi (blog) on 8/24/2007 11:24:11 AM , Rating: 3
quote:
It'll take years, but if AMD survives, I have a feeling Intel will regret letting 10,000 employees go last year. I'm sure several of those were people who poured many long hours into developing the Core 2 architecture. Once they knew they had a hit on their hands, all those people were essentially dead weight.

Intel didn't cut anyone from the Israeli lab which was the masterminds behind the Core 2 architecture. People cut were from old 90nm fabs, marketing, middle management, and old timers from the 200mm wafer factory.

No loss there.


RE: Not bad
By encryptkeeper on 8/24/2007 12:31:50 PM , Rating: 3
I stand...err, sit corrected.


RE: Not bad
By bwmccann on 8/24/2007 5:19:39 PM , Rating: 3
quote:
Intel didn't cut anyone from the Israeli lab which was the masterminds behind the Core 2 architecture. People cut were from old 90nm fabs, marketing, middle management, and old timers from the 200mm wafer factory.


I really wish that were 100% true, but good people were also cut, not just old timers and 200mm factory workers.


RE: Not bad
By deeznuts on 8/24/2007 12:59:02 PM , Rating: 2
quote:
Any performance increase is better than nothing, I was just expecting more. I have been hearing so many fanboys saying that no way Barcy can compete but, if it is only 5-10% better than Conroe.... it seems very possible that Barcy could compete or beat it.


weskurtz0081, I think most people expected this slight improvement in clock-for-clock speed, since it was just a die shrink plus some tweaks (notwithstanding proggies that are optimized for SSE4). So 0-10% depending on apps might have been expected.

It's the increased clockspeed which will compensate for Barcelona. The jury is still out until Barcelona hits the wild or AMD showcases its performance for people to test. Remember, these things are launching at 3.33GHz which is higher than Barcelona's launch clock speed.

Interesting battle, and I'm predicting victories depending on applications for both sides.


RE: Not bad
By Staples on 8/24/2007 12:15:15 AM , Rating: 2
This echoes my exact thoughts. It won't be until next year that we see affordable Penryns, and if they come out at these low clock speeds and cost more to boot, then I would feel very safe upgrading today. I wish we had some info on when a desktop 3GHz or higher part was coming out and what its price will be.


RE: Not bad
By Treckin on 8/24/2007 1:18:39 AM , Rating: 3
NOT A DESKTOP PROCESSOR... NOT A DESKTOP PROCESSOR... NOT A DESKTOP PROCESSOR...

I think that Barcelona will destroy it for all practical purposes, even supposing it's slower... The individual core power planes and on-board memory controller allow for both more system RAM per channel and a lower-voltage NB. In the realm of business computing, breakneck speed is far less important than power savings. The individual core throttling on Barcelona should be the kicker in the deal, as well as finally moving fully off of AMD's aged 90nm process.
Seeing as most business servers are idle almost 90% of the time, raw speed is second to operation costs. One would only be interested in raw speed as far as a number crunching rig goes, and in that case, you're better off with cell or IBM tech anyhow...

I think AMD will dominate this round of server apps, especially seeing as current AM2 boards will socket these, with the option to upgrade to the AM2+ later (or 1207+, whichever you prefer).

If I get one response to this referring to Agena or Nehalem, I'm going to start BSODing people's towers...


RE: Not bad
By Chris Ram on 8/24/2007 1:54:43 AM , Rating: 3
It might not be a desktop proc but what is the difference between this version and something that will fit in your P35?

There is a lot more to this chip than what people are talking about now.

The 45nm Hi-K process will help quite a bit. It increases the transistor density almost 2x and reduces the power for transistor switching by 30% compared to 65 nm. At the same time it increases the switching speed by 20%, reduces the source to drain leakage by a factor of 5 and reduces the gate oxide leakage by 10x.

Last but not least is "Deep Power down Technology". The chip is able to shut down the core and/or L2.


RE: Not bad
By Chris Ram on 8/24/2007 2:31:20 AM , Rating: 3
Doh, no edit button, I will have to remember that before I post at 2 AM.


RE: Not bad
By Viditor on 8/24/2007 11:09:11 AM , Rating: 2
quote:
The 45nm Hi-K process will help quite a bit.

While the specs sound good on paper, the power specs Anand is getting show that there is only a 3% savings at idle and 10% under load at the same clock vs Conroe.
http://www.anandtech.com/cpuchipsets/intel/showdoc...


RE: Not bad
By Brunnis on 8/24/2007 11:44:23 AM , Rating: 2
Those are power consumption measurements for the whole system, which makes them highly misleading when looking at CPU power savings. Judging by Anand's numbers and previous measurements on Conroe CPUs, the power savings under load seem to be 25-30%.


RE: Not bad
By DallasTexas on 8/24/2007 8:59:49 AM , Rating: 3
You're just propagating the most recent AMD propaganda for obvious reasons - "performance does not matter".

Guess what, it does matter. Even using your arguments about energy, more performance translates into more virtualized machines per system which equals less hardware which equals less power.

The argument of hiding the need for more performance is a hopeless AMD marketing last resort. Performance matters, performance matters, performance matters. Even for a desktop application for you and me, I'll take performance over saving 7 cents over one years time. Sorry, that dog don't hunt.


RE: Not bad
By chsh1ca on 8/24/2007 11:14:34 PM , Rating: 2
Since when does AMD have marketing?

Performance can be measured in a lot of ways -- for instance a budget processor might be gauged on performance per dollar, a desktop processor might be performance per time(second), and in large clusters performance per watt may in fact be a selling point. It certainly is a factor in notebooks. Just because it's not raw maximum output power doesn't mean it doesn't perform well in a different way.


RE: Not bad
By TomZ on 8/25/2007 12:01:54 AM , Rating: 2
This is AMD's version of marketing:

http://breakfree.amd.com/en-us/default.aspx

I.e., buy from us, we're the good guys. Performance doesn't matter, just buy from us because Intel is naughty.

Sorry, but I really don't perceive that AMD is taking the high road. I think if they allocated resources into real marketing to build their brand, instead of into stupid lawsuits, then they might get somewhere.


RE: Not bad
By deeznuts on 8/24/2007 12:51:20 PM , Rating: 2
quote:
NOT A DESKTOP PROCESSOR... NOT A DESKTOP PROCESSOR... NOT A DESKTOP PROCESSOR...
What are you referring to when saying "NOT A DESKTOP PROCESSOR?" Are you referring to the Yorkfield in this article? Or to Barcelona? Barcelona is not a desktop processor, but this Yorkfield is. The server/workstation processor is Harpertown (XEON).

So, just wanted to know what proc you are talking about. Since you're talking about a server proc (Barcelona) in a desktop proc thread.


Related Articles
More "Penryn" Benchmarks Revealed
August 22, 2007, 2:25 PM
Intel Sets "Penryn" Launch Date
August 14, 2007, 6:18 PM
"Penryn" Benchmarks Hit The Web
August 7, 2007, 4:04 PM
Quick and Dirty AMD K10 Cinebench
June 6, 2007, 5:12 AM
Copyright 2014 DailyTech LLC.