
Sandia simulations reveal memory is the bottleneck for some multi-core processors

Years ago, the hallmark of processor performance was clock speed. As chipmakers hit the wall on how far they could push clock speeds, processor designs moved to multiple cores to increase performance. However, as many users can tell you, performance doesn't always increase with the number of cores you add to a system.

Benchmarkers know that a quad-core processor often offers less performance than a similarly clocked dual-core processor for some uses. The reason for this phenomenon, according to Sandia, is memory availability. Supercomputers have tried to increase performance by moving to multi-core processors, just as the world of consumer processors has done.

The Sandia team has found that simply increasing the number of cores in a processor doesn't always improve performance, and beyond a certain point performance actually decreases. Sandia simulations have shown that moving from dual-core to four-core processors offers a significant increase in performance. However, the team found that moving from four cores to eight cores offers an insignificant gain, and moving from eight cores to 16 cores actually drops performance.

Sandia team members ran simulations using algorithms for deriving knowledge from large data sets. The team found that at 16 cores, the performance of the system was barely as good as that seen with dual cores.

The problem, according to the team, is a lack of memory bandwidth combined with contention between the cores over each processor's memory bus. The team uses a supermarket analogy to explain: if two clerks check out your purchases, the process goes faster; add four clerks and things are quicker still.

However, with eight or 16 clerks it becomes a problem not only to get your items to each clerk, but the clerks get in each other's way, leading to slower performance than fewer clerks would provide. Team member Arun Rodrigues said in a statement, "To some extent, it is pointing out the obvious — many of our applications have been memory-bandwidth-limited even on a single core. However, it is not an issue to which industry has a known solution, and the problem is often ignored."

James Peery, director of Sandia's Computations, Computers, Information, and Mathematics Center said, "The difficulty is contention among modules. The cores are all asking for memory through the same pipe. It's like having one, two, four, or eight people all talking to you at the same time, saying, 'I want this information.' Then they have to wait until the answer to their request comes back. This causes delays."

The researchers say that today there are memory systems available that offer dramatically improved memory performance over what was available a year ago, but the underlying fundamental memory problem remains.
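The scaling behavior Sandia describes can be sketched with a toy model (illustrative constants only, not Sandia's actual data): each core divides the compute work evenly, but all cores share one memory pipe and pay a contention penalty that grows with the core count.

```python
def runtime(cores, compute=16.0, mem_time=1.0, contention=0.2):
    """Toy model of a memory-bound workload on a shared memory bus.

    compute    -- total CPU work, split evenly across the cores
    mem_time   -- fixed time to stream the data set over the shared bus
    contention -- per-core overhead from cores fighting over that bus
    """
    return compute / cores + mem_time + contention * cores

def speedup(cores, **kw):
    return runtime(1, **kw) / runtime(cores, **kw)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} cores: speedup {speedup(n):.2f}x")
```

With these made-up constants the model reproduces the shape of the Sandia result: a solid gain at four cores, a marginal one at eight, and a regression at sixteen as the contention term overtakes the shrinking compute term.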

Sandia and the ORNL are working together on a project that is intended to pave the way for exaflop supercomputing. The ORNL currently has the fastest supercomputer in the world, called the Jaguar, which was the first supercomputer to break the sustained petaflop barrier.

Comments

2x4
By Uncle on 1/17/2009 2:39:44 PM , Rating: 2
I've held this conclusion for quite some time. The tech industry has to keep moving forward, and it can only do so with consumers buying products. Keeps us working. Without going into too much detail: if you're a gamer, the video card companies talk about offloading work from the CPU to the GPU. My son and I still use the 939 platform (my Opteron 165 x2 at 3GHz, his 148 x1 at 3GHz), and all we've had to do so far is buy a video card every so often. You think these companies get together to discuss what's honestly good for us, or is it to sell more products? Look at memory as an example: single, dual, now triple channel. The CPU: single, dual, triple, and quad core. That's why netbooks are a hit beyond their screen size: people are realizing they don't need a supercomputer to do most of what they want. We have come full circle. Our original laptops were small, but there was money to be made just by increasing the size, and in our society we've been taught from childhood that bigger is better. As long as people keep listening to and watching commercials and ads, nothing is going to change. The bottom line is that we all have to make a living, so SPEND SPEND SPEND till you drop. Isn't that what Bush told the American people after 9/11? Don't stop spending, get out and shop till you drop, we have to keep the economy going. Enough said.

RE: 2x4
By Comdrpopnfresh on 1/17/2009 8:20:24 PM , Rating: 2
You and your son cannot play games like Crysis, Fallout 3, Far Cry 2, and Mirror's Edge. I'm thinking the only way you can be serious about just upgrading graphics cards on those CPUs is that you have a monitor with a resolution below 1280x1024.

RE: 2x4
By Noya on 1/18/2009 5:01:12 AM , Rating: 2
Totally. I had an Opteron 165 @ 2.8GHz and 2GB DDR400 with an eVGA 9600GT SSC, and at 1024x768 it would barely play the Crysis demo at medium settings.

I upgraded when MS was offering 30% cashback on eBay: a Q8200 @ 3.3GHz, a UD3P, 8GB DDR2 @ 950MHz, a 9800GTX and a 22" 1680x1050 monitor... and now I can play Crysis 64-bit at the second-highest settings (no AA) at a reasonable 25-40fps. Far Cry 2 is lame (free with the vid card), but I watched a trailer of Mirror's Edge a few months back. How is it?

RE: 2x4
By Uncle on 1/18/2009 1:14:21 PM , Rating: 2
You changed your whole system out. You should have bought the 9800GTX and tried it before someone advised you to buy a new system; I'm sure Crysis would have played fine. I'm getting the idea that you now have to justify your purchase.

RE: 2x4
By RubberJohnny on 1/18/2009 8:40:19 PM , Rating: 2
Maybe YOU should have tried a new CPU before telling everyone that upgrading is a waste of money???

I recently went from an OC'd 939 4200x2 to a Q6600 (stock ATM, stuck with the same 4850 vid card) and the performance increase was MASSIVE. My FPS increased dramatically and the whole experience is just so much smoother - RTS were the biggest improvement! Not sure how this would compare to a really fast core 2 duo but i needed the extra cores for virtualisation.

My recent upgrade cycle has gone 4200x2+7900gs --> 4200x2+4850 --> q6600+4850 and i can tell you the biggest jump in performance was the CPU not the vid card and I game at 1920x1200.

So I'm calling SHENANIGANS on Uncle... time to upgrade, son!

RE: 2x4
By Uncle on 1/18/2009 9:42:37 PM , Rating: 2
Big difference between a 4200x2 and an Opteron 165.

RE: 2x4
By RubberJohnny on 1/18/2009 10:43:15 PM , Rating: 2
Yeah and there's an even bigger difference between an opty 165 and a Q6600 which is exactly my point...

RE: 2x4
By SlyNine on 1/18/2009 11:55:18 PM , Rating: 2
Meanwhile, the 4850 is around 8x faster than your 7900GS.

In games, since they only really use 2 cores right now, your Q6600 is about equal to an Opteron at 3GHz. My friend, I have run many benchmarks with the A64 X2 and the Q6600.

You are dead wrong.

RE: 2x4
By Denithor on 1/19/09, Rating: 0
RE: 2x4
By SlyNine on 1/19/2009 11:43:29 PM , Rating: 2
First off, core for core the E8500 is better than the Q6600. Second, I benchmarked both the Q6600 and the 4200X2 at stock and overclocked speeds. At 2.4GHz the Q6600 scores hardly any more FPS than a 4200X2 at 3GHz in games without good multithreading support.

I didn't say anything about gaming on a single core, but I can tell you Assassin's Creed isn't even playable on a single core.

SupCom may get higher FPS and have a smoother interface with a quad, but it suffers from the same simulation slowdown at 2.4GHz as a 4200X2 @ 3.0GHz. Supreme Commander doesn't support quad cores worth a crap, unfortunately, because right now it's my favorite game.

UT3 is good and I have yet to try GTA4.

With a Q6600 at stock I've seen maybe a 2% increase in FPS over my old A64 at 3.0GHz. An E8500 is probably a bit better, but still not night and day in games. This isn't SiSoft Sandra we're talking about.

Moving forward, yes, the Q6600 is a better future investment than any dual core.

RE: 2x4
By mindless1 on 1/21/2009 4:57:42 PM , Rating: 2
Why would anyone use a turd like GTA4 as an example?

Bottom line - You don't need more than 3GHz from a dual core processor to play most modern games fine, providing the video card is also up to snuff.

A single core at 3GHz is also viable for over 50% of the games out there today. It may not win the benchmark graphing contest, but it will maintain a playable framerate in enough games. Will it play the extra-demanding ones well? Consider the question before answering: benchmarkers and reviewers deliberately look for titles that show contrast, instead of the typical games that run fine on most of the hardware tested!

Truth is, yes someone with a single core Athlon 64 who upgraded their video a couple times has had a great value run at gaming. They may not be able to extend this much into the future, but as always we can't ever think any gaming combo will last far into the future until the future is here.

Any game that won't run properly on a $60 CPU is defective. That's what the eye-candy adjustment settings are for, but in the end those make far less difference to gameplay enjoyment than the eye-candy settings tied to GPU performance and the resolutions possible at a playable framerate.

It always was and still is more about the video card than anything else. Moreso than ever today with monitors continuing to rise in resolution.

RE: 2x4
By SlyNine on 1/18/2009 11:52:39 PM , Rating: 2
Have you ever checked your CPU load in these games? It almost never hits 100% on any core if you are not running an 8800GT or better.

I had a 4200X2 on 939 for some time and also upgraded to the Q6600. I ran the 4200X2 with a 1900XT, then upgraded to an 8800GT, then upgraded to a Q6600 and got my second 8800GT card after a time.

The biggest performance increase by FAR was going to the 8800GT on the 4200X2; the second biggest was adding another 8800GT. The Q6600 upgrade was nice, but not nearly the same jump.

The 1900XT > 7900GS, and the 4850 > 8800GT. You had an even bigger jump in video and yet say the CPU was your biggest jump. You, sir, are full of poop. Or you are running all your games at 1024x768 with no AA.

RE: 2x4
By RubberJohnny on 1/19/2009 7:11:04 AM , Rating: 2
What res were you running? CPU load in the games I mostly play (TF2, BF2 and Company of Heroes) was maxing out some of the time at 1920x1200 with the X2 4200+ (BF2 at 1600x1200). When I turned off AA things did not really get any smoother, so it must have been the CPU holding the 4850 back. Things didn't get acceptably smooth till I lowered the res back to 1280x1024.

However, when I got the Q6600 I could run all of those games on full quality at 1920x1200 and they were silky smooth (ok, maybe COH isn't silky on full ;). Perhaps my X2 4200+ system had some other hardware issue (it was a clean XP build), but I always felt that 4850 didn't perform like I expected till I paired it with the Q6600.

RE: 2x4
By SlyNine on 1/19/2009 11:53:10 PM , Rating: 2
Well, I'm going to eat a little crow; I have not played or tested those games.

I play FC2, Crysis, America's Army, SupCom (Supreme Commander), Gears of War (big improvement going to the Q6600), GRID, and Stalker.

But still, in all my games the biggest improvement was going to the better video card. Perhaps you were having some other problem that was fixed with your Q6600 setup.

You have a much better setup moving forward than you did with the 4200x2, so no need to worry.

RE: 2x4
By MrPoletski on 1/23/2009 4:43:39 AM , Rating: 1
i7 @ 3.7GHz here; wouldn't go back if you paid me.

There is something sexy about 8 individual performance bars in Task Manager ;) yes, I know there are only 4 processors.

SERIOUSLY though, I think a slight change should be made in relation to operating systems and CPUs. Your operating system has a task scheduler that decides which processor gets used by which apps and such.

Well, I think that job would be better done by an arbiter on the CPU itself, which would make the chip appear as a single core and divide the workload amongst its loyal servant cores.

That would make coding for multicore systems a lot easier.

RE: 2x4
By MrPoletski on 1/27/2009 11:45:18 AM , Rating: 2
Nobody like the idea of hardware task management?

RE: 2x4
By Kary on 1/20/2009 3:48:55 PM , Rating: 2
I'm running a Pentium D 805 (2.66GHz dual core) with 4GB of RAM and an 8600GTS video card. I can't max out the Crysis demo, but I get good frame rates above medium settings without maxing out my CPU (it's been a while; I think I was running 1280x1024 because that was native for my LCD at the time).

Most video games are NOT CPU-limited (though it's time for me to get a new computer soon... video conversion... Fooooorrrrrreeeeeeevvvvvvvveeeeeeeerrrrr... and that GPU-offload demo they have talked so much about was even slower than using the CPU for me).

RE: 2x4
By LostInLine on 1/23/2009 12:15:23 PM , Rating: 2
Damn, and these people say gamers aren't social.

We may be argumentative but we are damn social.

RE: 2x4
By Hellburn on 1/18/2009 10:06:04 AM , Rating: 2
I've got a S939 x2 4400+ (2.2GHz per core) w/ 2GB RAM and a 4850 w/ 512MB. I've got my PC hooked up to my HDTV and 5.1 receiver and can play Far Cry 2 and UT3 pretty smoothly at 720p with High settings and V.High for shader effects. Yes, that is below the res you quoted, but for an HD entertainment center where you can fit more than a packet of crisps between you and the monitor, it's fine. Other games like RA3 I play at 1080p w/ 4xAA.

Yes, my system is somewhat unbalanced atm. And yes, games like Far Cry 2 and UT3 could probably look better (though not by much). I don't measure the quality of my life on the FPS yardstick but I also hate a sluggish response and I'll dial down settings to get a smooth enough experience. Personally I'll need/want to get a newer generation quad core at some point (prob 3-6mths), and at that stage I'll definitely get a nice boost in quality/fps.

However, my point/feeling is that it is possible to play most newer games at decent settings on some older-generation hardware if you don't have too much running at the same time. Because of the eye-candy aspect, I feel the most critical component for a lot of newer games is the GFX card. So if you have a decent one it goes a long way, and when you eventually get around to upgrading the CPU/platform you'll get a nice extra FPS boost.

RE: 2x4
By Uncle on 1/18/2009 1:31:27 PM , Rating: 2
Actually, I'm using a 24" and my son has a 20". I have no trouble playing those games with a 4870. Remember what the video industry is saying: the purpose of vid cards is to offload work from the CPU. Play your games with an older CPU such as mine and monitor your CPU usage; you'd be amazed at how little the CPU works with a good video card. Take your time and do your own experimenting with your system before you go out and buy another one. Unless money is no object (which it isn't for me), not spending it uselessly keeps more money in my pocket for other things, so that's the route I take.

RE: 2x4
By RubberJohnny on 1/18/2009 8:42:31 PM , Rating: 2
Lies... you all need a good CPU to take advantage of a good GPU, end of story.

RE: 2x4
By Omega215D on 1/19/2009 12:08:08 AM , Rating: 2
My Opteron 170 (dual core) can play all those games listed at 1680x1050 with no problem; the graphics card still plays a major part in those games. The only game I ran into a problem with was Grand Theft Auto IV, where I had to overclock my processor to 2.7GHz to get it to run smoothly.

I then jumped on board to the Phenom 9600, and GTA IV runs much smoother without having to overclock. I'm sure this will be the case with future games like Alan Wake, Deus Ex 3, etc.

Dual cores will be good enough for a majority of today's games, and keep in mind I was still on a socket 939 platform, where even the Intel Core 2 Duo shows a major performance increase in all games.

RE: 2x4
By BeastieBoy on 1/20/2009 10:02:59 AM , Rating: 2
This argument makes it clear why consoles have become so popular. Let the game developers worry about making the games playable on my system, rather than trying to make my system play the games.

RE: 2x4
By jrb531 on 1/20/2009 11:25:05 AM , Rating: 2
GTA4 is a poorly written port that uses an excess of CPU cycles because it is not properly programmed!

The Xbox 360 (which is the version that was ported) has roughly an ATI/AMD 1950XT-class GPU in it, so the port is not designed to take advantage of faster video cards and is forced to use your CPU to compensate.

The PC version also runs at a much higher resolution than the 360. This is not an example to use.

Sure, using brute force for a quick and dirty console port may work, but it's the worst example you can find for arguing that the CPU is the most important thing for games.

Depends on the application
By jrb531 on 1/17/09, Rating: 0
RE: Depends on the application
By Shida on 1/17/09, Rating: 0
RE: Depends on the application
By Targon on 1/17/2009 4:58:53 PM , Rating: 5
It depends on what you are looking at buying since it should be a factor. You can go with an AMD Phenom 2 940 for example, running at 3GHz per core, and no matter what, on the AMD side of the industry, you will be seeing the best performance from the new processor.

Comparing a dual core running at 3GHz vs. a quad running at 2.5GHz, then yea, the dual core would be better for legacy games and applications, but running at the same clock speed at this point, there isn't any real downside.

When it comes to looking forward, we will see both AMD and Intel moving to 3, 4, or higher for the number of memory channels the CPU can access and that will take care of the problem with accessing memory. We may see new types of memory that also address this by making it so the memory module will allow for 4 or more banks on a single module, giving much greater bandwidth per module.

I love how people use simulations to come up with these sorts of conclusions, as if AMD and Intel don't look for ways to resolve this sort of problem in the first place.

RE: Depends on the application
By Motoman on 1/17/09, Rating: 0
RE: Depends on the application
By mindless1 on 1/17/2009 2:32:58 PM , Rating: 4
You are wrong about the sweet spot. It is fairly irrelevant that one core can handle the OS and other background apps, because anyone who cares much about performance will have what most do: the OS and background apps taking a mere 1% of processor time, not even close to as significant as memory, bus, and CPU speed for a single OR dual core CPU.

On the other hand, many games and other apps are multithreaded, even in ways we don't directly attribute to them, like positional sound processing on some sound cards. To that extent, you were correct that it could be considered an OS process instead of being directly attributed to the game. But in general, the idea that background tasks, which usually account for a trivial amount of processing time, are an argument for multiple cores is incorrect.

RE: Depends on the application
By jrb531 on 1/17/09, Rating: 0
RE: Depends on the application
By SlyNine on 1/17/2009 4:00:49 PM , Rating: 2
Yes, I agree, but the 3rd and 4th cores can account for a MUCH BIGGER performance increase if utilized fully. Whereas your 20-40% higher-clocked dual core will only ever be 20-40% faster, the quad core can be 200% faster.

It's all about trade-offs and usage model.

RE: Depends on the application
By mindless1 on 1/17/2009 11:46:29 PM , Rating: 2
Sure, it "could", but it usually won't come remotely close to a 200% benefit, not only due to suboptimal software design but also due to the other system bottlenecks like bus, memory, hard drive, video card, etc.

Then there's the question of whether most people, or even anyone, are really buying all this new software at hundreds to thousands of dollars, which reviewer-benchmarkers seem to assume is the correct software to use when comparing processors. They then fail to take this additional cost into consideration when factoring relative value into their reviews' conclusions.

RE: Depends on the application
By SlyNine on 1/18/2009 1:49:28 AM , Rating: 2
The same can be said about higher clock speeds, though.

I think when comparing CPUs it's more a matter of whether or not they use the same software than of picking the most likely software, because you will never be able to compare the millions if not billions of possible software/hardware combinations. The only thing you can do is give an apples-to-apples comparison with benchmarking software that offers a common usage model.

Like Futuremark: it may not be perfect, but it does correlate with the real world a lot. You will never see an 8800GT beat a 4870 in real life or in Futuremark; between a 280GTX and the 4870 they are close, and it shows.

The only way to tell exactly how something is going to work in your unique case is to try it first-hand, or, I guess, get very lucky and find a reviewer that uses the exact same hardware and hope it's set up the same as yours.

RE: Depends on the application
By SlyNine on 1/17/2009 4:02:33 PM , Rating: 2
Also, frames per second is not always the most important thing. In an RTS, things like simulation speed can be much more important.

Who cares if I get 60FPS if it takes 10 minutes to play through 1 minute of game time?

RE: Depends on the application
By SlyNine on 1/17/2009 3:56:48 PM , Rating: 3
The quad core at 2500MHz should be plenty for any game out there. However, the dual core @ 3GHz may not be enough in 3 years when a game needs the extra cores.

Depending on your upgrade cycle, going with slightly less performance today to get better performance tomorrow could be a huge plus. RTS games, I imagine, will take a huge leap in supporting quad-core CPUs sooner or later.

But if you upgrade every year, then you are probably better off getting a dual core and clocking it to 4GHz, though a Core i7 at 3GHz will probably outperform it.

RE: Depends on the application
By DanoruX on 1/17/2009 7:23:09 PM , Rating: 1
I've been running my Q6600 @ 3.6GHz for the past year, and I wouldn't trade it for a dual core any day, even if I clocked a Penryn to 4.5GHz, for one very simple reason: a quad core lets me run more stuff at the same time! That is, having an 8-person Skype conference open, BitTorrent, video encoding and TF2 all running at once is no problem whatsoever.

That said, I'm glad most of my stuff isn't memory limited.

RE: Depends on the application
By Totally on 1/17/2009 9:02:55 PM , Rating: 3
Be realistic: game and run torrents while talking to 7 other people?

BitTorrent and TF2? How can you play with that lag and talk besides? For me it's horrible when I try to play with the ping through the roof; Left 4 Dead just screams at me.

RE: Depends on the application
By SlyNine on 1/18/2009 2:10:34 AM , Rating: 2
You'd be surprised what you do when you have the power to do it.

I haven't been able to get my Q6600 over 3GHz (oh well), but when I game I leave open tons of apps I'm working on, set my encoding to run on cores 1 and 2, and game away. In the future, when games require the extra cores, I will either have to buy something new or stop doing things in the background. But I will also enjoy the boost from the quad core at that time.

I've been on dual core and quad core. Quad really is that much better for me, and if you don't want to upgrade your computer for a few years, I recommend going quad and sacrificing some GHz. If you are going to upgrade in one year and only play games, then go dual core.

Does this mean....?
By Shida on 1/17/2009 12:21:48 PM , Rating: 2
That I will have to just go with a dual core instead of a quad core? I mean, really, if this is the case, then I could just buy a dual core for my build, save some money or use it for something else, and get better bang for my buck.

That is, unless someone out there can tell me of some current real-world solution for getting past this problem, since, to my knowledge, I've not heard much except that performance between quad and dual-core CPUs is generally the same or uneven (with dual core giving more responsiveness than quad over time, as the article noted).

I pass this question to the DIYers out there.

RE: Does this mean....?
By spread on 1/17/2009 12:44:07 PM , Rating: 2
Most people say that dual core are faster because they overclock better (less heat output) so they can run single and dual threaded apps faster than a quad.

RE: Does this mean....?
By Shida on 1/17/2009 12:47:54 PM , Rating: 2
Oh... well, in my case I won't be overclocking. So... I guess I'm okay? (Again, I'm now confused.)

RE: Does this mean....?
By jtesoro on 1/17/2009 1:06:28 PM , Rating: 2
Maybe people can offer better suggestions if they understood more what you use the computer for. What would that be? :)

RE: Does this mean....?
By Shida on 1/17/2009 1:23:03 PM , Rating: 2
Ah yes, sorry. I am currently building a dual-boot/mid-range gaming system, meaning I play games in Windows and run multiple apps in either Linux (the Ubuntu distro) or Windows. Now, I know most would say not to do so, for it would be a waste if you're going to be running Linux on it, but I do plan to do extensive gaming on it.

RE: Does this mean....?
By Shida on 1/17/2009 1:24:41 PM , Rating: 2
Sorry, when I said "a waste" I meant it as a colloquialism for "a waste of spending on performance hardware" if you are going to be running Linux to do basic stuff at a given time.

RE: Does this mean....?
By nvalhalla on 1/17/2009 1:57:14 PM , Rating: 1
No, the article clearly states that 4 cores were much better than 2. Check out some of the game benchmarks of 2 vs. 4 cores. In new games (Far Cry 2, Warhead, etc.) having more cores helps. Buy the fastest 4-core you can afford (I like the Q9400; it's coming down in price soon to $216).

RE: Does this mean....?
By mindless1 on 1/17/09, Rating: -1
RE: Does this mean....?
By SlyNine on 1/17/2009 3:52:28 PM , Rating: 2
5% is hardly significant when you are doubling the theoretical processing power.

I don't know how you can consider 5% significant. What are you smoking, and can I get some?

RE: Does this mean....?
By mindless1 on 1/17/09, Rating: 0
RE: Does this mean....?
By SlyNine on 1/18/2009 1:55:51 AM , Rating: 2
There are lies, damned lies, and statistics.

5% is not significant on its own; only when paired with a bunch of other 5% increases does 5% become important. That's my OPINION. If 5% is significant to you, then sure, but I have never heard it generally considered significant.

But I can tell you that in many apps a quad core is 2x as fast as a dual and scales linearly from 2 to 4 cores: transcoding files and encoding H.264, and I imagine games will too, sooner or later.

RE: Does this mean....?
By garbageacc3 on 1/18/09, Rating: -1
RE: Does this mean....?
By masher2 (blog) on 1/18/2009 10:11:13 AM , Rating: 2
> "he already explained to you its statistical significance"

Unfortunately, he's wrong. The 5% level is considered the lowest level of statistical significance-- but that 5% is the level of "alpha", and has nothing to do with a percentage ratio between two different readings. Saying that a single value has increased by 5% and therefore is "statistically significant" is nonsensical.
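The distinction masher2 draws can be sketched numerically (all benchmark runs below are made up for illustration): whether a 5% mean gain is statistically meaningful depends on run-to-run variance and sample size, not on the 5% figure itself. A Welch t-statistic on two sets of runs makes this concrete.

```python
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    return (mb - ma) / ((va / len(a) + vb / len(b)) ** 0.5)

# Two hypothetical benchmarks, both showing the same ~5% mean gain (100 -> 105 fps).
noisy  = ([94, 108, 99, 103, 96], [99, 113, 104, 108, 101])   # high run-to-run variance
steady = ([99.5, 100.2, 99.9, 100.4, 100.0],
          [104.6, 105.1, 104.9, 105.3, 105.1])                # low run-to-run variance

print(welch_t(*noisy))   # small t: the 5% gain may just be noise
print(welch_t(*steady))  # large t: the same 5% gain is clearly real
```

Same percentage difference, very different evidence: the noisy runs give a t-statistic near 1.4, the steady runs one above 20.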

RE: Does this mean....?
By mindless1 on 1/20/2009 12:28:16 AM , Rating: 2
Except when the other variables are fixed. We aren't trying to predict probability in this case, this is reproducible.

RE: Does this mean....?
By Hlafordlaes on 1/18/09, Rating: 0
RE: Does this mean....?
By SlyNine on 1/18/2009 11:37:29 PM , Rating: 2
Good for you. You get a star on your cheek.

RE: Does this mean....?
By Targon on 1/17/2009 7:12:30 PM , Rating: 2
If the comparison is between a dual core at 3GHz and a quad core at 2.5GHz, there will be cases where one or the other will be faster for your application. This changes as the dual core processors get relegated to the "budget" system and the quad core processors run at the same or faster clock speed.

On the AMD side at least, the Phenom 2 940 at 3GHz offers the clock speed people want with four cores. Yes, Intel may have an edge, but for $275, it's not a bad deal and isn't THAT much slower. The upgrade path is also fairly inexpensive as well if you have a motherboard that will support the new processor.

incorrect
By neo64 on 1/17/2009 12:09:36 PM , Rating: 5
The ORNL currently has the fastest supercomputer in the world, called the Jaguar

the current fastest supercomputer is IBM's RoadRunner

RE: incorrect
By Motley on 1/17/2009 3:08:37 PM , Rating: 2
That would be correct, and the Jaguar was the SECOND to break the petaflop barrier.

Jaguar Chases Roadrunner, but Can't Grab Top Spot on Latest List of World's TOP500 Supercomputers (Fri, 2008-11-14)

The 32nd edition of the closely watched list of the world's TOP500 supercomputers has just been issued, with the 1.105 petaflop/s IBM supercomputer at Los Alamos National Laboratory holding on to the top spot it first achieved in June 2008. The Los Alamos system, nicknamed Roadrunner, was slightly enhanced since June and narrowly fended off a challenge by the Cray XT5 supercomputer at Oak Ridge National Laboratory called Jaguar. That system, only the second to break the petaflop/s barrier, posted a top performance of 1.059 petaflop/s in running the Linpack benchmark application. One petaflop/s represents one quadrillion floating point operations per second.

RE: incorrect
By FaceMaster on 1/19/09, Rating: 0
That is why we need Quad Channel DDR4.
By iwod on 1/18/2009 9:39:23 AM , Rating: 2
1st. We need a 64-bit OS to support larger memory ASAP.

2nd. We need quad-channel memory, if this is not pad-limited.

3rd. We need DDR4, or XDR2....

By Jeffk464 on 1/18/2009 12:28:27 PM , Rating: 2
Vista 64bit, am I missing something?

By jmurbank on 1/21/2009 2:06:01 AM , Rating: 2
1st. We need a 64-bit OS to support larger memory ASAP.

A 64-bit OS is already here such as GNU/Linux and Windows. Unfortunately, not all developers have made the move yet.

2nd. We need quad-channel memory, if this is not pad-limited.

That helps a lot to improve bandwidth, but interleaving cannot be done cheaply. Interleaving is another way to improve performance.

3rd. We need DDR4, or XDR2...

Be careful recommending memory technologies; not every memory technology performs better than the previous one. For example, DDR has lower latency than DDR2, DDR2 has lower latency than DDR3, and DDR4 would be the worst for latency, with XDR latency just as bad. The computer industry needs to account for latency too, not just bandwidth. I would rather go back to DDR for its latency specs and change its interface from parallel to serial to ease adding channels. If that happened, both latency and bandwidth would improve, and hookup would be easier for the engineer because of fewer wires.
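The latency-versus-bandwidth trade-off described above can be sketched with a toy access-time model (the figures below are illustrative only, not real DDR or DDR3 timings): each access pays one fixed latency plus a transfer time, so a lower-latency, lower-bandwidth part wins on small random accesses while the higher-bandwidth part wins on large streams.

```python
def access_time_ns(bytes_moved, latency_ns, gb_per_s):
    """Toy model: one fixed latency plus bandwidth-limited transfer time.

    1 GB/s moves 1 byte per nanosecond, so bytes / gb_per_s gives ns.
    """
    return latency_ns + bytes_moved / gb_per_s

# Illustrative figures only -- not real memory timings.
older_ddr_like = dict(latency_ns=15, gb_per_s=3.2)    # lower latency, lower bandwidth
newer_ddr_like = dict(latency_ns=45, gb_per_s=12.8)   # higher latency, higher bandwidth

for size in (64, 1_000_000):  # one cache line vs. a 1 MB stream
    t_old = access_time_ns(size, **older_ddr_like)
    t_new = access_time_ns(size, **newer_ddr_like)
    winner = "low-latency" if t_old < t_new else "high-bandwidth"
    print(f"{size:>9} bytes: the {winner} part is faster")
```

Under these assumed numbers the low-latency part wins the 64-byte cache-line fetch and the high-bandwidth part wins the 1 MB stream, which is the crux of the argument that raw bandwidth figures alone don't settle which generation is "faster".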

Because...
By MrDiSante on 1/17/2009 12:25:59 PM , Rating: 2
We all know that putting wider and faster memory BUSes isn't an option, right?

RE: Because...
By mathew7 on 1/18/2009 3:24:55 PM , Rating: 2
No, it's not. Currently, memory uses parallel buses, which have electrical crosstalk problems. I really think memory should look at serialization, like PCIe replacing AGP and PCI. I wonder if Rambus has something to do with delaying this.
The "researchers" forget how slowly the 1-to-4 transition is happening. CPU manufacturers DO take the memory bottleneck seriously. Memory DOES evolve with the CPU, just not as fast.

Oops, should read more carefully.
By jonnyrocket on 1/17/2009 2:35:47 PM , Rating: 2
The study involved a large number of data sets, not one massive data set, so maybe it is applicable to Joe the Plumber.

So my new quad core came with Vista 64. When are the software vendors going to start supporting it? Most don't bother to differentiate between Vista 32 and 64, and you don't find out until it won't install, unless you check on the web first.

By SlyNine on 1/19/2009 1:41:23 AM , Rating: 2
I thought it was a huge mistake to make Vista 32. Most OEM computers with a 32-bit chip are painfully slow running Vista, and people in the know who would buy a 32-bit-only CPU would use XP anyway. Vista 32 and Windows 7 32-bit are a waste of time.

By BikeDude on 1/17/2009 2:55:37 PM , Rating: 2
AMD's multiprocessor systems utilize NUMA, so each CPU socket gains a separate memory bus. (NUMA support exists in Windows 2003 and onwards)

My guess is that we will eventually see several memory buses hooked up to a single CPU socket, but maybe the number of pins required would make such a design impossible?

By SlyNine on 1/19/2009 1:46:07 AM , Rating: 2
My new socket 478,754,939 pin CPU is the shiznits.

By Comdrpopnfresh on 1/17/2009 3:10:29 PM , Rating: 2
Have a dedicated, internal bandwidth 'router' for moving data between the cores. Then have an exterior bandwidth router to move data between the processor farm and the rest of the computer over a dedicated serial link.
It'd be exactly like a commercial router for networking computers with one another and with the internet. It's already been figured out that it's really stupid to have networked computers communicate with one another through the same link that relays outside data to a computer; it's like connecting to the computer in the next room through the internet: throughput is diminished and latencies increase...
You could dedicate a whole core, or a small GPU-like parallel-oriented processor, to one or both of these routing mechanisms. As long as processors keep getting faster along with bandwidth links, the overhead of the process is negated, and it lets you sidestep the underlying poor mechanics of the current situation. The routers could be combined, too, like the LAN and WAN sides and addressing of a computer network both being handled by one router.

By Chocobollz on 1/19/2009 1:50:16 AM , Rating: 2
Just like what AMD/ATi had done with their R700 architecture? Where they use a hub instead of ring-buses?

My opinion is, yes I agree with you. :-)

How did SGI do it?
By kyleb2112 on 1/17/2009 5:55:18 PM , Rating: 2
I seem to remember million dollar SGI rigs back in the 90s that ran 32 parallel processors. Did they know something we don't know now?

RE: How did SGI do it?
By masher2 (blog) on 1/18/2009 10:13:14 AM , Rating: 2
Different scenario. 32 separate processors equals 32X the cache bandwidth and (depending on the node setup) might even be 32X the main memory bandwidth as well.

This simulation applies to 32 cores on the same chip.

By UltraWide on 1/17/2009 4:01:28 PM , Rating: 3
All they are saying is that current CPU to memory performance is limited by bandwidth.

If they can figure out how to connect the CPU to system memory with more lanes then performance can increase linearly with the number of CPUs.

In real world use for most people quad core > dual core.

By saiga6360 on 1/17/2009 12:18:41 PM , Rating: 1
So a better memory system will help me watch my HD porn stutter free and add 5 fps to my console ported games. Gotcha.

RE: Nice
By Jeffk464 on 1/18/2009 12:27:31 PM , Rating: 1
Luckily my porn is already stutter free, and that's on a $600 laptop. Now if we can just get internet over the cell phone network fast enough to watch HD porn in real time, then we will really have something. We are wasting the time of all these engineers making faster CPUs when they should be making faster internet.

By nah on 1/17/2009 12:45:39 PM , Rating: 2
The team found that when you moved to 16 cores the performance of the system was barely as good as the performance seen with dual-cores.

Otherwise known as the law of diminishing marginal returns--or as an old adage would have it--too many cooks spoil the broth
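
The shape described in the quoted finding (gains, then a plateau, then an actual drop) matches Gunther's Universal Scalability Law, in which a contention term flattens scaling and a coherence/crosstalk term makes it retrograde. The alpha and beta values below are illustrative, chosen only to mimic the article's curve, not fitted to any measured data:

```python
# Universal Scalability Law: n cores with contention (alpha) and
# coherence/crosstalk (beta) penalties. beta > 0 makes scaling
# retrograde past a peak, as in the Sandia simulations.

def usl_speedup(n, alpha, beta):
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

alpha, beta = 0.1, 0.02  # illustrative parameters, not fitted
speedups = {n: usl_speedup(n, alpha, beta) for n in (1, 2, 4, 8, 16)}
peak_cores = max(speedups, key=speedups.get)
# speedups: ~1.0, ~1.75, ~2.60, ~2.84, ~2.19 -- peak at 8 cores,
# with 16 cores falling back toward dual-core territory.
```

With these parameters the model peaks at 8 cores and declines at 16, the same qualitative behavior as the supermarket-clerk analogy in the article.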

I wonder
By icanhascpu on 1/17/2009 1:31:21 PM , Rating: 2
How much of this article is based on real-world things and how processors really work, when you can't even get something as simple as the fastest supercomputer correct?

Depends on what you use it for
By RMSistight on 1/17/2009 2:07:13 PM , Rating: 2
I have two powerful machines. One is a dual core E8500 and the other is a Q9450 quad core.

I use Ulead Videostudio for video rendering. I can tell you when I use these two machines to render the EXACT same project, I save about 1.5 hours on the quad core vs. the dual core.

Here is the thing though: the quad core machine doesn't do anything BUT video rendering. It's a dedicated machine for this purpose. My dual core is strictly for everyday usage and gaming, not for video editing.

I personally hate doing everything on one machine (video editing, gaming, etc.); I like to separate it out. But like others have said, find out what you're going to use your PC for, then go from there.

By createcoms on 1/17/2009 2:07:52 PM , Rating: 2
I think this is perhaps why we are seeing triple-channel memory architectures and fattening L3 caches - the CPU manufacturers aren't dumb, they already knew all this and thus the development path is attacking the bottleneck with the aforementioned technologies.

What are they simulating?
By DXRick on 1/17/2009 2:19:45 PM , Rating: 2
Are they simulating a workstation situation where a single user is running one or more apps that are multi-threaded or a server that is scheduling and allocating multiple user requests across the available processors?

I would think that more than 4 processors would not work well in a workstation but would work in a server. Performance would also be limited by other factors, like hard drive accesses, common memory updates that require locks, and controller I/O.

By phxfreddy on 1/17/2009 10:05:43 PM , Rating: 2
... like the movie? Those guys didn't get anything done!

By PECourtejoie on 1/18/2009 4:59:51 AM , Rating: 2
For years, some Photoshop engineers have been saying that memory bandwidth is the biggest bottleneck in current systems.

The multiplication of cores has magnified the issue, as more cores are fighting for memory.

I thought AMD's approach was better, with memory controllers on the processor dies themselves, but memory sharing between cores also has a cost.

Chris Cox, for instance, creates many benchmarks to test one's code or one's compiler, so he is well qualified to identify bottlenecks.

This is funny
By blwest on 1/18/2009 11:50:01 AM , Rating: 2
Watching DT idiots post comments on something technical is like watching flies go to a fresh pile of dung.

By kickwormjoe on 1/18/2009 12:57:12 PM , Rating: 2
For those of you trying to decide whether to go 2 or 4-cores:

You've got be kidding me...
By Jeff7181 on 1/18/2009 1:47:59 PM , Rating: 2
A memory bus designed for one or two processing cores can't feed an infinite amount of processor cores? Wow... this is a HUGE breakthrough in R&D!!!

In other news, water is wet.

By Clauzii on 1/18/2009 3:32:47 PM , Rating: 2
But in the end it's how the programs that run on multicores are written.

All you need is a balanced system
By Dribble on 1/19/2009 5:31:35 AM , Rating: 2
We already have stuff that runs on lots of cores; that's what CUDA and its equivalents are doing on your graphics card. It works fine for all sorts of useful things (e.g. Folding@home).
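
A minimal sketch of why such workloads scale: in a data-parallel kernel each output element depends only on its own inputs, so the work splits into chunks with no ordering constraints between them. The SAXPY kernel below is a standard textbook example, not code from any particular GPU framework:

```python
# Data-parallel sketch: each element's result depends only on that
# element, so the array can be split into independent chunks.

def saxpy(a, xs, ys):
    """y = a*x + y elementwise -- a classic data-parallel kernel."""
    return [a * x + y for x, y in zip(xs, ys)]

def chunked(seq, n_chunks):
    """Split seq into n_chunks roughly equal contiguous pieces."""
    k = (len(seq) + n_chunks - 1) // n_chunks
    return [seq[i:i + k] for i in range(0, len(seq), k)]

xs, ys, a = list(range(8)), [1.0] * 8, 2.0
# Each chunk could run on a separate core or shader; order is irrelevant.
parts = [saxpy(a, cx, cy)
         for cx, cy in zip(chunked(xs, 4), chunked(ys, 4))]
flat = [v for part in parts for v in part]
assert flat == saxpy(a, xs, ys)  # chunked result matches the serial run
```

This independence is exactly what serial, pointer-chasing code lacks, and why the same core count helps one workload and not the other.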

What about diversifying cores?
By wordsworm on 1/19/2009 8:42:42 AM , Rating: 2
As far as I know, the greatest motherboard traffic goes on between the graphics card and the CPU. One of the benefits I can think of is having multiple CPU cores + multiple GPU cores on a single chip. I know AMD has been working on its fusion project for awhile. Seems to me that data speeds on a chip are far faster than data speeds on the motherboard's bus. The more you take off the bus and put onto the chip, the better. Not only that, but it takes a lot less power to do so.

Though not quite out of AMD's womb, I suspect that it could provide a very interesting solution to memory bandwidth issues.

By jrb531 on 1/20/2009 11:42:51 AM , Rating: 2
I find it humorous (or sad) that people with differing "opinions" take this stuff so personal.

If you bought a 4x core CPU why are you "offended" if someone has the "opinion" that a 2x core CPU that runs faster "might" have been a better choice?

In some games a 4x core 2500mhz CPU runs slower (yet costs a ton more) than a 2x core running at 3000mhz.

This is FACT and not something made up. In other games (not many) the 4x core will be faster and I'm sure future games will take better advantage of extra cores.

Why does this "FACT" somehow upset people?

If you bought a 4x core (at a much greater expense) because you did your homework and some of the apps you run can take advantage of the extra cores then so be it...

but if you bought into the "hype" of more cores are "always" better without doing your homework then shame on you and getting into some kind of "my cores are greater than your cores" debate does nothing but make you look silly.

For me, I'll stick with my lowly AMD 7500 for $75 and enjoy my Nvidia 260 until the prices of the 4x cores come down. Maybe by then more games (which is why I built this computer) will do something with the extra cores, but right now I fail to see why I should pay an extra $200 for a Phenom II.

I respect that others may feel differently, but as long as my games, running at max resolution and settings, hold a "minimum" of 30fps (i.e. the framerate never drops below 30fps but is often much higher), then I fail to see the difference between playing a game at a "minimum" of 30fps and one that gets 20000000000000fps.

A bit arrogant?
By mindless1 on 1/21/2009 4:48:15 PM , Rating: 2
Instead of proclaiming this about continuing to add more cores to an otherwise unchanged platform and then declaring it knowledge, shouldn't they have admitted, "oops, we kinda forgot that to have a balanced system you might want to add a bit more memory bandwidth too"?

That's what it amounts to. It would've been wiser to build the better bus in the first place; then, if they wanted to scale it down, simply reduce the clock rate for that segment of testing.

By InternetBuzzard on 1/23/2009 8:40:45 PM , Rating: 2
Maybe the way forward with this sort of information is creating proprietary first-party motherboards. Even the PS2 offered buses (a 3.8 GB/s FSB, in 2000!) and caches that are scary large by today's standards on PCs.
Saying that we can't use 16 cores feasibly because of the mentioned drawbacks is like saying we can't have flying cars because we don't have gas stations for them.
They will be a step forward when the rest of the technology steps forward in support of them. "If you build it, they will come."

Well Duh
By jonnyrocket on 1/17/2009 2:27:57 PM , Rating: 1
Another case of tax $ to rediscover the obvious.

Anyone who has ever designed a multiprocessor system has known there is a balance between memory bandwidth and the number of CPUs contending for that memory.

The nastier stuff besides the bandwidth is cache-coherency and cache thrashing, lock contention, etc that don't scale easily.

And since the study involved very large data sets it probably doesn't really say much about Joe Plumber and his PC.

This is not all that surprising...
By Motoman on 1/17/09, Rating: -1
RE: This is not all that surprising...
By PrinceGaz on 1/17/2009 3:21:53 PM , Rating: 5
physics processing, for example, is exceedingly serial... you need the output from the last calc as input to the current calc, which then feeds the next calc, all in series; it can't be performed in parallel. So there's no value in the next 4 cores.

Actually physics processing isn't like that at all... it almost always consists of doing similar calculations on a large amount of data, and they can all be done in parallel. That's why things like PhysX can be handled so much better by a modern GPU than on any x86 CPU.
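
The point can be sketched as follows: within one timestep every particle update is independent (parallel across particles), and only successive timesteps are serial. This is a deliberately tiny 1-D free-fall example with illustrative constants:

```python
# Sketch: physics is serial ACROSS timesteps but parallel WITHIN one.
# 1-D free fall for a batch of particles; constants are illustrative.

G, DT = -9.8, 0.1  # gravity (m/s^2) and timestep (s)

def step(particles):
    # Each (position, velocity) pair updates with no reference to the
    # others -- this inner loop is what a GPU parallelizes.
    return [(p + v * DT, v + G * DT) for p, v in particles]

state = [(0.0, 0.0)] * 4
for _ in range(10):          # serial in time...
    state = step(state)      # ...parallel across particles
# After 10 steps every particle has fallen the same distance, since
# none of the updates depended on any other particle.
```

With thousands of particles the inner update is exactly the kind of wide, uniform work that PhysX offloads to shader cores, while the outer time loop remains the serial part the parent post was describing.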

RE: This is not all that surprising...
By Motoman on 1/17/09, Rating: 0
RE: This is not all that surprising...
By kkwst2 on 1/17/2009 10:27:26 PM , Rating: 3
As someone who does a lot of computer modeling, I'm certainly biased, but I'd disagree with your point. A large portion of users who really need high performance computers benefit greatly from multiple cores. This includes images processing, video processing, physics modeling, 3D rendering, biological modeling, stochastic modeling, etc.

In my case (fluid modeling) the programs I use scale very nicely well above 50 cores using clustering. In a single node, two 4-core Xeons are nearly twice as fast as a single one, so scaling is quite good all the way up to 8 cores. The number of cores in each node depends on the architecture used and the efficiency of more cores per node certainly is quite dependent on the memory architecture. So, the article has a point but is probably an oversimplification. It seems to assume that memory architectures are not going to advance and scale with increasing cores, which I'm not sure is true.

By masher2 (blog) on 1/18/2009 12:12:18 AM , Rating: 2
Back when I did MHD modeling, the massively parallel supercomputer we used supposedly had several times as much silicon devoted to node-to-node communication as it did to actual computing on each node itself. I can see Intel having to make serious architectural changes to get decent performance from a 16+ core cpu.

In the case of your two 4-core Xeons, though, you have to remember that this is slightly different from one 8-core CPU. The 2x4 option gives you twice as much cache bandwidth, which, if your code fits in cache, is going to negate pretty much all the bandwidth crunch from scaling beyond 4 cores.

By Fritzr on 1/18/2009 1:27:03 AM , Rating: 2
When scaling using clusters, each node has its own dedicated memory. The article is talking about multiple cores using a single memory, which is what you get with current multicore processors.

There is one memory connection reached through the memory controller and each core has to share that connection.

Assuming all cores are busy reading/writing main memory, then for a dual core the memory runs at half speed per core, a quad core at quarter speed per core, an eight-core at 1/8 speed per core... as the number of cores goes up, the average available memory bandwidth per core drops.

One workaround is a larger unshared cache. The bigger the cache dedicated to each core, the less that core is likely to need to go to main memory. As new code is written and optimized to minimize main-memory access, the performance of multicore will go up.

For now, when comparing multicore CPUs you need to look at per-core dedicated cache. A larger cache boosts multicore performance by reducing memory contention. This was the original solution used for supercomputers: each processor node has a large dedicated memory.
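
The per-core share described above is simple division; assuming a single fully shared bus and all cores busy (the 12.8 GB/s total is illustrative, not any specific product):

```python
# Per-core share of one shared memory bus, assuming every core is
# busy and the bus is divided evenly. 12.8 GB/s is illustrative.

def per_core_bandwidth(total_gb_s, n_cores):
    return total_gb_s / n_cores

shares = {n: per_core_bandwidth(12.8, n) for n in (2, 4, 8, 16)}
# {2: 6.4, 4: 3.2, 8: 1.6, 16: 0.8} -- halving with every doubling
```

Each doubling of cores halves the average bandwidth per core, which is why larger per-core caches (fewer trips to the shared bus) recover so much of the loss.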

RE: This is not all that surprising...
By Motoman on 1/18/2009 11:06:43 AM , Rating: 1
...your case is wildly different from the normal desktop usage of the average consumer, which is the point I was trying to make. And I was thinking about applications like yours when I mentioned an ERP and an RDBMS. What you're doing is highly specialized, and is well suited for multi-core/CPU usage.

For the normal, average user, my statement applies. And frankly, so does the guy's post currently at the top of the list, who has now been rated down to zero.

It's like people are already zombified to the idea that more cores is always going to be better.

Pick your favorite game, all you gamers out there. Benchmark it on a single-core, dual-core, quad-core, and then 8-core CPU (all of the same family, at the same speed, etc., to keep things even). It's virtually certain that the benchies will fall on their face at the quad-core... and even as newer games come out, there will not be things that can be spread over 8 cores, or 16, or whatever.

...unless, as I've said, some kind of currently inconceivable technomagic can be invented to allow serial processing to occur over multiple (parallel) cores. Which, as far as I know, is impossible.

But look at the benchmarks. *Look at them.* And please don't start with the wonky synthetics that have a PC do 40 things at a time; that proves nothing. Average consumers and gamers don't encode video while ripping MP3 tracks while folding proteins while compiling C# code while typing a letter to Grandma. Bench your favorite individual games and applications.

RE: This is not all that surprising...
By mathew7 on 1/18/2009 3:00:49 PM , Rating: 2
While you are right about the games, there is one point I have not seen raised yet: the software will have to adapt.
The games from the last 2 years are all adapted to 2 cores (at least those which need CPU performance). I could even say they ignored quad-cores, because not many gamers had quad-cores (the games were developed while quads were very expensive). Switching from 1 to 2 cores was easy for games, but splitting the workload again will not benefit as much, so doing it on currently released games would have been a waste of time and resources. Probably the games that are halfway through development now can benefit from 4 cores, but that has to be decided at an early stage.
One of the problems is that current programmers are not used to thinking in parallel algorithms. Also, parallelism cannot be applied to everything.

Current desktop applications do not require much performance. You could have a big Excel file with lots of data, which I'm sure MS designed to benefit from as many cores as you have. But the file would have to be very big and very complex before current processors held you back (I mean CPU workloads timed in minutes, not seconds). At those dimensions, you would be better off with a database application.
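
The diminishing returns described in this thread follow from Amdahl's law: if a fraction of the frame loop stays serial, extra cores help less and less. The 40% serial fraction below is purely illustrative, not a measurement of any real game engine:

```python
# Amdahl's law: speedup = 1 / (s + (1 - s) / n), where s is the
# fraction of work that stays serial. s = 0.4 is illustrative.

def amdahl_speedup(serial_fraction, n_cores):
    return 1 / (serial_fraction + (1 - serial_fraction) / n_cores)

s = 0.4
gains = {n: amdahl_speedup(s, n) for n in (2, 4, 8, 16)}
# ~1.43x, ~1.82x, ~2.11x, ~2.29x -- capped at 1/s = 2.5x no matter
# how many cores are added.
```

This is why going from 1 to 2 cores paid off easily while further splits "will not benefit as much": each doubling attacks a smaller and smaller parallel remainder.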

By William Gaatjes on 1/19/2009 1:56:22 PM , Rating: 2
True. Since Windows NT version 6 (yes, Vista) Microsoft has updated the scheduler, interrupt, and thread-handling mechanisms to make use of hardware features modern processors have had since at least the K7 or the P4. Windows XP (NT5) uses ancient scheduler, interrupt, and thread-handling mechanisms based on software loops together with interrupt timers, while Vista does these things in hardware.

See this link :

The multimedia class service is useless tho in my opinion.
If Microsoft would just use a large enough memory buffer for audio data, with the audio chip DMAing the data from memory and the CPU updating that data before the audio chip runs into the end of the memory region it was assigned to DMA, then you would never notice a glitch.

ReadyBoost is useless as well.

Superfetch seems handy, but we need more bandwidth from HDD to main memory before Superfetch gets really interesting.

RE: This is not all that surprising...
By Reclaimer77 on 1/17/2009 8:11:21 PM , Rating: 3
There is NOTHING surprising here. Sandia must enjoy wasting their time.

This is really no big deal. Intel and AMD have already dealt with this in the real world.

Nice job Sandia. I await your next breakthrough when you inform us of something else equally obvious and meaningless.

RE: This is not all that surprising...
By Motoman on 1/18/2009 11:09:22 AM , Rating: 2
Intel and AMD have already dealt with this in the real world.

Really? Please elucidate this topic for us.

RE: This is not all that surprising...
By Reclaimer77 on 1/18/2009 11:54:53 AM , Rating: 1
It's a non-topic. You think Intel and AMD are a bunch of idiots who blindly add cores to CPUs without taking memory usage into account?

I'm not sure what you want me to say. The article is simply stating the obvious, and it sure as hell isn't news to Intel or AMD. Why do you think we have on-die memory controllers and dual- and triple-channel memory now?

How do you explain that software WRITTEN for 8 threads runs faster on the i7 than on quad cores?

RE: This is not all that surprising...
By Motoman on 1/18/2009 12:08:10 PM , Rating: 2
...How do you explain that we can expect *all* applications to benefit from an 8-core processor? Or 16-core?

I think that Intel and AMD are geniuses... they ran into a wall and found a way around it. But I think people like you are either too far into specialized niches that *will* benefit from lots of cores, or too far bought into the marketing to actually think about the ramifications for the typical consumer.

Applications and games that are used by the typical consumer are simply not going to be able to spread across a whole lot of parallel cores. They just aren't. So if there is some magic that will allow purely serial processes to run across multiple parallel cores, please let me know. If there isn't, please stop apparently pretending that more cores is better for everything...because it isn't.

By retrospooty on 1/19/2009 8:49:05 AM , Rating: 2
"...How do you explain that we can expect *all* applications to benefit from an 8-core processor? Or 16-core?"

??? I don't... because we don't. Who expects that?

What we ALL know is that only multithreaded apps benefit from multiple cores, and we ALL know that most games and high-end apps that need extra CPU power ARE being written for multiple threads. Apps that don't need the CPU power are generally left alone.

By Jeff7181 on 1/18/2009 1:57:12 PM , Rating: 3
Ever heard of double-data-rate memory? Dual memory channels? Quad memory channels? Quad-pumped buses?

CPU manufacturers understood a LONG time ago that as the processing power of CPUs increases, the demands on the external bus increase also. All the things mentioned above are designed to provide the CPU with more memory bandwidth, to allow it to operate to its potential.

Do you think Intel is using three memory channels for their newest chips because they got sick of seeing either 2 or 4 memory slots on a motherboard and wanted to mix it up a little with 3 or 6? Of course not... it's because they had already identified the problem of feeding their new dual- and quad-core processors with enough data to crunch, so they increased memory bandwidth by adding a third channel.
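
As a rough sanity check of the channel-scaling argument (peak numbers only, ignoring real-world efficiency losses): each 64-bit DDR channel moves 8 bytes per transfer, so peak bandwidth scales linearly with channel count. The DDR3-1066 figures below are used for illustration:

```python
# Peak DRAM bandwidth: transfers/s * 8 bytes per 64-bit channel *
# number of channels. Theoretical peaks only; real sustained
# bandwidth is lower.

def peak_bandwidth_gb_s(mt_per_s, channels, bytes_per_transfer=8):
    return mt_per_s * bytes_per_transfer * channels / 1000

dual = peak_bandwidth_gb_s(1066, 2)    # DDR3-1066, two channels
triple = peak_bandwidth_gb_s(1066, 3)  # DDR3-1066, three channels
# dual ~17.1 GB/s, triple ~25.6 GB/s: a 50% jump from the third channel
```

The third channel buys a straight 50% more peak bandwidth at the same memory speed, which is exactly the kind of headroom extra cores consume.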

By fri2219 on 1/18/2009 10:01:54 PM , Rating: 2
No kidding, welcome to 1988.

By retrospooty on 1/19/2009 8:46:16 AM , Rating: 2
"This is really no big deal. Intel and AMD have already dealt with this in the real world."

Yup... I don't know if I blame Sandia for saying it, though. It's kind of not worth posting on this site, considering Sandia is a huge government-funded science lab and this is a consumer site...

By SmartWarthog on 1/19/2009 12:36:15 PM , Rating: 2
First, as noted here, you have to deal with too many cores trying to access the same memory

The Phenom's split DRAM controller probably accounts for much of its performance improvements over its predecessor.

RE: This is not all that surprising...
By Oregonian2 on 1/19/2009 3:14:59 PM , Rating: 2
You've had your posting's point score knocked down a few because your posting, although I think written with good honest intention also showed a rather large, uh, lack of knowledge of processor architecture and usage. I personally think your ideas should be argued against instead, but that's me. And yes, I am an "EE Guy" who professionally designed computer systems starting with the Intel 8008 back when it was hot stuff. In terms of this thread, hunt down photos of CPU die -- there usually are some when new processors come out. Note how the "CPU proper" usually takes only a minority portion of the chip! Most of the area is usually taken up by cache memory. Think about the implications of that observation in the context of this thread.

By William Gaatjes on 1/19/2009 3:34:22 PM , Rating: 2
For the interested :

and to top it off with some tests :

Why is cache so important ?
Well, triple 3 channel memory is around 16 times slower then the cache of Intels fastest offering i7 965. Imagine that the execution unit's inside the core i7 965 would be just waiting for data wihtout cache. the cpu's would be terribly slow. And the x86 complete(thus meaning including decoders load and store ) execution unit's are still big when compared to other modern architectures. But they only need to be because they need to decode the variable lenght x86 instruction set (meaning instructions can be for example 8 bits or 16 bits or 32 bits or 64 bits long ) This makes it less easy to feed the instructions as easy digestive food to the execution unit's.

"We can't expect users to use common sense. That would eliminate the need for all sorts of legislation, committees, oversight and lawyers." -- Christopher Jennings
