
Sandia simulations reveal memory is the bottleneck for some multi-core processors

Years ago, the hallmark of processor performance was clock speed. As chipmakers hit the wall on how far they could push clock speeds, processor designs moved to multiple cores to increase performance. However, as many users can tell you, performance doesn't always increase as you add more cores to a system.

Benchmarkers know that, for some uses, a quad-core processor often offers less performance than a similarly clocked dual-core processor. According to Sandia, the reason is memory availability. Supercomputers have tried to increase performance by moving to multi-core processors, just as consumer processors have.

The Sandia team has found that simply increasing the number of cores in a processor doesn't always improve performance, and that past a certain point performance actually decreases. Sandia simulations have shown that moving from dual-core to quad-core processors offers a significant increase in performance. However, moving from four cores to eight offers an insignificant gain, and moving from eight cores to 16 actually reduces performance.

Sandia team members ran their simulations using algorithms for deriving knowledge from large data sets. The team found that with 16 cores, the system performed barely as well as it did with two cores.

The problem, according to the team, is a lack of memory bandwidth combined with contention among the cores for each processor's memory bus. The team uses a supermarket analogy to explain the problem. If two clerks check out your purchases, the process goes faster; add four clerks and things are quicker still.

However, with eight or 16 clerks, it becomes a problem not only to get your items to each clerk but also to keep the clerks out of each other's way, and checkout ends up slower than it would be with fewer clerks. Team member Arun Rodrigues said in a statement, "To some extent, it is pointing out the obvious: many of our applications have been memory-bandwidth-limited even on a single core. However, it is not an issue to which industry has a known solution, and the problem is often ignored."

James Peery, director of Sandia's Computations, Computers, Information, and Mathematics Center, said, "The difficulty is contention among modules. The cores are all asking for memory through the same pipe. It's like having one, two, four, or eight people all talking to you at the same time, saying, 'I want this information.' Then they have to wait until the answer to their request comes back. This causes delays."
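The scaling pattern described above can be illustrated with a toy model. This is only a sketch and is not Sandia's simulation: the function name, the shared-bus formula, and the parameter values (`per_core_rate`, `bus_capacity`, `contention`) are all assumptions, chosen so that the curve has the same qualitative shape the article reports. Throughput is the smaller of what the cores could compute and what a shared memory bus can deliver once contention is taken into account.

```python
def relative_throughput(cores, per_core_rate=1.0, bus_capacity=34.0, contention=1.0):
    """Toy estimate of throughput for `cores` cores sharing one memory bus.

    All parameters are hypothetical and unitless; they are not measured values.
    """
    # Upper bound if memory were never the bottleneck.
    compute_limit = cores * per_core_rate
    # Assumed contention model: every additional core competing for the
    # bus reduces the bandwidth the bus can effectively deliver.
    effective_bus = bus_capacity / (1.0 + contention * (cores - 1))
    return min(compute_limit, effective_bus)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores -> relative throughput {relative_throughput(n):.2f}")
```

With these made-up numbers, throughput doubles from two cores to four, barely moves from four to eight, and at 16 cores falls back to roughly the two-core level. Different parameter values would move the turning point, which is why the result depends so heavily on the workload and the memory system.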

The researchers say that today's memory systems offer dramatically better performance than those available a year ago, but the fundamental memory problem remains.

Sandia and Oak Ridge National Laboratory (ORNL) are working together on a project intended to pave the way for exaflop supercomputing. ORNL currently has the fastest supercomputer in the world, called Jaguar, which was the first supercomputer to break the sustained petaflop barrier.



Comments



Depends on the application
By jrb531 on 1/17/2009 1:10:16 PM , Rating: 0
If the application has not been written from the ground up to support more than one core (from day one... not some tacked-on coding hacks so they can claim multi-core support), then the "sweet spot" will always be 2 cores, because the OS and background apps can run on one core and the single-core app can run on the other.

Now if you run a 3x or 4x core CPU at the same "MHz" as the 2x core CPU, the 3x or 4x core CPU will be the better pick, but often people trade "MHz" for cores and this is not always the right thing to do.

Examples of this are the many games that are still designed mainly for a single core. In this case I have seen people "upgrade" from a 3000MHz 2x core to a 2500MHz 4x core and actually see less performance.

They spend big bucks so they can proclaim they have a 4x core and see their games run slower.

The more cores you add, the harder it is to make sure that all the cores will run at a certain speed. This is why it's always harder to OC a 3x or 4x core CPU vs a 1x or 2x core.

Now there are apps that can use more than one core, but we are at a transition stage right now in which most of us still run more single-core apps than multi-core apps.

So unless you have a ton of money and can afford the faster 3x or 4x cores, the "smart" people prefer the fastest 2x cores.

I'm sure the "benchmark" people will attack me on this but facts are facts... dollar for dollar you get more for your money by buying the fastest 2x core you can afford. Seldom does core 3 or 4 even get used, and even apps that can use multiple cores mostly do not use "much" of that 3rd or 4th core.

I'm sure people can find some apps that do but I'm talking about most.

Most people are better off with a 3000MHz 2x core CPU vs a 2500MHz 4x core CPU... nuff said.




RE: Depends on the application
By Shida on 1/17/09, Rating: 0
RE: Depends on the application
By Targon on 1/17/2009 4:58:53 PM , Rating: 5
It depends on what you are looking at buying, since that should be a factor. You can go with an AMD Phenom II 940, for example, running at 3GHz per core, and no matter what, on the AMD side of the industry, you will be seeing the best performance from the new processor.

Comparing a dual core running at 3GHz vs. a quad running at 2.5GHz, then yeah, the dual core would be better for legacy games and applications, but when both run at the same clock speed, there isn't any real downside to the quad at this point.

Looking forward, we will see both AMD and Intel moving to 3, 4, or more memory channels per CPU, and that will take care of the problem with accessing memory. We may also see new types of memory that address this by allowing 4 or more banks on a single module, giving much greater bandwidth per module.

I love how people use simulations to come up with these sorts of conclusions, as if AMD and Intel don't look for ways to resolve this sort of problem in the first place.


RE: Depends on the application
By Motoman on 1/17/09, Rating: 0
RE: Depends on the application
By mindless1 on 1/17/2009 2:32:58 PM , Rating: 4
You are wrong about the sweet spot. It is fairly irrelevant that one core can handle the OS and other background apps, because anyone who cares much about performance will have what most do: the OS and background apps taking a mere 1% of processor time, which is not even close to as significant as memory, bus, and CPU speed for a single OR dual core CPU.

On the other hand, many games or other apps are multithreaded, even in ways we don't so directly attribute like positional sound processing on some sound cards. To that extent, you were correct it could be considered an OS process instead of directly attributed to the game, but in general the idea that background things usually accounting for a trivial amount of processing time would be an argument for multiple cores is incorrect.


RE: Depends on the application
By jrb531 on 1/17/09, Rating: 0
RE: Depends on the application
By SlyNine on 1/17/2009 4:00:49 PM , Rating: 2
Yes, I agree, but the 3rd and 4th cores can account for a MUCH BIGGER performance increase if utilized fully. Whereas your dual core clocked 20-40% higher will only ever be 20-40% faster, the quad core can be 200% faster.

It's all about trade-offs and usage model.
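The dual-versus-quad trade-off argued over in this thread can be put into rough numbers with Amdahl's law. The sketch below is not a benchmark. It assumes that performance scales linearly with clock speed, that both chips share the same architecture, and that memory is never the bottleneck. The function names and the list of parallel fractions are made up for illustration; the 3.0GHz dual core and 2.5GHz quad core are the two chips compared earlier in the thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when `parallel_fraction`
    of the work (0.0 to 1.0) can be spread across `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_performance(clock_ghz, cores, parallel_fraction):
    """Assumed model: performance is clock speed times Amdahl speedup."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        dual = relative_performance(3.0, 2, p)
        quad = relative_performance(2.5, 4, p)
        print(f"parallel fraction {p:.2f}: dual {dual:.2f}  quad {quad:.2f}")
```

Under these assumptions the two chips break even when half of the work can run in parallel. Below that the faster dual core wins, and above it the quad core pulls ahead, which is one way to read "it depends on the application."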


RE: Depends on the application
By mindless1 on 1/17/2009 11:46:29 PM , Rating: 2
Sure it "could" but usually won't come remotely close to 200% benefit not only due to suboptimal software design, but the other system bottlenecks like bus, memory, hard drive, video card, etc.

Then there's the question of whether most people, or even anyone, is really buying all this new software at hundreds to thousands of dollars. Reviewer-benchmarkers seem to assume it is the correct software to use when comparing processors, but then they fail to take this additional cost into consideration when factoring for relative value in the conclusions of their reviews.


RE: Depends on the application
By SlyNine on 1/18/2009 1:49:28 AM , Rating: 2
Same can be said about higher clock speeds though.

I think when comparing CPUs it's more a matter of whether or not they use the same software than of using the most likely software, because you will never be able to compare the millions if not billions of possible software/hardware combinations. The only thing you can do is give an apples-to-apples comparison with benchmarking software that offers a common usage model.

Take Futuremark: it may not be perfect, but it does correlate with the real world a lot. You will never see an 8800GT beat a 4870 in real life or in Futuremark, and a 280GTX and a 4870 are close in both.

The only way to tell exactly how something is going to work in your unique case is to try it first hand or, I guess, get very lucky and find a reviewer who uses the exact same hardware and hope it's set up the same as yours.


RE: Depends on the application
By SlyNine on 1/17/2009 4:02:33 PM , Rating: 2
Also, frames per second is not always the most important thing. In RTS games, things like simulation speed can be much more important.

Who cares if I get 60FPS if it takes 10 min to play through 1 min of game time?


RE: Depends on the application
By SlyNine on 1/17/2009 3:56:48 PM , Rating: 3
The quad core at 2500MHz should be plenty for any game out there. However, the dual core at 3GHz may not be enough in 3 years when a game needs it.

Depending on your upgrade cycle, going with slightly less performance today to get better performance tomorrow could be a huge plus. RTS games, I imagine, will take a huge leap in supporting quad core CPUs sooner or later.

But if you upgrade every year then you are probably better off getting a dual core and clocking it to 4GHz. On the other hand, a Core i7 at 3GHz will probably outperform it.


RE: Depends on the application
By DanoruX on 1/17/2009 7:23:09 PM , Rating: 1
Been running my Q6600 @ 3.6GHz for the past year and I wouldn't trade it for a dual-core any day, even if I clocked a Penryn to 4.5GHz, for one very simple reason - quad core lets me run more stuff at the same time! That is, having an 8-person Skype conference open, running BitTorrent, encoding video and playing TF2 at the same time is no problem whatsoever.

That said, I'm glad most of my stuff isn't memory limited.


RE: Depends on the application
By Totally on 1/17/2009 9:02:55 PM , Rating: 3
Be realistic: gaming and running torrents while talking to 7 other people?

BitTorrent and TF2: how can you play and talk with the lag? For me it's horrible when I try to play with the ping through the roof; Left 4 Dead just screams at me.


RE: Depends on the application
By SlyNine on 1/18/2009 2:10:34 AM , Rating: 2
You'd be surprised what you do when you have the power to do it.

I haven't been able to get my Q6600 over 3GHz (oh well), but when I game I leave open tons of apps that I'm working on, set my encoding to run on cores 1 and 2, and game away. In the future, when games require the extra cores, I will either have to buy something new or stop doing things in the background. But I will also enjoy the boost from quad core at that time.

I've been on dual core and quad core. Quad really is that much better for me. If you don't want to upgrade your computer for a few years, I recommend going quad and sacrificing some GHz; if you are going to upgrade in one year and only play games, then go dual core.
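Pinning a background job such as an encode to particular cores, as described in the comment above, is usually done with a CPU affinity setting. The sketch below is a hypothetical illustration, not the commenter's actual setup. It assumes that "core 1 and 2" means the first two cores, which are indices 0 and 1 when counting from zero. The helper names are made up. `os.sched_setaffinity` is a real standard-library call, but it is only available on some platforms such as Linux, so the code checks for it first.

```python
import os


def affinity_mask(cores):
    """Return an integer bitmask with one bit set per zero-based core index."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def pin_current_process(cores):
    """Restrict the calling process to `cores` where the platform allows it.

    Returns True if the affinity was set, False if this platform
    does not provide os.sched_setaffinity.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cores))  # pid 0 means the calling process
        return True
    return False


if __name__ == "__main__":
    # Hexadecimal mask for the first two cores.
    print(hex(affinity_mask([0, 1])))
```

A hexadecimal mask of this kind is the form that affinity options on Windows, such as the one on the `start` command, typically expect. Leaving the other two cores free for the game is the point of the exercise.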


Copyright 2014 DailyTech LLC.