
Sandia simulations reveal memory is the bottleneck for some multi-core processors

Years ago, the hallmark of processor performance was clock speed. As chipmakers hit the wall on how far they could push clock speeds, processor designs moved to multiple cores to increase performance. However, as many users can tell you, performance doesn't always increase as you add more cores to a system.

Benchmarkers know that, for some uses, a quad-core processor often offers less performance than a similarly clocked dual-core processor. The reason, according to Sandia, is memory availability. Supercomputers have tried to increase performance by moving to multi-core processors, just as consumer processors have.

The Sandia team has found that simply increasing the number of cores in a processor doesn't always improve performance, and that past a certain point performance actually decreases. Sandia simulations have shown that moving from dual-core to quad-core processors offers a significant increase in performance. However, moving from four cores to eight offers only an insignificant gain, and moving from eight cores to 16 makes performance drop.

For their tests, Sandia team members ran simulations of algorithms that derive knowledge from large data sets. The team found that with 16 cores, the system performed barely as well as it did with two cores.

The problem, according to the team, is a lack of memory bandwidth combined with contention among the cores for each processor's memory bus. The team uses a supermarket analogy to explain the problem: if two clerks check out your purchases, the process goes faster; add four clerks and things are quicker still.

However, with eight or 16 clerks it becomes a problem not only to get your items to each clerk, but also to keep the clerks from getting in each other's way, and checkout ends up slower than it would be with fewer clerks. Team member Arun Rodrigues said in a statement, "To some extent, it is pointing out the obvious: many of our applications have been memory-bandwidth-limited even on a single core. However, it is not an issue to which industry has a known solution, and the problem is often ignored."

James Peery, director of Sandia's Computations, Computers, Information, and Mathematics Center, said, "The difficulty is contention among modules. The cores are all asking for memory through the same pipe. It's like having one, two, four, or eight people all talking to you at the same time, saying, 'I want this information.' Then they have to wait until the answer to their request comes back. This causes delays."
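
The contention Peery describes can be illustrated with a toy model. The sketch below is not Sandia's simulation. The function name, the bus capacity of 7 units, and the quadratic contention penalty are all assumptions, chosen only to reproduce the qualitative shape reported above: each core demands one unit of memory traffic per step, and the shared bus loses efficiency as more cores compete for it.

```python
def modeled_throughput(cores, per_core_demand=1.0, bus_capacity=7.0,
                       contention=0.01):
    """Toy estimate of useful work per step for cores sharing one memory bus.

    All parameters are illustrative assumptions, not measured values.
    """
    # Work the cores could do if memory were never a bottleneck.
    demand = cores * per_core_demand
    # Assumed penalty: arbitration overhead grows with the square of the
    # core count, so the bus delivers less as more cores fight over it.
    effective_capacity = bus_capacity / (1.0 + contention * cores ** 2)
    # The system can go no faster than the slower of the two limits.
    return min(demand, effective_capacity)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores -> {modeled_throughput(n):.2f} units per step")
```

With these assumed parameters, throughput doubles from two to four cores, rises only slightly at eight, and at 16 falls back to roughly the two-core level. Different parameter choices would move the turning point, so the numbers themselves carry no weight.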

The researchers say that memory systems available today offer dramatically better performance than those available a year ago, but the fundamental memory problem remains.

Sandia and Oak Ridge National Laboratory (ORNL) are working together on a project intended to pave the way for exaflop supercomputing. ORNL currently has the fastest supercomputer in the world, Jaguar, which was the first supercomputer to break the sustained petaflop barrier.



Comments

RE: Depends on the application
By jrb531 on 1/17/2009 2:55:49 PM , Rating: 0
Yes, I oversimplified it, but the point still stands. It's "much" easier to keep 2 cores busy than it is to keep 3 or 4 cores busy.

If you have the money to get a 3000MHz 4x core CPU vs a 3000MHz 2x core (and do not want to OC), then by all means the 4x core is better.

If you are on a limited budget, then it is "almost" always better to get a faster 2x core over a slower 4x core.

The reasoning is simple... you will "always" use the extra MHz of a 1x or 2x core but not always use the 3rd or 4th cores. This means, of course, that your money is better spent buying the fastest 2x core you can vs a slower 3x or 4x core.

I think we all would love to have one of those 4x core Intel $1300 speed demons but few of us can afford (or justify) such a CPU.

A good example is AMD's new 7750 2x core CPU, which OCs to at least 3100MHz at stock voltage and costs about $75.

Should I spend $175 to get an AMD 4x core Phenom that can OC to 2900MHz at stock voltage, or save $100 and get that 2x core that runs slightly faster?

In benchmarks the 4x core kills the lowly 7750, but in "most" games the FPS is the same or even higher on the 7750 because the game only uses one core and that core is clocked faster.

It all depends on the applications/games you want to run and how much money you have to burn.

IMHO I would rather take the saved $100 and put it toward a better Video Card that will affect a game far far more than the CPU would.

If you can "afford" the very best CPU "and" Video Card then by all means do so but if you have to make a choice.... get a 2x core CPU and a faster Video Card.... for games that is :)


RE: Depends on the application
By SlyNine on 1/17/2009 4:00:49 PM , Rating: 2
Yes, I agree, but the 3rd and 4th cores can account for a MUCH BIGGER performance increase if utilized fully. Whereas your dual core clocked 20-40% higher will only ever be 20-40% faster, the quad core can be 200% faster.

It's all about trade-offs and usage model.
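
The trade-off in this exchange can be put into rough numbers with Amdahl's law, which limits speedup by the fraction of the work that can run in parallel. The sketch below is only an illustration: the function names and the 30% clock advantage are assumptions, performance is assumed to scale linearly with clock speed, and the memory and bus bottlenecks from the article are ignored. Note that under this model a quad core at the same clock can be at most twice as fast as a dual core.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_performance(clock_ratio, parallel_fraction, cores):
    """Performance relative to one core at the base clock (assumed linear in clock)."""
    return clock_ratio * amdahl_speedup(parallel_fraction, cores)


def break_even_fraction(clock_ratio, few=2, many=4):
    """Parallel fraction at which `many` base-clock cores match `few` faster cores.

    Solves clock_ratio * speedup(p, few) == speedup(p, many) for p.
    A result above 1.0 means the many-core chip never catches up.
    """
    return (clock_ratio - 1.0) / (
        clock_ratio * (1.0 - 1.0 / many) - (1.0 - 1.0 / few)
    )


if __name__ == "__main__":
    clock_ratio = 1.3  # hypothetical dual core clocked 30% higher
    for p in (0.0, 0.5, 0.9, 1.0):
        dual = relative_performance(clock_ratio, p, 2)
        quad = relative_performance(1.0, p, 4)
        print(f"parallel={p:.1f}  dual={dual:.2f}  quad={quad:.2f}")
    print(f"break-even parallel fraction: {break_even_fraction(clock_ratio):.2f}")
```

Under these assumptions, the faster dual core stays ahead until well over half of the workload runs in parallel, which fits both posts: single-threaded games favor the higher clock, and heavily threaded software favors the extra cores.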


RE: Depends on the application
By mindless1 on 1/17/2009 11:46:29 PM , Rating: 2
Sure it "could", but it usually won't come remotely close to a 200% benefit, not only because of suboptimal software design but also because of other system bottlenecks like bus, memory, hard drive, video card, etc.

Then there's the question of whether most people, or even anyone, is really buying all this new software at hundreds to thousands of dollars. Reviewer-benchmarkers seem to assume it is the correct software to use when comparing processors, but then they fail to take this additional cost into account when weighing relative value in their review conclusions.


RE: Depends on the application
By SlyNine on 1/18/2009 1:49:28 AM , Rating: 2
The same can be said about higher clock speeds, though.

I think when comparing CPUs it's more a matter of whether or not they use the same software than of using the most likely software, because you will never be able to compare the millions if not billions of possible software/hardware combinations. The only thing you can do is give an apples-to-apples comparison with benchmarking software that offers a common usage model.

Take Futuremark: it may not be perfect, but it does correlate with the real world a lot. You will never see an 8800GT beat a 4870 in real life or in Futuremark, and the 280GTX and the 4870 are close in both, and it shows.

The only way to tell exactly how something is going to work in your unique case is to try it first hand or, I guess, get very lucky and find a reviewer who uses the exact same hardware and hope it's set up the same as yours.


RE: Depends on the application
By SlyNine on 1/17/2009 4:02:33 PM , Rating: 2
Also, frames per second is not always the most important thing. In RTS games, things like simulation speed can be much more important.

Who cares if I get 60FPS if it takes 10 min to play through 1 min of game time?



Copyright 2014 DailyTech LLC.