Sandia simulations reveal memory is the bottleneck for some multi-core processors

Years ago, the hallmark of processor performance was clock speed. As chipmakers hit the wall on how far they could push clock speeds, processor designs moved to multiple cores to increase performance. However, as many users can tell you, performance doesn't always increase as you add more cores to a system.

Benchmarkers know that, for some uses, a quad-core processor often offers less performance than a similarly clocked dual-core processor. The reason, according to Sandia, is memory availability. Supercomputers have tried to increase performance by moving to multi-core processors, just as consumer processors have.

The Sandia team has found that simply increasing the number of cores in a processor doesn't always improve performance, and past a certain point performance actually decreases. Sandia simulations have shown that moving from two cores to four offers a significant increase in performance. However, moving from four cores to eight offers an insignificant gain, and moving from eight cores to 16 causes performance to drop.

Sandia team members ran their simulations with algorithms for deriving knowledge from large data sets. The team found that at 16 cores the system performed barely as well as it did with two cores.

The problem, according to the team, is a lack of memory bandwidth combined with contention among the cores for each processor's memory bus. The team uses a supermarket analogy to explain the problem. If two clerks check out your purchases, the process goes faster; add four clerks and things are quicker still.

However, with eight or 16 clerks it becomes a problem not only to get your items to each clerk, but also to keep the clerks out of each other's way, and checkout ends up slower than it would be with fewer clerks. Team member Arun Rodrigues said in a statement, "To some extent, it is pointing out the obvious -- many of our applications have been memory-bandwidth-limited even on a single core. However, it is not an issue to which industry has a known solution, and the problem is often ignored."

James Peery, director of Sandia's Computations, Computers, Information, and Mathematics Center, said, "The difficulty is contention among modules. The cores are all asking for memory through the same pipe. It's like having one, two, four, or eight people all talking to you at the same time, saying, 'I want this information.' Then they have to wait until the answer to their request comes back. This causes delays."
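The shape of the curve the team describes (a clear gain up to four cores, a marginal gain at eight, a drop at 16) can be reproduced with a very small toy model. The sketch below is not Sandia's simulation and does not use their data. The function name `relative_throughput`, the bus capacity of 6.0, the contention coefficient of 0.0085, and the assumption that arbitration overhead grows with the square of the number of competing cores are all hypothetical, chosen only to illustrate how a shared memory bus can turn extra cores into a net loss.

```python
def relative_throughput(cores, bus_capacity=6.0, contention=0.0085):
    """Toy estimate of useful work per unit time for `cores` cores on one memory bus.

    The unit is "what a single unconstrained core achieves".
    Every parameter here is an illustrative assumption, not a measured value.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Assumption: contention overhead grows with the square of the number of
    # other cores competing for the bus, shrinking the usable bandwidth.
    effective_bus = bus_capacity / (1.0 + contention * (cores - 1) ** 2)
    # Each core demands one unit of bandwidth; the bus caps what is delivered.
    return min(float(cores), effective_bus)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores -> {relative_throughput(n):.2f}x")
```

With these made-up parameters, the cores are compute-bound up to four cores and bus-bound beyond that, so the 16-core figure falls back to roughly the dual-core level. Different parameters would move the peak, which is the point: the peak depends on memory bandwidth, not on core count alone.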

The researchers say that today's memory systems offer dramatically better performance than those available a year ago, but the underlying fundamental memory problem remains.

Sandia and Oak Ridge National Laboratory (ORNL) are working together on a project intended to pave the way for exaflop supercomputing. ORNL currently has the fastest supercomputer in the world, called the Jaguar, which was the first supercomputer to break the sustained petaflop barrier.

Comments

RE: This is not all that surprising...
By Motoman on 1/18/2009 11:06:43 AM , Rating: 1
...your case is wildly different from the normal desktop usage of the average consumer, which is the point I was trying to make. And I was thinking about applications like yours when I mentioned an ERP and an RDBMS. What you're doing is highly specialized, and is well-suited for multi-core/CPU usage.

For the normal, average user, my statement applies. And frankly, so does the guy's post currently at the top of the list, who has now been rated down to zero.

It's like people are already zombified to the idea that more cores is always going to be better.

Pick your favorite game, all you gamers out there. Benchmark it on a single core, dual-core, quad-core, and then 8-core CPU (all of the same family to keep things even, at the same speed, etc.). It's virtually certain that the benchies will fall on their face at the quad-core...and even as newer games come out, there will not be things that can be spread over 8 cores, or 16, or whatever.

...unless, as I've said, some kind of currently inconceivable technomagic can be invented to allow serial processing to occur over multiple (parallel) cores. Which, as far as I know, is impossible.

But look at the benchmarks. *Look at them.* And please don't start with the wonky synthetics that have a PC do 40 things at a time - that proves nothing. Average consumers and gamers don't encode video while ripping MP3 tracks while folding proteins while compiling C# code while typing a letter to Grandma. Bench your favorite individual games and applications.

RE: This is not all that surprising...
By mathew7 on 1/18/2009 3:00:49 PM , Rating: 2
While you are right with the games, there is one point that I have not read until now: the SW will have to adapt.
The games from the last 2 years are all adapted to 2 cores (at least those which need CPU performance). I could even say that they ignored quad-cores, because not many gamers had quad-cores (the games were developed while quads were very expensive). Switching from 1 to 2 cores was easy for games. But splitting the workload again now will not bring as much benefit, so doing this on currently released games would have been a waste of time/resources. The games that are half-way through development now can probably benefit from 4 cores. But that has to be decided at an early stage.
One of the problems is that current programmers are not used to thinking in parallel algorithms. Also, parallelism cannot be applied to everything.

Current desktop applications do not require much performance. I mean you could have a big Excel file with lots of data, which I'm sure MS has designed to benefit from as many cores as you have. But the point is that the file would have to be very big and very complex for you to be held back by current processors (I mean the CPU workload would be timed in minutes, not seconds). At that size, you would be better off with a DB application.
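The point raised in this thread, that parallelism cannot be applied to everything, is commonly quantified with Amdahl's law: if only a fraction p of a program's work can run in parallel, the speedup on n cores is 1 / ((1 - p) + p / n). A minimal sketch follows; the function name `amdahl_speedup` and the 50% parallel fraction in the example are assumptions for illustration, not measurements of any real game or application.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Upper bound on speedup from Amdahl's law.

    `parallel_fraction` is the share of the work (0.0 to 1.0) that can be
    spread across cores; the remainder runs serially on one core.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload in which half of the work is inherently serial.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores -> {amdahl_speedup(n, 0.5):.2f}x")
```

For a workload that is half serial, the speedup can never reach 2x no matter how many cores are added. This bound is separate from the memory-bandwidth limit described in the article, and it ignores contention entirely.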

By William Gaatjes on 1/19/2009 1:56:22 PM , Rating: 2
True. Since Windows NT version 6 (yes, Vista), Microsoft has updated the scheduler, interrupt, and thread handling mechanisms to make use of hardware features that processors have had since at least the K7 or the P4. Windows XP (NT5) uses ancient scheduler, interrupt, and thread handling mechanisms based on software loops together with interrupt timers, while Vista does these things in hardware.

See this link :

The multimedia class service is useless though, in my opinion.
If Microsoft would just use a large enough memory buffer for audio data, have the audio chip DMA the data from memory, and let the CPU update that data before the audio chip reaches the end of the memory region it was assigned to DMA, then you would never notice a glitch.

ReadyBoost is useless as well.

Superfetch seems handy but we need more bandwidth from HDD to main memory before superfetch is really interesting.
