


Intel says parallel software is more important for many-core CPUs like "Larrabee"

Multi-core processors have been in the consumer market for several years now. However, despite having access to CPUs with two, three, four, and more cores, there are still relatively few applications that can take advantage of multiple cores. Intel hopes to change that and is urging software developers to think parallel.

James Reinders, Intel's director and chief evangelist for software development products, talked about thinking parallel in a keynote speech he delivered at the recent SD West conference. "One of the phrases I've used in some talks is, it's time for us as software developers to really figure out how to think parallel," Reinders said, adding that developers who don't think parallel will see their career options limited.

Reinders gave the attendees eight rules for thinking parallel, drawn from a paper he published in 2007, ComputerWorld reports. The eight rules are: think parallel; program using abstraction; program tasks, not threads; design with the option of turning off concurrency; avoid locks when possible; use tools and libraries designed to help with concurrency; use scalable memory; and design to scale through increased workloads.

He says that after half a decade of shipping multi-core CPUs, Intel is still struggling with how to use the available cores. The chipmaker is under increasing pressure from NVIDIA, which is leveraging a network of developers to write parallel applications that run on its family of GPUs. NVIDIA and Intel are embroiled in a battle over whether the GPU or the CPU will be the heart of future computer systems.

Programming for processors with 16 or 32 cores takes a different approach, according to Reinders. He said, "It's very important to make sure, if at all possible, that your program can run in a single thread with concurrency off. You shouldn't design your program so it has to have parallelism. It makes it much more difficult to debug."

In the speech, Reinders also discussed Intel Parallel Studio, a tool kit for developing parallel applications in C/C++ that is currently in beta. Reinders added, "The idea here [with] this project was to add parallelism support to [Microsoft's] Visual Studio in a big way."

Intel says it plans to offer the parallel development kit to Linux programmers late this year or early next year. The many-core CPU Reinders refers to is the Larrabee processor, which Intel provided some details on in August of 2008.

One of Larrabee's key features is that it will be the heart of a line of discrete graphics cards, a market Intel has not previously participated in. Larrabee is said to contain ten or more cores inside the discrete package. If Larrabee arrives in the form Intel described last year, it will compete directly against NVIDIA and ATI in the discrete graphics market.

NVIDIA is also rumored to be eyeing an entry into the x86 market. Larrabee will be programmable in C/C++, just as NVIDIA's GPUs are via the firm's CUDA architecture.






RE: What am I missing here?
By sinful on 3/11/2009 8:02:52 PM , Rating: 2
quote:
Games: no way. You can use as many cores as you want for games. Most games being put out these days could be easily written to support thousands of cores for better performance. Games are extremely parallel in nature, and not just in graphics, but in mechanics as well.


Not quite. Most games are *extremely* serial.

Think about it: most games are 100% dependent on what key you press at that moment.

In other words, there are practically an infinite number of possible moves you could do, but you have no idea which one until the user has decided.

It's not like you're blindly chugging away at 10GB of video and only worried about throughput; you're concerned with 10K worth of data that you have to act on as soon as possible. Whether you have 8 cores or 4 cores waiting for that data is pretty irrelevant - they can't do much (intelligently) until they know what you're going to do.

Now, you can *attempt* to decouple your input from your other tasks, but that's not an easy task. How exactly do you make the AI respond intelligently to moves you haven't performed yet? Just blindly compute every possible move you might make? That can be an incredible waste of CPU power (especially if you have users not on a multi-core system).


RE: What am I missing here?
By Scrogneugneu on 3/11/2009 10:06:38 PM , Rating: 4
Very, very wrong.

Suppose a game has a single-threaded AI engine. Whenever the time to render a screen comes up, that thread needs to figure out what each and every character in the scene will do in reaction to its environment. So if you've got 20 characters and need an average of 2ms to determine the course of action per character, you use about 40ms to compute that. That's serial processing.

Now suppose it has a multi-threaded AI engine (let's say 4 threads). Whenever the time to render a screen comes up, a bunch of threads can compute the reaction of every character simultaneously, rather than one-by-one. So if you've got 20 characters and need an average of 2ms to determine the course of action per character, you use about 10ms to compute that (20 characters x 2ms each / 4 threads). That's parallel processing.

In theory, the parallel engine could handle 80 characters while maintaining the same performance level as the serial engine.

Of course, the multi-threaded solution has some overhead, and the numbers are simplified. And I'm not even talking about the mess you can get into if a bug sneaks its way into your data synchronization mechanism. But you should get the idea of why a multi-threaded game engine has more potential than a single-threaded one.


RE: What am I missing here?
By sinful on 3/14/2009 3:58:18 AM , Rating: 2
quote:
Now suppose it has a multi-threaded AI engine (let's say 4 threads). Whenever the time to render a screen comes up, a bunch of threads can compute the reaction of every character simultaneously, rather than one-by-one. So if you've got 20 characters and need an average of 2ms to determine the course of action per character, you use about 10ms to compute that (20 characters x 2ms each / 4 threads). That's parallel processing.

In theory, the parallel engine could handle 80 characters while maintaining the same performance level as the serial engine.

Of course, the multi-threaded solution has some overhead, and the numbers are simplified. And I'm not even talking about the mess you can get into if a bug sneaks its way into your data synchronization mechanism. But you should get the idea of why a multi-threaded game engine has more potential than a single-threaded one.


But you're comparing practical *real-world* vs. *theoretical*, and no surprise, the theoretical sounds better.

In *theory*, space travel is simple. In reality, it's really complicated to actually do.
Lots and lots of cores offers amazing speedups in *theory*, but in reality it's extremely complicated to actually do.

For example, a counterpoint to your example:
Let's say the single thread can utilize the same data for each character. Instead of each additional character taking another 2ms, you might have the 1st one cost 2ms and the subsequent ones cost 0.5ms. Computing it serially, your 20 characters cost about 12ms (2ms + 19 x 0.5ms = 11.5ms).

In contrast, the multiple threads might not be completely independent - what happens when all 20 characters are in the same area? What other factors in the environment affect them? You also assume the total cost of the AI is 100% CPU bound, and not dependent on something else - like memory speed, etc. You also assume that all 20 AI threads would execute at the same time - requiring 20 cores. In reality, you might only have 8 cores, so not all characters can be computed simultaneously. In fact, if you've got 8 cores, you're stuck waiting for at least one core to compute 3 character AIs sequentially (i.e. 20 characters / 8 cores = 2.5 characters per core = at least one core computing 3 characters). Thus, 3 characters x 2ms = 6ms - without your extra thread overhead.

After all is said and done, we'll say your multi-core approach now takes 8ms. Yes, 8ms is better than 12ms.... but how much is that going to help in the real world? And how much more complicated have you made things?

Granted, multiple cores offer some improvement - but it quickly reaches limits in how far it scales and how much it can improve things.
As such, it only really shines when you have huge amounts of data to process - and the data is independent.

In your example, the benefit of saving 4ms probably wouldn't be worth it (even though it sounds great on paper!).

Now, if you're talking about 20,000 characters, then yes, it would help.

There's a HUGE difference between theory and reality.


RE: What am I missing here?
By Scrogneugneu on 3/15/2009 11:52:52 AM , Rating: 2
Do you know anything about programming?

Multiple threads can read from the same data just as a single thread can. Concurrency problems only happen when two threads want to write to the same memory location; reads are unlimited. The state of everything in the game can be read, and no change will happen to it until the next frame renders, so each and every thread can read the same data at the same time.

This isn't mentioning that I talked about a four-thread engine, and you picked it up and ran with a 20-thread engine. If you want to compute 20 characters' actions, splitting them across 4 threads requires each thread to compute 5 characters sequentially. One thread per character is very, very wasteful.

Plus, the advantage I pointed out was that you could manage more AI resources in the same time. You can go from there and add a lot of complexity to the handling of the AI, thus ending up with a much more intelligent character. Suppose we do, and the computation time goes up to 5ms per character. Taking your own numbers (supposing the data reuse you speak of saves us 1.5ms per character), we end up with 5 + (19 x 3.5) = 71.5ms sequentially. Using your eight-core example, that would be 3 characters x 5ms = 15ms.

Threading isn't meant to gain tremendous speed on everything. It's meant to handle large workloads better. Nobody will implement threading on simple tasks, but the capacity to lower the additional cost per character on AI computation is huge. The same logic goes for physics.


RE: What am I missing here?
By surt on 3/12/2009 11:09:59 AM , Rating: 4
Sorry, that's not at all how well-designed games actually work. They maintain a (relatively) simple game state and update it in response to user / AI input. The game state can typically be modified in parallel, so they don't even have to force the AI to act serially; they can run AI in threads. The AI 'responds' to the change in game state as soon as you have moved and that data is committed to the shared state. Decoupled input is de rigueur in the gaming industry.

Then, also on a completely different set of threads, you have rendering, which is the process of converting the current game state to the video display. Rendering currently takes 90% of the CPU and pretty much 100% of the GPU in most titles, and rendering can easily be scaled to thousands of cores.

I worked on Diablo II. We had 9 threads there a decade ago, and could have used far more if our target audience had more cores. Things have only grown to favor multicore more in the transition to heavier and heavier 3D.


RE: What am I missing here?
By sinful on 3/14/2009 4:24:38 AM , Rating: 2
quote:
I worked on Diablo II.


Sure you did, and I taught John Carmack everything he knows about multithreading & games. LOL

quote:
We had 9 threads there a decade ago, and could have used far more if our target audience had more cores.


That said, there's a world of difference between multiple threads, and actually taking advantage of multiple cores.

Tell me, oh wise Diablo II inventor, how well does Diablo II benefit from multiple cores? If I run it on an 8-core box, am I going to see it using all those cores?
Or is it just going to peg one core at 100% while all the others idle?? Hrmmmmmm.....?


RE: What am I missing here?
By surt on 3/15/2009 2:37:55 PM , Rating: 2
Depends on your cores. Chances are you'll peg 3 cores. If you are hosting a game (and playing on the same box) you might be able to peg 8 cores. We designed the game's host-side code to scale to many, many CPUs because that's how the servers are set up (we host hundreds of games on a 64-core box).



Related Articles
Intel Talks Details on Larrabee
August 4, 2008, 12:46 PM












