


AMD engineers reveal details about the company's upcoming 45nm processor roadmap, including plans for 12-core processors

"Shanghai! Shanghai!" the reporters cry during AMD's financial analyst day today. Despite the fact that the company will lay off nearly 5% of its workforce this week, followed by another 5% next month, most employees interviewed by DailyTech continue to convey an optimistic outlook.

The next major milestone for the CPU engineers comes late this year, with the debut of 45nm Shanghai. Shanghai, for all intents and purposes, is nearly identical to the B3 stepping of Socket 1207 Opteron (Barcelona) shipping today.  However, whereas Barcelona had its HyperTransport 3.0 clock generator fused off, Shanghai will once again attempt to get HT3.0 right.

Original roadmaps anticipated that HT3.0 would be used for socket-to-socket communication, but also for communication to the Southbridge controllers. Motherboard manufacturers have confirmed that this is no longer the case, and that HT3.0 will only be used for inter-CPU communication.

"Don't be disappointed, AMD is making up for it," hints one engineer.  Further conversations revealed that inter-CPU communication is going to be a big deal with the 45nm refresh.  The first breadcrumb comes with a new "native six-core" Shanghai derivative, currently codenamed Istanbul.  This processor is clearly targeted at Intel's recently announced six-core, 45nm Dunnington processor.

But sextuple-core processors have been done, or at least we'll see the first ones this year.  The really neat stuff comes a few months later, when AMD will finally ditch the "native-core" rhetoric.  Two separate reports sent to DailyTech from AMD partners indicate that Shanghai and its derivatives will also get the twin-die-per-package treatment.

AMD planned twin-die configurations as far back as the K8 architecture, though it abandoned those efforts.  The company never explained why those processors were nixed, but just weeks later "native quad-core" became a major marketing campaign for AMD in anticipation of Barcelona.

A twin-die Istanbul processor could enable 12 cores in a single package. The cores will communicate with each other via the now-enabled HT3.0 interconnect on the processor.

The rabbit hole gets deeper.  Since each of these processors will contain a dual-channel memory controller, a single core can emulate quad-channel memory functions by accessing the other dual-channel memory controller on the same socket.  This move is likely a preemptive strike against Intel's Nehalem tri-channel memory controller.
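The bandwidth arithmetic behind that claim can be sketched as a back-of-the-envelope model. This is purely illustrative: the per-channel figure and function names below are assumptions for the sketch, not AMD specifications.

```python
# Back-of-the-envelope model of the "emulated quad-channel" idea: a core that
# stripes its accesses over both on-package dual-channel controllers sees the
# aggregate bandwidth of four channels. All figures are illustrative.

CHANNELS_PER_CONTROLLER = 2
GBS_PER_CHANNEL = 6.4  # e.g. one DDR2-800 channel (illustrative assumption)

def effective_bandwidth(controllers: int) -> float:
    """Peak bandwidth when accesses are striped over N dual-channel controllers."""
    return controllers * CHANNELS_PER_CONTROLLER * GBS_PER_CHANNEL

print(effective_bandwidth(1))  # local controller only: 12.8
print(effective_bandwidth(2))  # both on-package controllers via HT3.0: 25.6
```

In practice a real access pattern crossing HT3.0 to the sibling die would also pay extra latency, so doubled peak bandwidth is a best case, not a guarantee.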
 
Motherboard manufacturers claim Shanghai and its many-core derivatives will be backwards compatible with existing Socket 1207 motherboards.  However, processor-to-processor communication will downgrade to lower HyperTransport frequencies on these older motherboards. The newest 1207+ motherboards will officially support the HyperTransport 3.0 frequencies.

Shanghai is currently taped out and running Windows at AMD.


Comments

Not the correct multiple prefix
By Zurtex on 4/17/2008 7:20:33 PM , Rating: 5
There is a bone of contention over what the prefix for a multiple of 12 is, but either way you've got it wrong. Depending on whether you support the Latin or the Greek interpretation, it is either:

hendeca-
dodeca-

The more common one is dodeca, so it would be a dodeca-core

duodeci would mean 2 x 0.1




RE: Not the correct multiple prefix
By KristopherKubicki (blog) on 4/17/2008 7:23:25 PM , Rating: 1
I think you're right -- it's been fixed.


RE: Not the correct multiple prefix
By Zurtex on 4/17/2008 7:28:46 PM , Rating: 3
You can simply check it by looking up what a dodecahedron is :) -- the most common example of the use of the dodeca- prefix.


RE: Not the correct multiple prefix
By eetnoyer on 4/18/2008 10:05:38 AM , Rating: 4
I prefer a nice rhombicosidodecahedron.


RE: Not the correct multiple prefix
By jlips6 on 4/18/2008 5:14:59 PM , Rating: 2
I live in a rhombictricondahedron.

http://decadome.net/


RE: Not the correct multiple prefix
By wordsworm on 4/18/2008 10:16:53 AM , Rating: 2
Duo and quad are both Latin derivatives. Why switch to a Greek derivative? Duodec-core would be more in line with the current fad. It's my opinion that it rolls off the tongue a bit better as well.


By Plasmoid on 4/18/2008 11:40:23 AM , Rating: 2
We don't use Duo and Quad... AMD doesn't make Duo Core processors.

And Quad came from Quake's Quad Damage.


RE: Not the correct multiple prefix
By Zurtex on 4/18/2008 12:50:58 PM , Rating: 2
dodeca- is the Latin derivative prefix for 12

hendeca- is the Greek derivative prefix for 12

duodec- isn't a prefix at all

duodeci-, Wikipedia tells me, is 1/12, although I always thought it was 1/5; I will consult my geometry books later.

So:

dual-core
tri-core
quad-core
etc...
dodeca-core

Is entirely consistent.


RE: Not the correct multiple prefix
By wordsworm on 4/18/2008 1:05:21 PM , Rating: 2
http://www.wordinfo.info/words/index/info/view_uni...

Also, you can find it at http://en.wikipedia.org/wiki/Dodeca

As far as duodecima is concerned, I found this link: http://bartelby.org/81/5503.html

If you follow the Wiki article, it states that the Greeks would have gone the route of di (2), tetra (4)... dodeca, whereas the Latin is listed as duo (2), quad (4), ... duodec. Are we not looking at the same wiki article? I'm not sure how it is we're getting completely different information from the same source.

In any case, going Latin and then changing into Greek doesn't make much sense. Are there any Latin/Greek professors out there who can trump wordinfo.com + wikipedia.org?


RE: Not the correct multiple prefix
By Zurtex on 4/18/2008 1:09:28 PM , Rating: 2
Oh wow, I look stupid now, getting confused with the 11 prefix :=/

At least dodeca- is the most common usage.


By wordsworm on 4/18/2008 1:18:41 PM , Rating: 2
I think someone's got to tell Kris though, to save him the embarrassment of going Greek. I couldn't find any instances of either prefix being used in conjunction with a CPU. My fear is that the whole Internet might follow Kris's lead, and then it will be discovered that he got his Greek and Latin crossed... followed by embarrassing YouTube films mocking our gentle blogger. We blog followers must try to help defend the honor of DT's reputation... and stick to the Latin... before it's too late!


RE: Not the correct multiple prefix
By Zurtex on 4/18/2008 1:14:48 PM , Rating: 2
Yeah, sorry, my main mistake was getting mixed up with 11-sided objects; that's what made me think dodeca- was Latin and not Greek.

If you look at the columns carefully, you'll notice the most common usage of each multiple-prefix switches between Greek and Latin. The most common one for 12 is dodeca-

duodeci- seems to always be used in the context of "divided by 12" so Wiki seems to be right in it meaning 1/12.


By wordsworm on 4/18/2008 1:23:52 PM , Rating: 2
quote:
duodeci- seems to always be used in the context of "divided by 12" so Wiki seems to be right in it meaning 1/12.


You're right that it seems to be 1/12. However, duo would be 1/2, quad 1/4 (in Latin of course). So, going duodec (not duodeci) would be consistent with the naming convention that Intel seems to be establishing. Well, I think it will sound cool again once it hits centicore and millicore. I think when it hits centicore in 5 years or so I'll be ready to upgrade from this quad I've got now.


RE: Not the correct multiple prefix
By jgp on 4/20/2008 4:45:29 PM , Rating: 2
Mixing Latin and Greek is a tech tradition.

"Hexadecimal" is mixed Latin-Greek, and it's used everywhere.


By wordsworm on 4/22/2008 11:22:22 PM , Rating: 2
Good point. They should stick to sexidecimal to keep things pure.


RE: Not the correct multiple prefix
By fsardis on 4/18/2008 5:02:37 PM , Rating: 2
Both hendeca and dodeka are Greek, dude. They're for 11 and 12.


By makedonas on 4/25/2008 5:50:21 PM , Rating: 2
As a Greek, I usually count...

deka = 10
endeka = 11
dodeka = 12
dekatria = 13
dekatessera = 14
and...

There is no C in the Greek language, just an S sound.

I think people are not aware that Latin and Greek are two different languages.


RE: Not the correct multiple prefix
By Garreye on 4/17/2008 10:57:09 PM , Rating: 4
I hope AMD/Intel are planning on using numbers for labeling CPUs and for marketing in the future, because if the core race continues for a few years, it might be confusing for consumers to sort out all the different names. Questions like "So the Core 7 myriaduohecto CPU has how many cores?" may be realistic down the road... Of course this might give people a reason to learn some Greek or Latin prefixes beyond 8, myself included!


RE: Not the correct multiple prefix
By semo on 4/18/2008 7:43:38 AM , Rating: 2
Why do we even need Latin or Greek prefixes? Can't we just say twelve-core?

Is there a real reason for these prefixes? Although I do use words like octet on a regular basis.


RE: Not the correct multiple prefix
By Zurtex on 4/18/2008 12:53:44 PM , Rating: 3
Marketing will take over,

dual-core
tri-core
quad-core

Sound better than 2-core, 3-core, 4-core. But a:

icosa-core processor

Doesn't sound as good as a 20-core processor, unless you're a fan of Sci-Fi, then it sounds awesome XD.


Don't get me wrong
By GhandiInstinct on 4/17/2008 7:10:27 PM , Rating: 5
I'm all FOR multi-core.

But what about multi-thread software?

We're increasing cores but software seems way behind. But we're definitely getting more and more apps like this.

When do you guys see software catching up and us consumers start seeing the magic that multi-core software can provide?

Oh and can we get some 64-bit multi-core apps please!




RE: Don't get me wrong
By TomCorelis on 4/17/2008 7:21:37 PM , Rating: 3
Most of the prosumer-and-above software I've worked with uses multiple cores now.

A lot of games finally seem to be jumping on the bandwagon -- Sins of a Solar Empire is an exception that desperately needs it though.

Though seriously, I don't know what people expect out of this multi-core revolution. Most folks' computers are slow because they're loaded with crap and don't have enough RAM.


RE: Don't get me wrong
By inighthawki on 4/17/2008 7:32:19 PM , Rating: 1
Agreed, most people don't realize that their 100 startup applications, no matter how small, are seriously affecting the speed of their computer. Most of these people have no idea how fast a computer can be, as well. I have almost all unnecessary processes turned off, running Vista, and it's running smooth as butter. I can't say the same for my friend who had an almost identical setup with XP running stuff in the background.

I find that it's all the little stuff like printer managers, mouse/keyboard monitors (i.e. Logitech), and quick launch utilities that really slows the computer down. (Although multi-core processors could potentially speed up some of these, no doubt.)


RE: Don't get me wrong
By oab on 4/17/2008 7:43:01 PM , Rating: 1
It won't really speed much of anything up, because load times are mostly limited by disk access speed, which is why SSDs have much faster boot-up times compared to HDDs. It's not that the processor is any faster; the drive can just get information into RAM faster.


RE: Don't get me wrong
By inighthawki on 4/17/2008 7:49:26 PM , Rating: 2
For load times, yes, but in my second paragraph I was more or less referring to the speed of the computer after it's fully loaded, just running apps in the background. Sorry for the confusion.


RE: Don't get me wrong
By eye smite on 4/18/2008 5:51:56 PM , Rating: 3
I agree on the apps and access times for already-running apps. I upgraded this older Socket 939 machine from a 3500+ to a 4200+ X2, and while the clock speed is the same, having that extra core to load into makes a noticeable difference when multi-tasking.


RE: Don't get me wrong
By JLL55 on 4/17/2008 8:17:07 PM , Rating: 2
I was just wondering, could people list possible processes to turn off, or point me to a list? I think I have quite a few turned off, but I really haven't seen a "good" list.


RE: Don't get me wrong
By ViroMan on 4/18/2008 4:20:32 AM , Rating: 2
This site seems to be rather thorough about what each service does. I have used it to help me decide what to turn off and it's great.

http://www.blackviper.com/


RE: Don't get me wrong
By chiadog on 4/18/2008 7:14:56 AM , Rating: 2
I prefer this site myself:
http://www.theeldergeek.com/


RE: Don't get me wrong
By ViroMan on 4/19/2008 6:27:27 AM , Rating: 2
Nice site indeed, I shall bookmark it. As far as describing services and what to do with them, the site I provided was a better choice. It tells you what they do, what you should do with them depending on what you're looking for, how much RAM they use, and the registry settings used.


RE: Don't get me wrong
By Alpha4 on 4/17/2008 8:37:03 PM , Rating: 3
That's too funny. I often point out that current mainstream desktops with dual-core chips are little more than systems with liberated single-core processors.


RE: Don't get me wrong
By ImSpartacus on 4/17/2008 7:22:19 PM , Rating: 1
No kidding. 64-bit will be the next generation M$ OS in 2010 and by then we will be packing plenty of cores.


K10 cores?
By MikeMurphy on 4/17/2008 7:19:38 PM , Rating: 3
Fast interconnects are great, but this is like putting spinners on a Civic until the K10 core gets replaced or can clock competitively.




RE: K10 cores?
By Zurtex on 4/17/2008 7:35:48 PM , Rating: 3
It's very good for certain server applications and fantastic for supercomputers. If they want to stay ahead of Intel's next-gen cores in the supercomputer market (Core 2s aren't even worth considering because of their low bandwidth), they need more cores and higher bandwidth.


RE: K10 cores?
By kkwst2 on 4/17/2008 10:28:46 PM , Rating: 2
First, see my post above. That's just not true for many applications. The quad Opteron still scales quite favorably for many HPC applications.

Second, that's a horrible analogy. Putting spinners on anything is pointless. They would look just as silly on a Corvette as they would on a Civic.

The FSB limits scaling on the Quad core Xeon architecture right now. They're going to on-die memory controllers with Nehalem to solve this. The K10 Opteron is no civic and HT3.0 is not a stupid, pointless accessory.


RE: K10 cores?
By ninjit on 4/18/2008 3:36:46 AM , Rating: 1
quote:
Fast interconnects are great but this is like putting spinners on a civic until the K10 core gets replaced or can clock competitively.


Your analogy makes no sense at all. (spinners are useless no matter what you put them on)

If you MUST have a car-related jab at this, think of installing a high-end gearbox in the civic, capable of handling 8k rpm and over 500 lb-ft of torque (or "twist" if you prefer), but not replacing the engine (4k rpm and 128 lb-ft of torque).

But getting back to CPUs... my understanding is that the HT3 link is used to interconnect the 2 dies. Each die has 6 cores which will likely be hooked together via shared cache.

So you actually have 6 K10 cores filling the HT3 link, as opposed to the one - which seems like it should make much better use of the bandwidth.


HyperTransport 3.5 would be more like it...
By wingless on 4/17/2008 7:56:39 PM , Rating: 3
They should really name this HT3.5. It's a bit different from the regular implementation. I'd also like to say I'm amazed how flexible HT is. I had no idea AMD could reconfigure it like this to suit their needs. It's a damn good platform to build a system on. I hope Shanghai will live up to its promise of higher IPC.




By murphyslabrat on 4/17/2008 9:31:05 PM , Rating: 2
It isn't that it's that flexible; this is what it was originally intended to do. They just disabled the HT 3.0 interface for everything else.


About the memory controllers
By TeXWiller on 4/17/2008 8:47:10 PM , Rating: 2
quote:
The rabbit hole gets deeper. Since each of these processors will contain a dual-channel memory controller, a single-core can emulate quad-channel memory functions by accessing the other dual-channel memory controller on the same socket. This move is likely a preemptive strike against Intel's Nehalem tri-channel memory controller.
See the AMD BIOS and Kernel Developer’s Guide, revision 3.06, page 13, the overview section. AMD is promising 3 DDR3 interfaces (channels), which would be quite a nice match for the coming six-core Shanghai derivative. Nehalem and Shanghai would have equal memory bandwidth.




By KristopherKubicki (blog) on 4/17/2008 9:29:12 PM , Rating: 2
Yeah, looks like conflicting info for now. I'll try to confirm.


By suryad on 4/18/2008 10:15:41 AM , Rating: 3
Because, I mean, sure you have many cores. Great, that means you can multithread like no tomorrow. But multithreading is nothing but many single threads... which means single-thread performance is still extremely crucial. I don't care how many cores you have; if you want a faster machine you will need higher IPC and higher clock speeds to process each thread faster! Cores are great but clocks are still important. So technically the race for clockspeed is over from a marketing point of view, but not really if you just think about it.
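The commenter's point is essentially Amdahl's law: the serial fraction of a program caps the overall speedup no matter how many cores you add. A quick sketch (the 90%-parallel figure is just an example, not a measurement):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales with cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even a program that is 90% parallel gets nowhere near 12x on 12 cores:
print(round(amdahl_speedup(0.90, 12), 2))  # 5.71
```

The remaining 10% of serial work still only runs as fast as one core can drive it, which is why single-thread performance keeps mattering.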




Thread scheduler
By phil126 on 4/17/2008 10:32:39 PM , Rating: 2
With this many cores available, Linux, Windows, and Apple need to write a half-decent thread scheduler. Currently the best I have seen was by SGI. With all these cores, if the OS can't properly distribute work across them it will all be for naught. Single programs do well when they control where threads are processed, but when it is left up to the OS they all do a horrible job. (I have not used Vista, but I doubt it will do well above two.)




BOINC!
By rebturtle on 4/17/2008 11:16:27 PM , Rating: 2
Wohoo! My SETI scores are gonna go through the roof!!




6 core cpu
By adam92682 on 4/18/2008 1:09:16 AM , Rating: 2
all the Intel advertisements will say "Hard core sex core."




If they drop the L3 cache...
By Amiga500 on 4/18/2008 6:24:26 AM , Rating: 2
Like they plan to do for the Propus quad-core (a variant of Deneb) - would they be able to put 18 cores on one die?




HT2.0 vs. HT3.0 support
By dess on 4/22/2008 9:02:33 AM , Rating: 2
quote:
Motherboard manufacturers claim Shanghai and its many-core derivatives will be backwards compatible with existing Socket 1207 motherboards. However, processor-to-processor communication will downgrade to lower HyperTransport frequencies on these older motherboards.

What will certainly downgrade (to HT2.0) on older motherboards is the communication with the chipset. But the chipset has nothing to do with inter-processor communication -- that in itself does not prevent HT3.0 between processors.

This is most probably the reason for this:
quote:
Original roadmaps anticipated that HT3.0 would be used for socket-to-socket communication, but also for communication to the Southbridge controllers. Motherboard manufacturers have confirmed that this is no longer the case, and that HT3.0 will only be used for inter-CPU communication.

And then, using chipsets that support HT3.0:
quote:
The newest 1207+ motherboards will officially support the HyperTransport 3.0 frequencies.

I would read it this way: 1207 motherboards will have HT2.0 between the chipset and CPUs, while probably HT3.0 between CPUs. And later the 1207+ motherboards will have full HT3.0 support for both.




A novel idea
By Procurion on 5/7/2008 10:09:17 AM , Rating: 2
What about calling it the "twelve-core processor"? I am getting head-hurt.




Too little too late AMD
By Reclaimer77 on 4/17/2008 7:19:46 PM , Rating: 1
Maybe for servers. But you're not ready for prime time in my desktop, AMD. Let's see some benchmarks first.




By ImSpartacus on 4/17/2008 8:29:25 PM , Rating: 1
I like how AMD is changing its tune and finally planning on making some duct-tape CPUs. But of course, they aren't real CPUs, right? Ha!

Regardless, a 12-core CPU will be sweet. Too bad software won't use it to its full potential.




By chiadog on 4/18/2008 7:43:29 AM , Rating: 1
Double whopper versus double cheese burger.
So, who is going to make the bacon ultimate cheese burger? :o~~

As long as the performance is there, does it matter how many dies are on the package? Performance > Marketing.


And the last sentence came from..... ?
By kilkennycat on 4/17/08, Rating: -1
RE: And the last sentence came from..... ?
By neo64 on 4/18/2008 2:52:11 AM , Rating: 2
By johnsonx on 4/19/2008 9:52:05 PM , Rating: 2
But that is just reposted from Fudzilla. Reputable goes right out the window then.


Is it just me, or...
By Motoman on 4/18/08, Rating: -1
RE: Is it just me, or...
By prenox on 4/18/2008 3:54:13 PM , Rating: 2
Having to re-encode just about every video for my PSP, I do use all 4 cores. I guess if I had an iPod or a Zune I would have to re-encode the videos for them too.

I think there are a lot of 'normal' people out there that do transfer videos to their players.


RE: Is it just me, or...
By Motoman on 4/18/2008 6:36:52 PM , Rating: 1
I disagree. I know large numbers of people with PSPs...and none of them use them for watching videos at all - except for a few movies they were gullible enough to buy on UMD.

Re-encoding video formats is most definitely not a typical user activity.


RE: Is it just me, or...
By just4U on 4/21/2008 3:09:50 PM , Rating: 2
What about supercomputers that have tons of cores? Basically that's all they're doing: turning our desktops into miniature versions of those. I am sure that they make use of such machines. Hell, I'd love to see Solitaire on one!

Ok, kidding about that last part... but even so!


RE: Is it just me, or...
By jRaskell on 4/24/2008 2:28:13 PM , Rating: 2
This recent trend of increasing the total number of cores is only going to continue, and it will soon require the software development world to step back and completely rethink their entire software design philosophies and develop frameworks that scale very well with parallel processing architectures.

Forget about "multi-tasking". It's entirely possible to take advantage of multiple cores for a single task, along the lines of: 1 man can dig a ditch in 8 hours, 2 can dig it in 4 hours, and 8 can dig it in 1 hour. One task, but multiple resources working to complete it quicker. The problem is just that current operating system and many application architectures aren't designed to automatically take advantage of multiple cores to complete a single task.

When this is done properly, processing-intensive applications such as games will automatically scale rather nicely as the number of available cores increases. It's a chicken-and-egg scenario though. Software developers aren't going to allocate large amounts of resources to completely redesigning their application frameworks until there's a large number of computers in their consumers' hands that will take advantage of it.

The graphics market made this shift several years ago, and the general-purpose CPU market is still in its transition phase (and admittedly it will be a more complicated and time-consuming shift to make).

quote:
No normal person multi-tasks that much, and no applications that normal people use can take advantage of 4 cores.


The first part of your statement there is completely correct. The second half is only partially correct today, but at some point in the future it will be completely false.
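The ditch-digging analogy above - one task split among several workers - can be sketched with Python's standard process pool. The sum-of-a-range job below is just a stand-in for real work; function names are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor

def dig(section):
    """One worker's share of the single task: here, summing a slice of numbers."""
    return sum(section)

def dig_ditch(numbers, workers=4):
    """Split one job into roughly equal sections and run them on separate cores."""
    size = max(1, len(numbers) // workers)
    sections = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Combine the partial results back into the single answer.
        return sum(pool.map(dig, sections))

if __name__ == "__main__":
    print(dig_ditch(list(range(1, 101))))  # 5050
```

As jRaskell notes, this only pays off when the sections are genuinely independent; the splitting and recombining overhead is the part frameworks have to make automatic.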


By KristopherKubicki (blog) on 4/17/2008 7:33:21 PM , Rating: 5
Meet every webserver app you've ever heard of :)


By onwisconsin on 4/17/2008 9:59:44 PM , Rating: 5
Sorry if I misunderstood your post, but 32bit (x86) OSes will work fine on 64bit CPUs. Also, 32bit programs will work on a 64bit OS, but 32bit DRIVERS won't.


By Jellodyne on 4/18/2008 9:45:54 AM , Rating: 2
You're technically right, but the performance hit is very slight. There are also times where the address space from a 64-bit Windows OS will cause a 32-bit program to run faster.

See, in most cases under a 32-bit OS, your program is limited to 2GB of address space. There's a flag you can set, if the program is written for it, to extend that to 3GB. However, under a 64-bit Windows OS, each 32-bit app can have 4GB of unfettered address space. If you've got more than 2GB of RAM and a memory-hungry 32-bit app, the odds are good you'll run faster on a 64-bit OS despite the very minor hit from the 32-to-64 translation layer.


By darkpaw on 4/18/2008 9:55:42 AM , Rating: 2
On a fully native 64-bit processor like Itanium, you have to emulate the whole 32-bit architecture, which comes with a huge performance hit.

On x64 processors, the entire 32-bit architecture is still present. Nothing is actually emulated. A software interface to the 32-bit APIs is provided (Windows on Windows), which is more like virtualization than emulation. This does not cause a significant performance hit.


By freeagle on 4/18/2008 6:59:50 AM , Rating: 2
Where do you get this information, people?

Programming does not get harder when trying to utilize more than 2 cores. The problem is that people are trying to do things in parallel that should not be done that way. When you need a lot of data sharing between threads, there is a good chance that you are going the wrong way. Examples from game engines could be rendering the scene, or calculating physical interactions when you sort your objects into disjoint groups, etc.

Osalcido, the whole point of having extra cores is to compute appropriate tasks faster, like the rendering mentioned above, or archiving, decoding... Another point is that the whole system gets more responsive, especially when you are used to running multiple applications at once.

Freeagle


By Aikouka on 4/18/2008 9:10:35 AM , Rating: 2
I'm not sure what you mean, because programming can inherently become more complex as you raise the number of threads. Now, the reason why I boldfaced "can" is because not all tasks are hard to break down. A simple computation can typically be broken down into parts and combined later on, but that's just a simple computation. But there are tasks that are harder to break down. I remember taking a parallel processing course back in college and one of the focuses early on in the class was learning how to break down a task into parts that could be split amongst the cluster of machines. Some were pretty simple yet even tasks that looked easy to break down sometimes proved to be a bit tricky.


By freeagle on 4/18/2008 11:45:52 AM , Rating: 2
What I mean is that people tend to "force" parallel programming where it's not appropriate; that's why it seems to be getting harder. If you know how to break a task into parts that can run in parallel, then you know what the maximum number of threads is that you can utilize without getting into synchronization nightmares.


By Aikouka on 4/18/2008 12:25:05 PM , Rating: 2
Ahh, sorry then. I must've misunderstood your post. I agree with what you said there about how people tend to push multi-threading or think multi-threading is apt for any application.


By Sulphademus on 4/18/2008 9:16:11 AM , Rating: 2
Multi-apps will be the biggest one ATM.
Some things just aren't well designed for SMP. But given that my Vista machines are running 50 to 80 processes, spreading the load of single-threaded processes helps a lot. However, these types of things will only get faster via more MHz or a more efficient core architecture.

The advantage for programs that do work well in parallel should be huge this next round.


By freeagle on 4/18/2008 11:49:37 AM , Rating: 2
Some applications can run faster only with increased performance of a single core. That's because the way they execute is extremely hard or simply impossible to do in parallel.


By boogle on 4/18/2008 9:38:27 AM , Rating: 2
I for one hope you're not a programmer working on multithreaded applications. If you're going to randomly spawn off a load of threads because a class looks like it can work on its own, you're going to end up with a scheduling & syncing nightmare. And as anyone who's run into lots of syncing knows - performance goes to below that of a single-threaded app.


By freeagle on 4/18/2008 12:01:14 PM , Rating: 2
quote:
If you're going to randomly spawn off a load of threads because the class looks like it can work on its own, you're going to end up with a scheduling & syncing nightmare


I have absolutely no idea how you deduced this from my post.

quote:
scheduling .... nightmare

The number of threads in your application has nearly zero effect on the performance of the system scheduler. What can really slow your application below the execution of a single-threaded app is when you dynamically create and destroy threads. An example could be matrix multiplication. But I'm sure you, as someone who knows a lot about parallel programming, have heard of something called a thread pool, or the futures concept.
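The thread-pool idea freeagle mentions can be sketched with Python's standard concurrent.futures, using his matrix-multiplication example: a fixed pool of reusable threads avoids the create/destroy-per-task cost he warns about. A minimal sketch; the helper names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_row(row, b):
    """One independent task: compute a single row of the product A*B."""
    return [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]

def matmul(a, b, workers=4):
    """Submit one future per row to a fixed pool; threads are reused, not respawned."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(multiply_row, row, b) for row in a]
        return [f.result() for f in futures]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each row is a disjoint piece of work, so no synchronization beyond collecting the futures is needed.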


By inighthawki on 4/17/2008 7:42:42 PM , Rating: 3
I believe the majority of the hardship in making a program multi-core compatible is simply utilizing multiple threads in the application, allowing it to work on more than one thing at a time.


By osalcido on 4/18/2008 5:41:45 AM , Rating: 1
You really think there are software programmers out there making software tailored to specific core counts?

I haven't heard of this.


By kkwst2 on 4/17/2008 10:09:42 PM , Rating: 3
Well, it depends on the application. Fluent (computational fluid modeling software) scales pretty well to well over 100 cores. Most modern computer modeling software packages do.

It costs me around $700 per core to assemble a high-end cluster right now using dual proc nodes. I had to use Xeons because the Quad Opterons just weren't available.

Understand that the Opterons still scale better than Xeons on Fluent and many other HPC applications, I guess because of their better FPU performance. I would have used Opterons if I could have.

With 6 cores per processor and thus 12 cores per node, I could probably cut the cost to around $500 per core. The question of course is whether the bus is going to be fast enough to utilize the increased cores effectively. Even at 8 cores per node and the newer 1600 MHz FSB, the scaling appears to be somewhat limited by the FSB.

IF HT3.0 helps solve this and allows better scaling with increased core density in my cluster, this could be huge for my application.


By tinyfusion on 4/18/2008 12:13:54 AM , Rating: 2
By the time AMD starts shipping its 6-core processors, Intel will have sold millions of Nehalem processors which are free of the limitations imposed by the aging FSB, including those 8-core CPUs.


By spluurfg on 4/18/2008 2:41:33 AM , Rating: 3
To my knowledge, the Nehalem uses an on-die memory controller, which was first implemented on the Opteron... Also, both will need to have a bus to the main memory, which can still serve as a bottleneck...


By spluurfg on 4/18/2008 7:25:08 AM , Rating: 2
I am guessing that you are suggesting that some other processor implemented an on-die memory controller first, though I can't be sure just from your comment -- perhaps you could enlighten me?

At any rate, my point was that, to my knowledge, Nehalem's use of an on-die memory controller will not transcend bandwidth limitations between the main system memory and the processor, and that the Opteron's implementation of an on-die memory controller was the first in this market (x86 multi-socket server processors).

Though the caveat from my original reply was 'to my knowledge', so I'm welcome to any correction here.


By josmala on 4/18/2008 8:06:44 AM , Rating: 2
On-die memory controllers?
EV7, 80128, Timna.


By josmala on 4/18/2008 8:08:01 AM , Rating: 2
Typo, I meant 80186, the processor Intel made between the original 8086 and the 80286.


By Amiga500 on 4/18/2008 7:32:28 AM , Rating: 2
quote:
Even at 8 cores per node and the newer 1600 MHz FSB, the scaling appears to be somewhat limited by the FSB.

Are you talking about Intel Xeons there?

On the Xeons, the speedup of going from 4 to 8 cores is minimal in CFX (it should be the same in Fluent). I guess you already know about allocating your processes to reduce cache flushing and get the best out of the architecture.

For 2 thread jobs, allocate to separate sockets to take advantage of both shared cache and memory bandwidth - for instance use CPUs 0 and 4.

For 4 thread jobs, to take advantage of shared cache allocate to CPUs 0, 2, 4 and 6 (or variant).

You'll see a significant speedup doing that - over 30% in some cases.
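Amiga500's placement advice can be applied programmatically. A Linux-only sketch using `os.sched_setaffinity`: the CPU IDs 0 and 4 in his example are specific to his dual-socket topology, so this sketch pins to whatever CPU the current machine actually has, purely to show the mechanism.

```python
import os

def pin_to_cores(cores):
    """Restrict the calling process to the given logical CPU IDs (Linux-only)."""
    os.sched_setaffinity(0, set(cores))  # 0 = the current process
    return os.sched_getaffinity(0)       # read back the effective mask

# On a dual-socket box like the one described above you might pass {0, 4} to
# place a 2-thread job on separate sockets; here we just use the first CPU
# available so the sketch runs anywhere.
first_cpu = min(os.sched_getaffinity(0))
print(pin_to_cores({first_cpu}))
```

Fluent, CFX, and most MPI launchers expose the same idea through their own process-placement options, which is usually the more practical route for HPC jobs.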


By HighWing on 4/17/2008 11:20:22 PM , Rating: 2
I think at some point it's how the OS supports it that matters more. It's my understanding that even if a program is not written to take advantage of a multi-core system, the OS should still be able to either split up the processing as needed or manage background apps with the other cores.


By mindless1 on 4/18/2008 7:02:52 AM , Rating: 2
No, the OS can't split up the load any more than it was already multi-threaded. Yes, background apps could and would run on other cores, but it's often fairly irrelevant, as background apps often take up insignificant CPU time.


By Locutus465 on 4/18/2008 12:12:23 AM , Rating: 2
It all depends on the application, but I've written code myself (in my college days) which could scale well past 12 cores. Of course, not every workload will benefit from this; my chosen workload just happened to be very conducive to scaling well. I think game developers will have the hardest time utilizing this many cores.


By DarkElfa on 4/18/2008 10:39:33 AM , Rating: 2
I agree. I barely have anything other than rendering that uses the 4 cores I have now; why do I need 12?


By Locutus465 on 4/21/2008 9:54:14 AM , Rating: 2
Oh, I wasn't saying "forget more cores"! Quite the contrary, I think they should bring them on!! In fact I think I might start dabbling in threaded programming again (even though I concentrate on web now) just for fun. I'm just saying the benefit will vary by app, and some genres of applications (i.e. games) may take a little while longer to feel the full effect.














