


AMD engineers reveal details about the company's upcoming 45nm processor roadmap, including plans for 12-core processors

"Shanghai! Shanghai!" the reporters cry during AMD's financial analyst day today. Despite the fact that the company will lay off nearly 5% of its work force this week, followed by another 5% next month, most employees interviewed by DailyTech continue to convey an optimistic outlook.

The next major milestone for the CPU engineers comes late this year, with the debut of 45nm Shanghai. Shanghai, for all intents and purposes, is nearly identical to the B3 stepping of Socket 1207 Opteron (Barcelona) shipping today.  However, whereas Barcelona had its HyperTransport 3.0 clock generator fused off, Shanghai will once again attempt to get HT3.0 right.

Original roadmaps anticipated that HT3.0 would be used not only for socket-to-socket communication, but also for communication with the Southbridge controllers. Motherboard manufacturers have confirmed that this is no longer the case: HT3.0 will be used only for inter-CPU communication.

"Don't be disappointed, AMD is making up for it," hints one engineer.  Further conversations revealed that inter-CPU communication is going to be a big deal with the 45nm refresh.  The first breadcrumb comes with a new "native six-core" Shanghai derivative, currently codenamed Istanbul.  This processor is clearly targeted at Intel's recently announced six-core, 45nm Dunnington processor.

But six-core processors are not new territory; at the least, the first ones will ship this year.  The really interesting development comes a few months later, when AMD will finally ditch the "native-core" rhetoric.  Two separate reports sent to DailyTech from AMD partners indicate that Shanghai and its derivatives will also get the twin-die per package treatment.

AMD planned twin-die configurations as far back as the K8 architecture, though abandoned those efforts.  The company never explained why those processors were nixed, but just weeks later "native quad-core" became a major marketing campaign for AMD in anticipation of Barcelona.

A twin-die Istanbul processor could put 12 cores in a single package. The two dies will communicate with each other via the now-enabled HT3.0 interconnect inside the package.

The rabbit hole gets deeper.  Since each die will contain a dual-channel memory controller, a single die can emulate quad-channel memory by also accessing the dual-channel memory controller on the other die in the same package.  This move is likely a preemptive strike against Intel's Nehalem tri-channel memory controller.
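To put rough numbers on that idea, here is a back-of-the-envelope sketch. The DDR2-800 figure is an assumption for illustration (it is a common pairing for Socket 1207 Opterons), not a number from the article:

```python
# Illustrative arithmetic: aggregate bandwidth when one die also uses the
# other die's dual-channel controller in the same package.
# ASSUMPTION: DDR2-800 memory (6.4 GB/s per 64-bit channel).

PER_CHANNEL_GBS = 6.4          # DDR2-800: 800 MT/s * 8 bytes per transfer
CHANNELS_PER_CONTROLLER = 2    # each die carries a dual-channel controller

local = PER_CHANNEL_GBS * CHANNELS_PER_CONTROLLER   # one die's own controller
combined = local * 2                                # both controllers in package

print(f"local dual-channel:    {local:.1f} GB/s")
print(f"emulated quad-channel: {combined:.1f} GB/s")
```

Note that accesses to the remote controller cross the HT3.0 link, so latency rises and real-world throughput would fall short of this ceiling.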
 
Motherboard manufacturers claim Shanghai and its many-core derivatives will be backwards compatible with existing Socket 1207 motherboards.  However, processor-to-processor communication will downgrade to lower HyperTransport frequencies on these older motherboards. The newest 1207+ motherboards will officially support the HyperTransport 3.0 frequencies.

Shanghai is currently taped out and running Windows at AMD.


Comments



By MustangMike on 4/17/2008 7:40:26 PM , Rating: -1
I've been waiting to buy a quad-core 64-bit CPU because of the lack of 64-bit support in most programs, not to mention games.
Even if a program can detect quad-core CPUs, that doesn't mean it can use 100% of all 4 cores.


By onwisconsin on 4/17/2008 9:59:44 PM , Rating: 5
Sorry if I misunderstood your post, but 32-bit (x86) OSes will work fine on 64-bit CPUs. Also, 32-bit programs will work on a 64-bit OS, but 32-bit DRIVERS won't.


By Jellodyne on 4/18/2008 9:45:54 AM , Rating: 2
You're technically right, but the performance hit is very slight. There are also times when the larger address space of a 64-bit Windows OS makes a 32-bit program run faster.

See, in most cases under a 32-bit OS, your program is limited to 2GB of address space. There's a flag you can set, if the program is written for it, to extend that to 3GB. Under a 64-bit Windows OS, however, each 32-bit app can have 4GB of unfettered address space. If you've got more than 2GB of RAM and a memory-hungry 32-bit app, the odds are good it will run faster on a 64-bit OS despite the very minor hit from the 32-to-64 translation layer.
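The ceilings the commenter describes are worth spelling out. A quick sketch (the exact limits depend on OS configuration; these are the classic Windows figures, and the 4GB case also assumes the app was linked with the large-address-aware flag):

```python
# User address-space ceilings for a 32-bit Windows process.
GB = 2 ** 30

default_32bit_on_32bit_os = 2 * GB  # default user/kernel split
largeaddress_on_32bit_os  = 3 * GB  # /3GB boot option + LARGEADDRESSAWARE
largeaddress_on_64bit_os  = 4 * GB  # LARGEADDRESSAWARE under 64-bit Windows

print(default_32bit_on_32bit_os, largeaddress_on_32bit_os, largeaddress_on_64bit_os)

# A quick way to check the pointer width of the interpreter you are
# running right now (8 bytes on a 64-bit build, 4 on a 32-bit one):
import struct
print("pointer size:", struct.calcsize("P"), "bytes")
```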


By darkpaw on 4/18/2008 9:55:42 AM , Rating: 2
On a fully native 64-bit processor like Itanium, you have to emulate the whole 32-bit architecture, which comes with a huge performance hit.

On x64 processors, the entire 32-bit architecture is still present; nothing is actually emulated. A software interface to the 32-bit APIs is provided (Windows on Windows), which is more like virtualization than emulation. This does not cause a significant performance hit.


By freeagle on 4/18/2008 6:59:50 AM , Rating: 2
Where do you get this information, people?

Programming does not get harder when you try to utilize more than 2 cores. The problem is that people try to do things in parallel that should not be done that way. When you need a lot of data sharing between threads, there is a good chance you are going the wrong way. Examples from game engines include rendering the scene, or calculating physical interactions once you sort your objects into disjoint groups, etc.

Oscalcido, the whole point of having extra cores is to compute appropriate tasks faster, like the rendering mentioned above, or archiving, or decoding. Another benefit is that the whole system gets more responsive, especially if you are used to running multiple applications at once.

Freeagle
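The "disjoint groups" idea above can be sketched in a few lines. This is purely an illustration of the structure (in CPython, threads won't speed up pure-Python arithmetic because of the GIL, but the point here is that disjoint data needs no locks):

```python
# Each worker owns its own group of objects; since the groups are
# disjoint, no thread ever touches another thread's data and no
# synchronization is needed.
from concurrent.futures import ThreadPoolExecutor

def step(group, dt=0.1):
    # Advance every object in this group; nothing here reads or
    # writes another group's objects.
    for obj in group:
        obj["x"] += obj["vx"] * dt
    return group

objects = [{"x": float(i), "vx": 1.0} for i in range(8)]
groups = [objects[0::2], objects[1::2]]   # two disjoint groups

with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(step, groups))

print([round(o["x"], 2) for o in objects])
```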


By Aikouka on 4/18/2008 9:10:35 AM , Rating: 2
I'm not sure what you mean, because programming can inherently become more complex as you raise the number of threads. Now, the reason I boldfaced "can" is that not all tasks are hard to break down. A simple computation can typically be broken into parts and combined later on, but that's just a simple computation; there are tasks that are much harder to decompose. I remember taking a parallel processing course back in college, and one of the focuses early in the class was learning how to break a task into parts that could be split amongst the cluster of machines. Some were pretty simple, yet even tasks that looked easy to break down sometimes proved to be a bit tricky.
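The "break down, compute, combine" pattern the commenter describes can be sketched as a chunked sum. A minimal illustration, not any particular course's code:

```python
# Split a large reduction into chunks, compute each chunk's partial
# result in a worker, then combine the partials.
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n):
    # Split `data` into n roughly equal, contiguous chunks.
    k = (len(data) + n - 1) // n
    return [data[i:i + k] for i in range(0, len(data), k)]

data = list(range(1_000))
parts = chunked(data, 4)                   # 1. break down
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, parts))  # 2. compute in parallel
total = sum(partials)                      # 3. combine

print(total)  # 499500
```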


By freeagle on 4/18/2008 11:45:52 AM , Rating: 2
What I mean is that people tend to "force" parallel programming where it's not appropriate; that's why it seems to be getting harder. If you know how to break a task into parts that can run in parallel, then you know the maximum number of threads you can utilize without getting into synchronization nightmares.


By Aikouka on 4/18/2008 12:25:05 PM , Rating: 2
Ahh, sorry then. I must've misunderstood your post. I agree with what you said about how people tend to push multi-threading, or think multi-threading is apt for any application.


By Sulphademus on 4/18/2008 9:16:11 AM , Rating: 2
Multiple apps will be the biggest one at the moment.
Some things just aren't well designed for SMP. But given that my Vista machines are running 50 to 80 processes, spreading the load of single-threaded processes helps a lot. However, these types of things will only get faster via more MHz or a more efficient core architecture.

The advantage for programs that do parallelize well should be huge this next round.


By freeagle on 4/18/2008 11:49:37 AM , Rating: 2
Some applications can only run faster with increased single-core performance. That's because the way they execute is extremely hard or simply impossible to parallelize.


By boogle on 4/18/2008 9:38:27 AM , Rating: 2
I for one hope you're not a programmer working on multithreaded applications. If you're going to randomly spawn off a load of threads because a class looks like it can work on its own, you're going to end up with a scheduling and syncing nightmare. And as anyone who's run into lots of syncing knows, performance drops below that of a single-threaded app.


By freeagle on 4/18/2008 12:01:14 PM , Rating: 2
quote:
If you're going to randomly spawn off a load of threads because the class looks like it can work on its own, you're going to end up with a scheduling & syncing nightmare


I have absolutely no idea how you deduced this from my post.

quote:
scheduling .... nightmare

The number of threads in your application has nearly zero effect on the performance of the system scheduler. What can really slow your application below the speed of a single-threaded app is dynamically creating and destroying threads; an example could be matrix multiplication. But I'm sure you, as someone who knows a lot about parallel programming, have heard of something called a thread pool, or the futures concept.
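The thread-pool point is easy to illustrate with the matrix-multiplication example the commenter mentions: the workers are created once and reused across tasks, rather than spawning and destroying a thread per piece of work. A minimal sketch:

```python
# Matrix multiply with one task per output row, submitted to a reused
# pool of workers (created once, reused for every row).
from concurrent.futures import ThreadPoolExecutor

def mul_row(args):
    row, B = args
    # One output row: dot product of `row` with each column of B.
    return [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

with ThreadPoolExecutor(max_workers=2) as pool:
    C = list(pool.map(mul_row, [(row, B) for row in A]))

print(C)  # [[19, 22], [43, 50]]
```

The alternative, one freshly created thread per output cell, pays thread creation and teardown costs on every tiny task, which is exactly the slowdown described above.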


