500,000 cores already shipped since September

AMD launched its much-anticipated Bulldozer architecture for the consumer market last month, but many were disappointed by the performance numbers. Now the company has officially launched new processors using the same architecture for the server and workstation markets, and there the picture changes significantly.
The key difference is in the operating system software that schedules work on the processor. The consumer side is reliant upon Windows 7 and earlier operating systems, which are unaware of the shared nature of the Bulldozer architecture. As a result, resource sharing is inefficient at best, and the full potential of higher Turbo Core frequencies goes unused.
AMD has worked to ensure optimization and/or support on many commonly used server operating systems. Linux 2.6.37, Windows Server 2008 R2 SP1, Xen 4.1, Ubuntu 11.04, and VMware vSphere 5.0 already have OS or hypervisor support for Bulldozer, while support in others such as Red Hat Enterprise Linux 6.2 and Windows 8 Server is currently in development.
AMD is specifically targeting the High Performance Computing (HPC) segment, with over 500,000 Bulldozer cores already shipped to this market since September. The AVX, FMA4, and XOP instructions require software to be recompiled in order to take advantage of their performance enhancements. Java 7 was mentioned as one piece of software where that work is under way.
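
Before software can use these extensions, it has to know whether the CPU offers them. As a rough sketch, assuming a Linux system where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo`, the check could look like the following. The function name `bulldozer_extensions` and the sample text are hypothetical and used only for illustration.

```python
# Sketch: report which of the extensions named in the article a CPU advertises.
# Assumes Linux-style /proc/cpuinfo text, where features appear on a "flags" line.
WANTED = ("avx", "fma4", "xop")


def bulldozer_extensions(cpuinfo_text):
    """Return {flag: bool} for avx, fma4 and xop, based on the first flags line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            break
    return {name: name in flags for name in WANTED}


# Hypothetical, shortened sample; a real flags line is much longer.
SAMPLE = "model name\t: example CPU\nflags\t\t: fpu sse2 avx fma4 xop\n"
print(bulldozer_extensions(SAMPLE))
```

On a Linux machine, the same function can be given the contents of `/proc/cpuinfo` instead of the sample string.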

The Opteron 6200 series was formerly codenamed Interlagos. It is scalable to 4 sockets supporting 16 Bulldozer cores each. The fastest model is the 6282 SE at 2.6GHz, with a maximum Turbo Core frequency of 3.3GHz and a TDP of 140W. The Opteron 4200 series was formerly codenamed Valencia. It is the most similar to the FX series (Zambezi) launched in October, but it will support up to 2 sockets with 8 cores each.

Both series support DDR3-1600 memory natively, but there will be official support for DDR3-1866 through specific OEMs. Opteron 6200 CPUs have quad memory channels, while the Opteron 4200 chips have dual channels. 1.35V low-voltage memory and 1.25V ultra-low-voltage memory are also supported, as are Load Reduced DIMMs (LRDIMMs).
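
To put the channel counts in perspective, here is a minimal sketch of the theoretical peak memory bandwidth per socket. It assumes each DDR3 channel is 64 bits (8 bytes) wide, which the article does not state, and it uses the nominal transfer rates from the memory names. Real-world throughput will be lower.

```python
# Sketch: theoretical peak memory bandwidth per socket.
# Assumption (not stated in the article): each DDR3 channel is 64 bits (8 bytes) wide.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mega_transfers_per_s, channels):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# Channel counts are taken from the article: quad for the 6200, dual for the 4200.
for name, channels in (("Opteron 6200", 4), ("Opteron 4200", 2)):
    for speed in (1600, 1866):
        print(f"{name} DDR3-{speed}: {peak_bandwidth_gbs(speed, channels):.1f} GB/s")
```

Under that assumption, DDR3-1600 works out to 12.8 GB/s per channel, so the quad-channel 6200 has twice the theoretical peak of the dual-channel 4200.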

The L1 cache is arranged as 16KB of data cache per core and 64KB of instruction cache per module, while the L2 cache is 1MB per core. Opteron 6200s have a shared 16MB of L3 cache per socket, while Opteron 4200s only have a shared 8MB per socket.
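
These per-core and per-module figures can be added up into per-socket totals. The sketch below does so for the largest configuration of each series (16 and 8 cores per socket). It assumes a Bulldozer module contains two cores, which this article does not spell out; the function name is hypothetical.

```python
# Sketch: total cache per socket from the per-core / per-module figures in the article.
# Assumption (not stated in the article): a Bulldozer module contains two cores.
CORES_PER_MODULE = 2


def cache_per_socket_kb(cores, l3_mb):
    """Return cache sizes in KB for one socket with the given core count and L3 size."""
    modules = cores // CORES_PER_MODULE
    return {
        "l1_data_kb": 16 * cores,      # 16KB data cache per core
        "l1_instr_kb": 64 * modules,   # 64KB instruction cache per module
        "l2_kb": 1024 * cores,         # 1MB per core
        "l3_kb": 1024 * l3_mb,         # shared per socket
    }


for name, cores, l3_mb in (("Opteron 6200", 16, 16), ("Opteron 4200", 8, 8)):
    totals = cache_per_socket_kb(cores, l3_mb)
    print(name, totals, "total KB:", sum(totals.values()))
```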

In order to speed time to market and lower validation costs, AMD has designed its new Opterons to function on its previous platforms using the G34 and C32 sockets. The company believes that its lower total platform costs compared with Intel’s Xeon platforms impart a significant advantage. For example, AMD says the Opteron 6276 will ship at the same price as the Xeon E5640, but will outperform it by 89%.

Cloud computing requires high throughput, scalability, density, and power efficiency. AMD thinks that it can gain significant market share by claiming the lowest x86 watts/core in the industry at 5.3W for Interlagos and 4.375W for Valencia. The new C6 power state reduces power consumption at idle by up to 46% over the previous generation by enabling core power gating. When a core is halted, its context is exported to system memory and voltage is removed from the core.
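
The watts/core figures can be sanity-checked with simple arithmetic. The article does not say which models the figures refer to, so the sketch below only multiplies them back out by the maximum core counts quoted earlier (16 and 8) and, for comparison, divides the 140W TDP of the 6282 SE by its 16 cores. The function names are hypothetical.

```python
# Sketch: relate the quoted watts/core figures to whole-chip power.
# Core counts (16 and 8) and the 140W TDP come from the article; which models
# AMD's watts/core figures refer to is not stated there.

def implied_tdp_watts(watts_per_core, cores):
    """Whole-chip power implied by a watts/core figure."""
    return watts_per_core * cores


def watts_per_core(tdp_watts, cores):
    """Watts per core for a chip with the given TDP."""
    return tdp_watts / cores


print("Interlagos implied TDP:", implied_tdp_watts(5.3, 16), "W")
print("Valencia implied TDP:", implied_tdp_watts(4.375, 8), "W")
print("6282 SE watts/core:", watts_per_core(140, 16), "W")
```

The implied totals are well below 140W, which suggests the quoted figures describe lower-power models rather than the top-end 6282 SE.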
Intel will be launching new server and workstation products based on the Sandy Bridge architecture next year, but AMD also has plans for the future with its Piledriver architecture. Sepang will use the C2012 socket and replace the Opteron 4200 series, while Terramar will use the G2012 socket. Both new platforms will support PCIe 3.0.

Comments

RE: Very encouraging
By geddarkstorm on 11/14/2011 2:25:57 PM , Rating: 3
Bulldozer has serious potential; there's just something wrong with this first version. Something isn't gelling quite right, and likely it has to do with the very deep pipelines versus the branch predictor. We should see a much improved version with Trinity soon. They just had to release Bulldozer eventually, even with its rough edges.

It'll be fun seeing how the technology develops, as AMD starts to polish it up.

RE: Very encouraging
By Da W on 11/14/2011 4:11:48 PM , Rating: 3
You won't see much improved branch predictors. It was Intel's flaw in the Pentium 4. It is very hard engineering. AMD simply doesn't have the resources.
Only solution is to shorten the pipelines. Then faster L3 cache.
Yet you are still stuck with a 2B transistor monster (no GPU) that barely matches up with Intel's mainstream 970M transistor piece (GPU included). I don't see what they can do with that.

RE: Very encouraging
By StevoLincolnite on 11/14/2011 7:38:18 PM , Rating: 3
"Only solution is to shorten the pipelines."

Or... do what Intel did when NetBurst failed: go to a reworked P6 architecture and heavily revise it to bring out a fast and efficient CPU, a.k.a. Core 2.

Phenom 2 x6 is faster than Zambezi most of the time, has better single-threaded performance and only loses out in extremely multi-threaded scenarios. A Thuban at 4.2GHz will be faster than Zambezi at 4.8GHz, especially in gaming.

Personally... I wish AMD grabbed a Thuban, decreased the cache latency, threw more cache at it, boosted the NB clocks and whacked on a few more cores.
Not only would it be faster (Phenoms scale very well with increases in the NB clock), but the die size and transistor count would be substantially lower, reducing production costs.

Not only would production costs be lower, but R&D costs would be lower too, as it's based on a previous design.
In other words, AMD could have targeted an 8 core Phenom "3" at a budget price as it would have been an all-round more economical chip to launch.

I was running a Phenom 2 x6 1090T @ 4GHz that I paid 400 Aussie dollars for when it was first launched. (Back when it wasn't going up against Sandy Bridge.)
It was great value then.

I just recently picked up a Phenom 2 x6 1035T to replace my 1090T with. I'm able to shave off 0.15 volts while maintaining the same clocks.

I'll probably be happy with this chip for another 12 months. As a gamer running dual Radeon 6950 2GB cards and Eyefinity at a resolution of 5760x1080, the CPU plays a less important role when I'm GPU limited in every single game anyway.

RE: Very encouraging
By someguy123 on 11/14/2011 9:16:38 PM , Rating: 2
Switching over from Willamette-based to Yonah-based designs shortened the pipelines, which is one of the reasons for the substantially better per-clock performance.

If AMD had done nothing but shrink Thuban and add more cores, they would've had a superior product.

Copyright 2016 DailyTech LLC.