
FireStream 9250 delivers eight gigaflops-per-watt performance

AMD announced today that its latest stream processor, the FireStream 9250, offers record-setting performance. AMD says that the FireStream 9250 is optimized for high-performance computing as well as mainstream and consumer applications.

AMD says that its FireStream 9250 has broken the one-teraflop barrier for single-precision performance. The card itself is a single-slot design and consumes less than 150W of power, which AMD rates at up to eight gigaflops per watt.
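As a rough check on these figures, the short Python sketch below works out how the quoted efficiency relates to the quoted power ceiling. The function names are illustrative, and treating the peak as exactly 1,000 gigaflops is an assumption, since the article only says the card has broken the one-teraflop barrier.

```python
def gflops_per_watt(peak_gflops, watts):
    """Efficiency in gigaflops per watt for a given peak and power draw."""
    return peak_gflops / watts


def implied_watts(peak_gflops, efficiency):
    """Power draw implied by a peak figure and an efficiency rating."""
    return peak_gflops / efficiency


# Assumption: "one teraflop" is taken as exactly 1,000 gigaflops.
PEAK_GFLOPS = 1000.0

# Efficiency if the card drew the full 150 W ceiling.
print(gflops_per_watt(PEAK_GFLOPS, 150.0))

# Power draw implied by the quoted eight gigaflops per watt.
print(implied_watts(PEAK_GFLOPS, 8.0))
```

Under that assumption, eight gigaflops per watt corresponds to a draw of about 125W, which is consistent with "less than 150W"; at the full 150W the same peak would work out to roughly 6.7 gigaflops per watt.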

The FireStream 9250 promises much faster data processing for critical workloads such as financial analysis or seismic processing than a CPU alone can deliver. According to AMD, developers using its products have reported up to a 55x performance increase on financial analysis compared with processing on the CPU alone.

The 9250 has second-generation double-precision floating point hardware that delivers over 200 gigaflops. It builds on the capabilities of the FireStream 9170, which according to AMD was the industry's first GP-GPU. The FireStream 9250 carries 1GB of GDDR3 memory. AMD also provides an AMD Stream SDK to help developers take advantage of the processing power of its FireStream products.

"An open industry standard programming specification will help drive broad-based support for stream computing technology in mainstream applications," said Rick Bergman, senior vice president and general manager, Graphics Product Group, AMD. "We believe that OpenCL is a step in the right direction and we fully support this effort. AMD intends to ensure that the AMD Stream SDK rapidly evolves to comply with open industry standards as they emerge."

The AMD FireStream 9250 will be available in Q3 2008 for $999. The AMD FireStream 9170 retails for $1,999. AMD’s main rival, NVIDIA, also has its own stream processing initiative with a product line called Tesla.


Supercomputer on a budget?
By Klober on 6/16/2008 4:04:25 PM , Rating: 2
The researchers used PetaVision to set a processing record with Roadrunner, spinning up to an astonishing 1.144 petaflop/s.

So, pick up ~1,500-2,000 of these things, network them together (with the required hardware for the PCs to put them in of course), and have almost equivalent performance to Roadrunner? Or would scaling deficiencies push this back to a much higher number of FireStream cards? Could be an interesting experiment once we've gotten more efficient and experienced at programming for OpenCL.

RE: Supercomputer on a budget?
By Zoomer on 6/16/2008 7:22:55 PM , Rating: 4
In other news, AMD achieves Q2 profitability with Klober's $4 M purchase.

Good luck finding a bank to, erm, bankroll you on that.

RE: Supercomputer on a budget?
By Goty on 6/16/2008 8:39:41 PM , Rating: 2
$4 Million isn't a bad price for a supercomputer. Hell, one of my old professors put in a $1 Million grant proposal to build a cluster out of NVIDIA Tesla D870s just this last year to run galactic dynamics simulations.

RE: Supercomputer on a budget?
By emboss on 6/16/2008 8:42:11 PM , Rating: 2
Not quite ... Roadrunner does 1.4 pflops DOUBLE precision. This card does 1 tflops SINGLE precision. Its double-precision throughput is about 10% of its single-precision throughput (it combines multiple single-precision units together to act as a single double-precision unit, which NV has finally done with the G200), so you'd probably need 10,000-plus cards. Then you need to take into account that GPU-based clusters don't scale well because of latency: GPUs have about a 300 us latency just to get out of the GPU. In Roadrunner, it's 1.5 us, and that's already hurting them in some situations.

GPUs are great for HPC, as long as your computation doesn't have to leave the GPU (or is VERY latency tolerant). They're still very limited in which problems they can solve.

I'm in the process of setting up a home cluster, and at the moment I have four nodes, each with a Phenom 9550, hooked together with a massively overkill Quadrics (QSNet-1) switch. So that's about 70 gflops theoretical single-precision for the cluster. I've also got my main development machine, a Q6600 with an 8800 GTX. So 500 gflops and change single precision there. Many of the things I write run BETTER on the cluster with only a 7th of the theoretical grunt, and not for lack of effort in trying to get the GPU to perform.

I'm all for doing HPC on a GPU, but the reality is the required interconnects are quite different. Graphics doesn't care about latency, and it makes sense to trade off latency to increase throughput. HPC is the opposite. And graphics performance is a much more important design target for GPUs than HPC ...
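For anyone who wants to redo the card-count arithmetic from this thread, here is a minimal Python sketch. It only divides peak figures quoted in the article and comments above; the `cards_needed` helper and its `efficiency` parameter are hypothetical, and nothing here models the latency problem described above.

```python
import math


def cards_needed(target_gflops, per_card_gflops, efficiency=1.0):
    """Cards required to reach target_gflops of peak throughput.

    efficiency is a hypothetical scaling factor (1.0 = perfect scaling).
    """
    return math.ceil(target_gflops / (per_card_gflops * efficiency))


CARD_PRICE_USD = 999      # launch price quoted in the article
SP_GFLOPS = 1000          # about 1 tflops single precision per card
DP_GFLOPS_COMMENT = 100   # about 10% of single precision, per the comment
DP_GFLOPS_ARTICLE = 200   # "over 200 gigaflops", per the article

# Match the quoted 1.144 pflops using single-precision peak.
sp_cards = cards_needed(1_144_000, SP_GFLOPS)
# Match 1.4 pflops using double-precision peak, under both DP figures.
dp_cards = cards_needed(1_400_000, DP_GFLOPS_COMMENT)
dp_cards_article = cards_needed(1_400_000, DP_GFLOPS_ARTICLE)

for label, n in [
    ("single precision, 1.144 pflops", sp_cards),
    ("double precision at 100 gflops/card, 1.4 pflops", dp_cards),
    ("double precision at 200 gflops/card, 1.4 pflops", dp_cards_article),
]:
    print(f"{label}: {n:,} cards, ${n * CARD_PRICE_USD:,} in cards alone")
```

With perfect scaling this gives 1,144 cards for the single-precision case and 14,000 (or 7,000 using the article's 200 gigaflops figure) for double precision. Passing an `efficiency` below 1.0 raises the count further, which is one crude way to account for the scaling losses discussed in this thread.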



Copyright 2014 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki