Details of AMD's next generation Radeon hit the web

Newly created site Level 505 has leaked benchmarks and specifications of AMD’s upcoming ATI R600 graphics processor. The graphics processor is expected to launch in January 2007, with a revision expected in March 2007. These early specifications and launch dates line up with what DailyTech has already published, and they appear on ATI internal roadmaps as of workweek 49.

Preliminary specifications from Level 505 of the ATI R600 are as follows:
  • 64 4-Way SIMD Unified Shaders, 128 Shader Operations/Cycle
  • 32 TMUs, 16 ROPs
  • 512 bit Memory Controller, full 32 bit per chip connection
  • GDDR3 at 900 MHz clock speed (January)
  • GDDR4 at 1.1 GHz clock speed (March, revised edition)
  • Total bandwidth 115 GB/s on GDDR3
  • Total bandwidth 140 GB/s on GDDR4
  • Consumer memory support 1024 MB
  • DX10 full compatibility with draft DX10.1 vendor-specific cap removal (unified programming)
  • 32FP [sic] internal processing
  • Hardware support for GPU clustering (any x^2 [sic] number, not limited to Dual or Quad-GPU)
  • Hardware DVI-HDCP support (High Definition Copy Protocol)
  • Hardware Quad-DVI output support (Limited to workstation editions)
  • 230W TDP PCI-SIG compliant
This time around, AMD appears to be taking a different approach, equipping the ATI R600 with fewer unified shaders than NVIDIA’s recently launched GeForce 8800 GTX. However, each unified shader on the ATI R600 can complete more shader operations per clock cycle.

ATI's internal guidance states the R600 will have 320 stream processors at launch; 64 4-way unified shaders account for only 256 of these stream processors.
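The discrepancy is plain multiplication. The sketch below assumes one "stream processor" corresponds to one SIMD lane of one shader unit; that mapping is our interpretation, not something ATI or Level 505 stated, and the constant names are illustrative.

```python
# Reconcile the leaked shader layout with ATI's stream-processor figure.
# Assumption: one stream processor == one SIMD lane of one shader unit.
SHADER_UNITS = 64            # "64 4-Way SIMD Unified Shaders" (Level 505)
LANES_PER_UNIT = 4           # "4-Way SIMD"
GUIDANCE_STREAM_PROCS = 320  # ATI internal guidance


def stream_processors(units: int, lanes: int) -> int:
    """Stream processors implied by `units` SIMD units that are each `lanes` wide."""
    return units * lanes


leaked = stream_processors(SHADER_UNITS, LANES_PER_UNIT)
shortfall = GUIDANCE_STREAM_PROCS - leaked

# Lane width that 64 units would need to reach 320 (valid only if it divides evenly).
assert GUIDANCE_STREAM_PROCS % SHADER_UNITS == 0
implied_width = GUIDANCE_STREAM_PROCS // SHADER_UNITS

print(f"leaked layout: {leaked} stream processors")          # 256
print(f"shortfall vs guidance: {shortfall}")                  # 64
print(f"lanes per unit needed for 320: {implied_width}")      # 5
```

In other words, 320 stream processors across 64 units would fit 5-wide units rather than the 4-wide units in the leaked list; the arithmetic alone cannot say which figure is correct.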

Level505 claims AMD will equip the ATI R600 with both GDDR3 and GDDR4 memory, with the GDDR3 model launching in January. Memory clocks have been set at 900 MHz for GDDR3 models and 1.1 GHz for GDDR4 models. As recently as two weeks ago, ATI roadmaps said this GDDR3 launch was canceled. These same roadmaps list February 2007 as the production date for R600, which would be after a January 22nd launch.

Memory bandwidth of the ATI R600 is significantly higher than NVIDIA’s GeForce 8800-series. Total memory bandwidth varies from 115GB/s on GDDR3 equipped models to 140GB/s on GDDR4 equipped models.
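Both bandwidth figures follow from the 512-bit bus and the quoted memory clocks. This sketch assumes GDDR3 and GDDR4 each transfer data twice per quoted clock (double data rate) and reports decimal gigabytes (10^9 bytes); the function name is illustrative.

```python
BUS_WIDTH_BITS = 512  # "512 bit Memory Controller"


def peak_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumes double-data-rate memory, i.e. two transfers per quoted clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


gddr3 = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 900)   # January GDDR3 model
gddr4 = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 1100)  # March GDDR4 revision

print(f"GDDR3 @ 900 MHz: {gddr3:.1f} GB/s")  # 115.2
print(f"GDDR4 @ 1.1 GHz: {gddr4:.1f} GB/s")  # 140.8
```

The results, 115.2 GB/s and 140.8 GB/s, match the rounded 115 GB/s and 140 GB/s in the leaked specifications.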

Other notable hardware features include support for quad DVI outputs, although using all four outputs is limited to FireGL workstation edition cards.

There’s also integrated support for multi-GPU clustering technologies such as CrossFire. The ATI R600 implementation allows any number of ATI R600 GPUs to operate together, as long as the count is a power of two. Expect configurations with more than two GPUs to be available only for the workstation market, though.
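A minimal sketch of the "powers of two" rule, assuming it simply means a cluster must contain 2, 4, 8, ... GPUs. The helper names and the upper bound of 16 are hypothetical; the article gives no maximum.

```python
def is_valid_cluster_size(n: int) -> bool:
    """True if n is a multi-GPU count that is a power of two (2, 4, 8, ...).

    A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    """
    return n >= 2 and (n & (n - 1)) == 0


def valid_cluster_sizes(max_gpus: int) -> list[int]:
    """All allowed cluster sizes up to max_gpus (an arbitrary, hypothetical cap)."""
    return [n for n in range(2, max_gpus + 1) if is_valid_cluster_size(n)]


print(valid_cluster_sizes(16))  # [2, 4, 8, 16]
print(is_valid_cluster_size(3), is_valid_cluster_size(6))  # False False
```

The bit trick avoids floating-point logarithms, which can misclassify large counts due to rounding.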

The published results are very promising, with AMD’s ATI R600 beating NVIDIA’s GeForce 8800 GTX in most benchmarks. The performance delta ranges from 8% to 42%, depending on the game benchmark.

When DailyTech contacted the site owner to verify the benchmarks, the owner replied that the benchmark screenshots could not be published because they contain origin-specific markers that would trace the card back to its source; the owner mentioned that the card is part of the Microsoft Vista driver certification program.

If Level505's comments seem a little too pro-ATI, don't be surprised. When asked whether the site was affiliated in any way with ATI or AMD, the owner replied to DailyTech that "two staff members of ours are directly affiliated with AMD's business [development] division."


RE: Different Strategy?
By MAIA on 1/8/2007 8:25:06 AM , Rating: 2
As can be seen from the diagram above, the primary difference between R520 and R580 lies in the number of pixel shader units available to each of them, with R520 having 16 pixel shader cores, while R580 triples that number. As is the case with all of the R5xx series, each pixel shader core consists of:

  • ALU 1
      ◦ 1 Vec3 ADD + Input Modifier
      ◦ 1 Scalar ADD + Input Modifier
  • ALU 2
      ◦ 1 Vec3 ADD/MUL/MADD
      ◦ 1 Scalar ADD/MUL/MADD
  • Branch Execution Unit
      ◦ 1 Flow Control Instruction

The net result is that R580 contains 48 Vector MADD ALUs and 48 Vector ADD ALUs operating over 48 pixels (fragments) in parallel. Along with the pixel shader cores, the pixel shader register array on R580 has also tripled in size in relation to R520, so that R580 is still capable of having the same number of batches in flight.

The R580 diagram above isn't entirely accurate: it indicates that R580 has 12 separate dispatch processors, where R520 has 4, and this is not the case. As with RV530, each of R580's "quads" (the lowest element of hardware scalability in the pixel pipeline) has been enlarged so that it handles three quads in the pixel shader core, but it does so by operating on the same thread. The net result is that R580 still contains 4 pixel processing cores, and therefore only 4 dispatch processors, with each core handling up to 128 batches/"threads". R580 therefore still handles a total of 512 batches/"threads", just as R520 does. Each batch in R580 consists of up to three times as many pixels (48), so that it can be mapped across the 12 pixel shader processors in each of the 4 cores; as a result, R580 can have 24,576 pixels in flight at once.

Note that because the design is still built around 4 processing cores, the lowest level of shader granularity is likely to be 12 pipelines. If ATI releases parts with failures in the pixel pipeline portion of the die, the next configuration down would likely be 12 texture units, 36 shader pipelines, and 12 ROPs.
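The batch and salvage figures can be checked with a small model. This is a sketch under stated assumptions: the `PixelPipeline` class is hypothetical, R520's 16 pixels per batch is derived from the "three times as many pixels (48)" remark, and disabling a core is assumed to remove an equal share of every unit type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelPipeline:
    cores: int              # pixel processing cores, one dispatch processor each
    batches_per_core: int   # batches/"threads" each core can hold
    pixels_per_batch: int
    texture_units: int
    shader_pipelines: int
    rops: int

    def batches_in_flight(self) -> int:
        return self.cores * self.batches_per_core

    def pixels_in_flight(self) -> int:
        return self.batches_in_flight() * self.pixels_per_batch

    def with_cores_disabled(self, disabled: int) -> "PixelPipeline":
        """Drop whole cores, assuming every unit type is split evenly across cores."""
        remaining = self.cores - disabled

        def scale(total: int) -> int:
            return total * remaining // self.cores

        return PixelPipeline(
            cores=remaining,
            batches_per_core=self.batches_per_core,
            pixels_per_batch=self.pixels_per_batch,
            texture_units=scale(self.texture_units),
            shader_pipelines=scale(self.shader_pipelines),
            rops=scale(self.rops),
        )


R520 = PixelPipeline(cores=4, batches_per_core=128, pixels_per_batch=16,
                     texture_units=16, shader_pipelines=16, rops=16)
R580 = PixelPipeline(cores=4, batches_per_core=128, pixels_per_batch=48,
                     texture_units=16, shader_pipelines=48, rops=16)

print(R520.batches_in_flight(), R520.pixels_in_flight())  # 512 8192
print(R580.batches_in_flight(), R580.pixels_in_flight())  # 512 24576
salvage = R580.with_cores_disabled(1)
print(salvage.texture_units, salvage.shader_pipelines, salvage.rops)  # 12 36 12
```

Because the granularity is a whole core, the smallest step down removes a quarter of each resource at once, which is why 12/36/12 is the likely next configuration rather than, say, 16/44/16.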

ATI have also increased the Hierarchical Z buffer size on R580, which can now store up to 50% more pixel information than R520's, allowing for better performance at even higher resolutions. Otherwise, most shader-related elements stay the same: R580 still has 8 vertex shaders and single Z/Stencil rates (unlike RV530), and it still uses 16 texture units to serve all 48 shader processors.

RE: Different Strategy?
By MAIA on 1/8/2007 8:27:34 AM , Rating: 2
Fetch4 is implemented not just in R580 but also in RV530 and RV515, although, curiously, not in R520. Because R520's shader capability is relatively low compared with R580's, it is more likely to be shader bound on operations such as these anyway, so the longer sample time is less of an issue. R580, by contrast, has so much math capability relative to its number of texture samplers that optimising its texture utilisation matters more, and wasting 3 cycles on single precision texture formats would bottleneck it more.
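The "3 cycles" figure can be sketched as follows, assuming Fetch4 returns a 2x2 block of single-channel texels in one fetch and that a texture unit otherwise returns one such texel per cycle. Both are simplifications on our part; real throughput depends on format and filtering. The function name is illustrative.

```python
import math


def fetch_cycles(texels: int, texels_per_fetch: int) -> int:
    """Cycles one texture unit needs to return `texels`, assuming one fetch per cycle."""
    return math.ceil(texels / texels_per_fetch)


FOOTPRINT = 4  # a 2x2 block of single-channel texels

without_fetch4 = fetch_cycles(FOOTPRINT, texels_per_fetch=1)
with_fetch4 = fetch_cycles(FOOTPRINT, texels_per_fetch=4)
saved = without_fetch4 - with_fetch4

# Shader processors per texture unit, using the figures quoted in these comments.
alu_per_tmu = {"R520": 16 // 16, "R580": 48 // 16}

print(without_fetch4, with_fetch4, saved)  # 4 1 3
print(alu_per_tmu)  # {'R520': 1, 'R580': 3}
```

With three shader processors per texture unit on R580 versus one on R520, each texture cycle saved frees up proportionally more math work, which is the commenter's argument for why Fetch4 matters more on R580.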



Copyright 2016 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki