Details of AMD's next generation Radeon hit the web

Newly created site Level 505 has leaked benchmarks and specifications of AMD’s upcoming ATI R600 graphics processor. The graphics processor is expected to launch in January 2007, with a revision arriving in March 2007. These early specifications and launch dates line up with what DailyTech has already published and are present on ATI internal roadmaps as of workweek 49.

Preliminary specifications from Level 505 of the ATI R600 are as follows:
  • 64 4-Way SIMD Unified Shaders, 128 Shader Operations/Cycle
  • 32 TMUs, 16 ROPs
  • 512 bit Memory Controller, full 32 bit per chip connection
  • GDDR3 at 900 MHz clock speed (January)
  • GDDR4 at 1.1 GHz clock speed (March, revised edition)
  • Total bandwidth 115 GB/s on GDDR3
  • Total bandwidth 140 GB/s on GDDR4
  • Consumer memory support 1024 MB
  • DX10 full compatibility with draft DX10.1 vendor-specific cap removal (unified programming)
  • 32FP [sic] internal processing
  • Hardware support for GPU clustering (any x^2 [sic] number, not limited to Dual or Quad-GPU)
  • Hardware DVI-HDCP support (High-bandwidth Digital Content Protection)
  • Hardware Quad-DVI output support (Limited to workstation editions)
  • 230W TDP PCI-SIG compliant
This time around it appears AMD is going for a different approach by equipping the ATI R600 with fewer unified shaders than NVIDIA’s recently launched GeForce 8800 GTX. However, the unified shaders found on the ATI R600 can complete more shader operations per clock cycle.

ATI's internal guidance states the R600 will have 320 stream processors at launch; 64 4-way unified shaders account for only 256 of these stream processors.

Level505 claims AMD will equip the ATI R600 with GDDR3 and GDDR4 memory, with the GDDR3 model launching in January. Memory clocks have been set at 900 MHz for GDDR3 models and 1.1 GHz for GDDR4 models. As recently as two weeks ago, ATI roadmaps had said this GDDR3 launch was canceled. These same roadmaps claim the production date for R600 is February 2007, which would be after a January 22nd launch.

Memory bandwidth of the ATI R600 is significantly higher than NVIDIA’s GeForce 8800-series. Total memory bandwidth varies from 115GB/s on GDDR3 equipped models to 140GB/s on GDDR4 equipped models.
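The quoted bandwidth figures can be sanity-checked from the bus width and memory clocks in the specification list. The sketch below is a rough check, not anything published by Level 505 or ATI; the function name is made up for illustration, and it assumes the quoted clocks are base clocks with two transfers per clock (double data rate).

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 is an assumption: it treats the quoted memory
    clock as a base clock with double-data-rate signalling.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Figures taken from the leaked specification list: 512-bit bus,
# 900 MHz GDDR3 (January) and 1.1 GHz GDDR4 (March revision).
r600_gddr3 = peak_bandwidth_gb_s(512, 900)
r600_gddr4 = peak_bandwidth_gb_s(512, 1100)

print(f"R600 GDDR3: {r600_gddr3:.1f} GB/s")
print(f"R600 GDDR4: {r600_gddr4:.1f} GB/s")
```

Under those assumptions the results come out at roughly 115 GB/s and 141 GB/s, which is consistent with the 115 GB/s and 140 GB/s quoted in the leak.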

Other notable hardware features include hardware support for quad DVI outputs, but utilizing all four outputs is limited to FireGL workstation edition cards.

There’s also integrated support for multi-GPU clustering technologies such as CrossFire. The implementation on the ATI R600 allows any number of ATI R600 GPUs to operate together in powers of two. Expect multi-GPU configurations with more than two GPUs to be available only for the workstation market, though.

The published results are very promising with AMD’s ATI R600 beating out NVIDIA’s GeForce 8800 GTX in most benchmarks. The performance delta varies from 8% up to 42% depending on the game benchmark.

When DailyTech contacted the site owner to get verification of the benchmarks, the owner replied that the benchmark screenshots could not be published due to origin-specific markers that would trace the card back to its source -- the author mentioned the card is part of the Microsoft Vista driver certification program.

If Level505's comments seem a little too pro-ATI, don't be too surprised. When asked if the site was affiliated in any way with ATI or AMD, the owner replied to DailyTech with the statement that "two staff members of ours are directly affiliated with AMD's business [development] division."

Comments

RE: Different Strategy?
By Spoelie on 12/30/2006 11:30:56 PM , Rating: 4
No it is exactly as they said it. The 7-series featured a maximum of 24 complex shaders, while the x19x0 series had 48 or 36 simple shaders. While the x19x0 was undoubtedly faster in the majority of benchmarks and it did have more shader processing power than the 7-series, the wins were close most of the time and the difference in shader power not that substantial - surely not double.

Now we have nvidia with a lot of simple shaders in the 8800 and ati with fewer, more powerful shaders in the R600. It is too early to tell which of the two choices provides the most aggregate shading power. But what we can say is that the roles from the generation before are definitely reversed.

(complex and simple used in relative terms here..)

RE: Different Strategy?
By THEiNTERNETS on 12/30/2006 11:46:33 PM , Rating: 1
Ah, okay. It makes more sense when you put it that way.

Then again, they are both unified shaders, so when you say "simple vs complex" are we supposed to assume that the real difference has to do with ATi's being 4-way? (what does that mean anyways?)

Seems like that "4-way" property is the key to 64 shaders in any way being able to match up against 128.

RE: Different Strategy?
By Furen on 12/31/2006 12:19:23 AM , Rating: 2
These 4-way shaders are really SIMD shaders that operate on 4 pieces of data at once. This means that you do not have as much granularity (which will lead to part of these units being idle at times) but they probably take less die space (and power) than they would as individual shader units. ATI is probably using the transistors saved on the shader units elsewhere, like improving its memory controller (which is a 1024bit/512bit monster), etc.

Nvidia has twice the amount of shader units and twice the clock speed (the shader units on the Nvidia side run at 2x+ the core clock) but they only work on a single operation at once.

I wouldn't label either of these approaches simple or complex since individual operations are simple for both of these approaches.
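The granularity point above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a description of how either GPU actually schedules work: it assumes each operation occupies a whole unit for one or more cycles, that no two operations are packed into the same cycle, and the mix of operation widths is invented for illustration.

```python
import math


def lane_utilization(op_widths, simd_width):
    """Fraction of SIMD lanes doing useful work for a list of operations.

    op_widths holds the number of components each operation touches
    (1 = scalar, 4 = full vec4). Toy assumption: an operation occupies a
    whole unit for ceil(width / simd_width) cycles and unused lanes idle.
    """
    cycles = sum(math.ceil(width / simd_width) for width in op_widths)
    useful_lane_slots = sum(op_widths)
    return useful_lane_slots / (cycles * simd_width)


# Hypothetical shader workload: two vec4 ops, one vec3, one vec2, two scalars.
hypothetical_mix = [4, 4, 3, 1, 1, 2]

four_way = lane_utilization(hypothetical_mix, simd_width=4)
scalar = lane_utilization(hypothetical_mix, simd_width=1)

print(f"4-way SIMD unit utilization: {four_way:.3f}")
print(f"scalar unit utilization:     {scalar:.3f}")
```

In this made-up mix the 4-way unit leaves a good share of its lanes idle while the scalar unit is always fully used, which is the trade-off being described. A real compiler may pack narrow operations together, so the actual gap could be smaller.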

RE: Different Strategy?
By Spoelie on 12/31/2006 10:12:16 AM , Rating: 2
Now that I look at it, the specs say 128 shader (not data) operations per cycle. So while nvidia's shaders run at over twice the clock speed of the rest of the gpu, there is apparently some double pumping going on in ati's shaders as well, besides the fact that they're 4-way.

The only way to know for sure is get confirmation from ati i guess, and that won't happen before the NDA dates are reached.

RE: Different Strategy?
By Spoelie on 12/31/2006 12:24:23 AM , Rating: 2
The difference between the two is that Nvidia's shaders are scalar, i.e. they operate on a single 'number' at any given time. ATi's shaders are 4-way/Vect4 (vectors instead of scalars) in the sense that they can operate on 4 numbers at the same time (SIMD - single instruction multiple data). As such, 64 of ATi shaders AT THEIR PEAK should be equivalent to about 256 of nvidia's scalar shaders.

If you look at it only that way then nvidia wouldn't stand a chance. However, there's also the fact that nvidia runs those shaders at a lot higher clock than the rest of the gpu - we don't know how ATi's shaders are configured as of yet and at what clockspeed they are running.

There are also other factors playing, what is the workload, how many are processing vertex data and how many are processing pixel data, how well are the shaders adapted at their respective workloads, are they being well fed etc. etc. etc.

More details about ATi's configuration will probably only be revealed at the official launch date through tech docs.
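The peak comparison in this comment comes down to units times lanes times clock. Since ATi's shader clock is not known, the sketch below avoids guessing it and instead computes the clock ratio at which the two designs would have equal peak throughput. Function names are illustrative, and the model ignores everything listed above as "other factors" (workload, utilization, how well the units are fed).

```python
def peak_lane_ops_per_cycle(units, lanes_per_unit):
    """Peak scalar-equivalent operations per shader clock cycle."""
    return units * lanes_per_unit


def break_even_clock_ratio(a_units, a_lanes, b_units, b_lanes):
    """Clock of design A, as a fraction of design B's shader clock,
    at which both designs reach the same peak operations per second."""
    return (peak_lane_ops_per_cycle(b_units, b_lanes)
            / peak_lane_ops_per_cycle(a_units, a_lanes))


# Unit counts as discussed in the thread: 64 4-way shaders (R600 leak)
# versus 128 scalar shaders (GeForce 8800 GTX).
ati_peak = peak_lane_ops_per_cycle(64, 4)
nvidia_peak = peak_lane_ops_per_cycle(128, 1)
ratio = break_even_clock_ratio(64, 4, 128, 1)

print(f"ATi peak per cycle:    {ati_peak}")
print(f"Nvidia peak per cycle: {nvidia_peak}")
print(f"ATi shader clock needed to match peak: {ratio:.2f}x Nvidia's shader clock")
```

On these numbers the 4-way design would match the scalar design's peak at half the scalar design's shader clock, which is why the unknown ATi clock and the real-world lane utilization matter so much to the outcome.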

RE: Different Strategy?
By otispunkmeyer on 1/2/2007 4:14:40 AM , Rating: 1
the way i understand this is

Ati's shaders being 4-way in theory gives them an upper hand (depending on clocks); like someone said, at their peak and maximum efficiency they should be equiv to 256 of nvidia's scalar processors.


the scalar processors NV has will be easier to utilize, they will be more efficient. so yeah Ati's can go 4 ways at once...but it might be harder to keep them working at their peak.

as always, there is usually more than 1 route to the same results, and this is all i see. i expect R600 to be on par with G80. 2 different methods, same outcome...with each having their own little pros and cons.

GDDR3 was expected too, GDDR4 has been used previously by ATi, but i dont think its ready just yet, and the 900MHz GDDR3 modules seem to have no problems eclipsing 1GHz either.

it will be interesting to see how the bandwidth increases play out. personally unless you are sporting dell's 30incher i dont think its going to provide much more performance and with shader effects getting more complex and more frequent perhaps memory bandwidth will be less important. i think the massive 115GB/s bandwidth will go under-utilized for much of its early life.

RE: Different Strategy?
By MAIA on 1/8/2007 8:11:22 AM , Rating: 1
the scalar processors NV has will be easier to utilize, they will be more efficient.

This argument holds no ground. You simply don't know whether ATI's shading engine will be easier to use or more efficient.

RE: Different Strategy?
By Sharky974 on 12/31/2006 2:10:32 AM , Rating: 2
>>No it is exactly as they said it. The 7-series featured a maximum of 24 complex shaders, while the x19x0 series had 48 or 36 simple shaders. While the x19x0 was undoubtedly faster in the majority of benchmarks and it did have more shader processing power than the 7-series, the wins were close most of the time and the difference in shader power not that substantial - surely not double.

Now we have nvidia with a lot of simple shaders in the 8800 and ati with less, more powerful shaders in the R600. It is too early to tell which of the two choices provide the most aggregate shading power. But what we can say is that the roles from the generation before are definitely reversed. >>

This isn't true at all. ATI, you can look this up, stayed with the exact shader pipe configuration they used with the X800 series. In other words, the 48 shader pipes in R580 are exactly the same as the 16 shader pipes in X800.

And if you'll recall, the 16 pipe X800's were a good match for the 16 pipe 6800's. Pipe for pipe clock for clock, they were almost equal. Therefore one ATI pipe IS a good match for one Nvidia pipe. Although I believe one Nvidia pipe to be slightly faster, we're talking on the order of 5-10% here.

What everybody can't figure out is why the "48 shader" (really, 48 shader pipe, of the exact same configuration as in the R420 series) R580 didn't blow the 24 pipe G71 away. Well, the reason almost certainly (Xbit labs mentions this in their R580 benchmarks a lot) is that the 580 series is heavily texture limited. It has only 16 (dedicated) TMUs. G71 textures with one of its shader ALUs, so basically has 24.

So basically it doesn't matter how much shading power the R580 has, it's bottlenecked by textures/fragment throughput.

An easy way to see this, among many, is to look at the X1800XT, which at just 16 shaders/pipes, competed well with the 24 pipe 7800GTX. Why? It had 16 TMUs too. The only thing an R580 has over R520 is 3X the shader power, but the texture throughput remains the same. If the game isn't shader bottlenecked but rather texture bottlenecked, the R580 would theoretically perform no better than an R520.

Another easy way to tell is to look at the X1800GTO (a 12/12 TMU/shader config), and how badly it destroys the ill-fated X1600XT (a 4/12 TMU-bottlenecked config). Both have the same number of shaders and similar clocks, but the GTO has 8 more TMUs and blows away the X1600XT.

This was all part of ATI's bright idea that TMUs/shaders needed to be in a fixed 1:3 ratio as games became more shader heavy. (Hence 16:48 on R580, 4:12 on X1600XT, etc.) It was a colossal debacle IMO. And basically while ATI did okay performance wise, the real key is to remember their dies were about twice as big as an Nvidia die for comparable performance, meaning the design is terribly inefficient, and much more costly (to ATI, at least) for similar performance as competing Nvidia parts of the time.
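The bottleneck argument made here can be sketched as a simple two-stage model: per-pixel cost is set by whichever of texturing or shading takes longer. This is only an illustration of the reasoning, assuming equal clocks and perfectly parallel stages; the TMU and shader counts are the ones quoted in this comment, while the two workloads (operations per pixel) are invented.

```python
# (TMUs, shader pipes) as quoted in the comment above.
CONFIGS = {
    "R520 (X1800XT)": (16, 16),
    "R580 (X1900)": (16, 48),
    "X1800GTO": (12, 12),
    "X1600XT": (4, 12),
}


def pixels_per_clock(tmus, shaders, tex_ops, alu_ops):
    """Relative throughput when the slower of the two stages sets the pace.

    tex_ops and alu_ops are per-pixel operation counts for a hypothetical
    workload; clocks are assumed equal across cards.
    """
    cycles_per_pixel = max(tex_ops / tmus, alu_ops / shaders)
    return 1.0 / cycles_per_pixel


def compare(tex_ops, alu_ops):
    """Throughput of every config for one hypothetical workload."""
    return {name: pixels_per_clock(t, s, tex_ops, alu_ops)
            for name, (t, s) in CONFIGS.items()}


texture_bound = compare(tex_ops=1, alu_ops=1)  # hypothetical texture-heavy game
shader_bound = compare(tex_ops=1, alu_ops=6)   # hypothetical shader-heavy game

for name in CONFIGS:
    print(f"{name:15s} texture-heavy: {texture_bound[name]:5.2f}  "
          f"shader-heavy: {shader_bound[name]:5.2f}")
```

In the texture-heavy case the model gives R580 no advantage over R520 and puts the X1800GTO well ahead of the X1600XT, matching the claim above; in the shader-heavy case the extra shader pipes pay off. Real cards differ in clocks, memory, and scheduling, so this shows the shape of the argument rather than predicting benchmark results.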

The R600 at least appears to end all that nonsense, anyway. In fact early measurements put the R600 die at a bit smaller than G80, with competitive performance if these benches/rumors are at all believable. Granted it's on 80nm, but that doesn't make a huge difference.

RE: Different Strategy?
By Sharky974 on 12/31/2006 2:33:26 AM , Rating: 1
>>Now we have nvidia with a lot of simple shaders in the 8800 and ati with less, more powerful shaders in the R600.>>

Just to clarify, yes I agree with this part. I dont agree however with the statement that the G71 (7 series) featured "complex" shaders while the R580 (X19XX series) featured simple ones.

I would agree with a generic statement that ATI has featured more raw shading power in both cases (last gen and R600/G80), however, even if last time they didn't necessarily take advantage.

RE: Different Strategy?
By Spoelie on 12/31/2006 9:47:41 AM , Rating: 4
Your claim that the shaders are equivalent through the generations is completely baseless. An x800 shader is PS2 to start with, while r5x0 series was redesigned for PS3. Also, the x1600xt already had the 'simple' shader principle, while x1800 gpus had more powerful singular shaders - that's why the x1800gto 12 shaders were faster. As such, the x1900 is not at all 3x x1800 but 4x x1600. It has nothing to do with TMU's.

You have to think of the R520 as a completely separate generation. The only reason why it came out together with the middle/value range was because of its tremendous delays, so much that the line after it (x1300/x1600/x1900 with redesigned shaders) was already finished when it came out. Another telltale sign is that the r520 doesn't support all the same features as the other members of the x1xx0 family, be it value or high end.

And yes, I did look it up.

RE: Different Strategy?
By Sharky974 on 1/1/2007 6:59:02 AM , Rating: 3
Are you crazy? X1800 and X1600XT had the EXACT SAME shader pipes! And 12 apiece. The ONLY major performance difference was TMU's.

The X1800XT had the EXACT SAME shader pipes as X1900 series as well, in fact they all do. The same major-mini-alu setup. ATI didn't want to rewrite their compiler.

The part about X800 pipes isn't even relevant. I dont think shader ALUs have anything to do with PS3.0, that's more about support for dynamic branching and other features elsewhere on the chip (probably requires 32 bit ALUs as the biggie).

"And yes, I did look it up."

Uh where perchance? Link? There will be no link, because you're dead wrong, it's only a matter of how much you want to squirm.

I dont really have time right now to hunt down a bunch of proof, but if you come back in a couple days I'll do it then. In the meantime, I invite you to find me one source that X1600 XT has "simpler" shader than X1900/X1800GTO etc. Quite frankly, I'm 100% certain you cannot.

X1800GTO was simply a quad disabled X1800 (Which was ALSO a 16 pipe card..yet beat the 24 pipe 7800 GTX...simpler shaders my ass). X1600XT was the same design, exact same shader pipes, but built from the ground up to be smaller for the mid range.

And again, X1900 has 48 of the EXACT SAME shader pipes that X1800 has 16 of.

ATI did all this on purpose..they wanted to be able to scale shaders easily and fell in love with the 1:3 ratio..which they considered the future. Unfortunately it sucked.

The R520 and R580 are the SAME DESIGN just scaled with minor changes. The X1600XT is based off the R520/80 type design, as is every ATI chip since the X800 range.

R580 would have been a kickass card, that blew away Nvidia, if it only had 24 or even 32 TMU's, to relieve that bottleneck. It certainly had an overdose of shader power..

RE: Different Strategy?
By MAIA on 1/8/2007 8:25:06 AM , Rating: 2
As can be seen from the diagram above, the primary difference between R520 and R580 lies in the number of pixel shader units available to each of them, with R520 having 16 pixel shader cores, while R580 triples that number. As is the case with all of the R5xx series, each pixel shader core consists of:

* ALU 1
o 1 Vec3 ADD + Input Modifier
o 1 Scalar ADD + Input Modifier
* ALU 2
o 1 Scalar ADD/MUL/MADD
* Branch Execution Unit
o 1 Flow Control Instruction

The net result is that R580 contains 48 Vector MADD ALUs and 48 Vector ADD ALUs operating over 48 pixels (fragments) in parallel. Along with the pixel shader cores, the pixel shader register array on R580 has also tripled in size in relation to R520, so that R580 is still capable of having the same number of batches in flight.

The R580 diagram above isn't entirely accurate, as it indicates that there are 12 different dispatch processors for R580, where R520 has 4 - this is, in fact, not the case. As with RV530, R580's "quads" (or lowest element of hardware scalability in the pixel pipeline) have increased such that they can handle three quads in the pixel shader core, but they do so by operating on the same thread. The net result is that R580 still contains 4 different pixel processing cores, ergo only 4 dispatch processors, with each core handling up to 128 batches/"threads". As a result, R580 is still handling a total of 512 batches/"threads" as R520 does. Each of the batches in R580 consists of a maximum of three times as many pixels (48), so that they can be mapped across the 12 pixel shader processors that exist in each of the 4 cores, with the net result being that R580 can have 24,576 pixels in flight at once. Note that because this is still based around 4 processing cores, the lowest level of shader granularity is likely to be 12 pipelines, so if ATI releases parts that have had failures within the pixel pipeline element of the die the next configuration down would likely be 12 textures, 36 shader pipelines, and 12 ROPs.
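The thread and pixel counts in the quoted passage follow from straightforward multiplication. The short check below uses only numbers stated in the passage; the 16-pixel batch size for R520 is inferred from the statement that R580 batches hold three times as many pixels (48), and the function name is illustrative.

```python
def pixels_in_flight(cores, batches_per_core, pixels_per_batch):
    """Total pixels in flight: cores x batches per core x pixels per batch."""
    return cores * batches_per_core * pixels_per_batch


CORES = 4                # pixel processing cores on both R520 and R580
BATCHES_PER_CORE = 128   # batches/"threads" per core

total_batches = CORES * BATCHES_PER_CORE

# R580 batches hold up to 48 pixels; R520's 16 is inferred as one third of that.
r580_pixels = pixels_in_flight(CORES, BATCHES_PER_CORE, 48)
r520_pixels = pixels_in_flight(CORES, BATCHES_PER_CORE, 48 // 3)

# 12 pixel shader processors per core across 4 cores.
r580_shader_processors = 12 * CORES

print(f"batches in flight:       {total_batches}")
print(f"R580 pixels in flight:   {r580_pixels}")
print(f"R520 pixels in flight:   {r520_pixels}")
print(f"R580 shader processors:  {r580_shader_processors}")
```

This reproduces the 512 batches and 24,576 pixels quoted for R580, and the 48 shader processors mentioned earlier in the thread.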

ATI have also increased the Hierarchical Z buffer size on R580, which can now store up to 50% more pixel information than R520 can, allowing for better performance at even higher resolutions. However, other than that most of the other elements stay the same, shader wise, with R580 still having 8 vertex shaders, single Z/Stencil rates (unlike RV530) and still continuing with 16 texture units serving all 48 shader processors.

RE: Different Strategy?
By MAIA on 1/8/2007 8:27:34 AM , Rating: 2
Fetch4 is actually not just implemented within R580 but RV530 and RV515 as well, although curiously not R520. Because of the relatively low shader capabilities of R520, in relation to R580, it's more likely to be shader bound on operations such as these anyway, so the increase in the sample time is less likely to be an issue. With R580 though, as it has such a high math capability in relation to its number of texture samplers it's more important that its texture utilisation is optimised, so wasting 3 cycles on single precision texture formats is going to bottleneck it more.


Can be read on the next page