
AMD talks Bulldozer  (Source: AMD)

AMD details "Falcon," a mainstream processor for "Copperhead"  (Source: AMD)
AMD talks details of "Bulldozer," the first completely new architecture since K8

AMD plans to launch its third-generation Opteron platform in 2009 with the Sandtiger octal-core processor. Underlying Sandtiger is AMD’s M-SPACE modular approach to CPU design. M-SPACE allows AMD to mix and match CPU features for specific tasks.

The definition for M-SPACE is as follows:
  • Modular: Reconfigurable “building blocks” for design speed/agility
  • Scalable: Linear scaling of multi and single-thread performance
  • Portable: Energy-efficiency for increased mobility/portability
  • Accessible: Ongoing commitment to open innovation
  • Compatible: Backward compatibility and ease of upgrade
  • Efficient: Optimal on-chip and system level I/O efficiency
Sandtiger’s eight cores are all Bulldozer cores. Bulldozer is the name AMD has given to one of the CPU cores for its M-SPACE architecture. AMD claims dramatic performance-per-watt improvements in HPC applications with Bulldozer cores. Unlike Barcelona and Shanghai, which evolved from AMD’s K8 architecture, Bulldozer is a completely new design developed from the ground up.

AMD installs eight Bulldozer CPU cores in Sandtiger along with a memory controller. AMD has optimized the design for servers and aims to raise the performance-per-watt bar for single- and multithreaded applications.

The modular M-SPACE technology also finds its way into Fusion. AMD plans to mix and match M-SPACE components for Falcon, a Fusion processor optimized for mobile and mainstream desktops. Falcon forms the basis of AMD’s planned Copperhead mainstream desktop platform. Falcon features four Bulldozer CPU cores with an integrated graphics processor. The integrated graphics processor supports DirectX 10, possibly DirectX 11, and includes AMD’s Universal Video Decoder (UVD) technology. Falcon also features integrated PCIe.

In addition to Bulldozer, AMD has the Bobcat CPU core for Fusion processors designed for mobile, ultra-mobile and consumer electronics applications. Bobcat is also a completely new design and has greater power scaling capabilities. Bobcat-based processor designs can consume as little as one watt of power. AMD has not announced any details of Bobcat-powered Fusion processors yet.

Expect AMD to introduce Fusion designs based on Bulldozer and Bobcat beginning in 2009.



Good picture, but arguably wrong idea
By azrael201 on 7/26/2007 4:44:18 PM , Rating: 1
I can see why you decided to use Transformers in the article picture, especially since the Constructicons (pictured) fit the article's theme of being "modular," in a sense. However, when they all merge to form Devastator, they "[ironically]...sacrifice their thinking ability in their combined form" (wiki).

Arguably one can also think of it in terms of "raw power," which is what Devastator is. I just think it's funnier if AMD's chips combined to form one dumb brute of a machine.

RE: Good picture, but arguably wrong idea
By Duraz0rz on 7/26/2007 4:55:53 PM , Rating: 4
well, computers are technically dumb to begin with ;)

RE: Good picture, but arguably wrong idea
By BladeVenom on 7/26/2007 5:04:35 PM , Rating: 4
"To begin with"? None of mine have gotten any smarter since I've owned them.

RE: Good picture, but arguably wrong idea
By FITCamaro on 7/26/2007 6:02:45 PM , Rating: 5
Mine has a gun to my head right now. I guess it's learned from me. Shit.

RE: Good picture, but arguably wrong idea
By James Holden on 7/26/2007 11:20:21 PM , Rating: 4
We can fight the machines. We can fight the apes. But we can't defeat both at the same time! -- Tenacious D

RE: Good picture, but arguably wrong idea
By LogicallyGenius on 7/27/2007 6:34:34 AM , Rating: 2
Why waste time thinking about 2009?

Let's focus on now: 12MB Intel quad core, DDR3, 1333 FSB, Asus onboard RAM, onboard Shader 3.0, etc., etc.

RE: Good picture, but arguably wrong idea
By TSS on 7/27/2007 7:09:34 PM , Rating: 2
Because we can't talk about 2009 in 2009; by then we'll have to talk about 2011.

That, and because we've already talked about now, which is how you're able to state that the new Intel processor will feature 12MB cache and the fast FSB.

Oh, and computers can get smarter with age :) You just have to program them to.

By Brockway on 7/27/2007 7:59:47 PM , Rating: 2
So AMD is going for combining a GPU and CPU on the same die? At first this seemed like kind of a weird idea, just another thing to add to the cost, like integrated graphics. But then I thought: are they talking about a unified-shader-type GPU? A CPU with a GPGPU on board? Even if you added a discrete video card later, the on-die GPU could still be useful for physics, Folding-type programs, and all kinds of crap that benefits from parallel processing. That seems like a dang exciting concept.

By murphyslabrat on 7/30/2007 11:51:01 AM , Rating: 1
I guess you've never updated the packaged drivers? Never overclocked? Never replaced a single component?

Mine has just about tripled in "smarts" since I got it ;j

By DeepBlue1975 on 7/26/2007 6:56:28 PM , Rating: 2
And to think I was going to buy a smartphone!
I'll make smartphones take IQ tests before I buy them.

By yacoub on 7/26/2007 7:43:07 PM , Rating: 2
As noted in my comment on the AnandTech AMD article just posted, which uses the same image, that top graphic supposedly showing 'how much better' Bulldozer will be just cracks me up, because the chart completely lacks any numbers. They must want us to measure its improvement in pixels ;)

RE: hahah
By yacoub on 7/26/2007 7:43:54 PM , Rating: 5
"How much better is it, Bob?"

"I dunno Bill, but it's got a much longer arrow so that's gotta be "a lot better" right?"

RE: hahah
By mmarq on 7/26/2007 10:25:07 PM , Rating: 5
that top graphic supposedly showing 'how much better' Bulldozer will be just cracks me up because it completely lacks any numbers on the chart. They must want us to measure its improvement in pixels ;)

Well, I don't want to get into too much speculation, but it seems that sketches have been around since 2001.

It's not uncommon for manufacturers to build prototypes, but that one is really amazing for *2001*.

It seems to be entering the camp of decoupled architectures, with separate dual integer/floating-point execution cores. But it is not an entirely decoupled architecture; it is a clustered one with considerable multithreading capability.

It seems to be 5-6 wide (K8/K10 are 3-wide), issuing 6 instructions per clock instead of K8/K10's 3.

Superpipelined, with at least 15 stages instead of K8/K10's 12 (like IBM, meaning perhaps more than 5GHz on a 45nm process).

Multi-level cache with an L0 of 1-cycle latency, fetching *4 instruction lines per clock!* from this L0 and one more from the L1. So this beast most likely fetches 80 bytes per cycle into the pipeline, against 16B for K8 and 32B for K10 and Core 2.

Forward collapse unit: together with the branch predictor, it can increase ILP by effectively removing up to 2 conditional branches per cycle.

Branch prediction: "going both ways before deciding on the prediction"... branch and the destination address in the same "run" of code...

The "forward collapse unit" can handle up to two short forward branches per cycle, to deal with nested "if-then-else" statements.

A huge 64K-entry branch history table is used for branch prediction. The taken/not-taken results of the 16 latest conditional branches are used to calculate an index into the 64K table. The table contains 65,536 2-bit bimodal counters that hold the predictions: 0: strongly not taken, 1: weakly not taken, 2: weakly taken, 3: strongly taken. Such a large table can store the characteristic branch patterns of many different branches in a larger program without much interference.
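The predictor described above can be sketched in a few lines of Python. This is a minimal illustration, not K8-1's actual logic: it assumes the 16-bit history of taken/not-taken outcomes is used directly as the index into the 65,536-entry table (real designs often also mix in branch address bits, as gshare does) and that every counter starts at 1 (weakly not taken). The class and function names are hypothetical.

```python
HISTORY_BITS = 16
HISTORY_MASK = (1 << HISTORY_BITS) - 1


class GlobalHistoryPredictor:
    """Global-history branch predictor with 2**16 two-bit saturating counters.

    Counter states: 0 strongly not taken, 1 weakly not taken,
    2 weakly taken, 3 strongly taken.
    """

    def __init__(self) -> None:
        self.history = 0  # outcomes of the last 16 conditional branches, newest in bit 0
        # Assumption: all counters start in the "weakly not taken" state.
        self.table = [1] * (1 << HISTORY_BITS)

    def predict(self) -> bool:
        # Assumption: the history register itself is the table index (no address hashing).
        return self.table[self.history] >= 2

    def update(self, taken: bool) -> None:
        counter = self.table[self.history]
        # Saturating increment/decrement toward the actual outcome.
        self.table[self.history] = min(3, counter + 1) if taken else max(0, counter - 1)
        # Shift the real outcome into the 16-bit history.
        self.history = ((self.history << 1) | int(taken)) & HISTORY_MASK


def run_demo(steps: int = 1000) -> int:
    """Feed a repeating taken/taken/taken/not-taken pattern (e.g. a short loop)
    and count correct predictions over the second half of the run."""
    predictor = GlobalHistoryPredictor()
    pattern = [True, True, True, False]
    correct = 0
    for step in range(steps):
        outcome = pattern[step % len(pattern)]
        if step >= steps // 2 and predictor.predict() == outcome:
            correct += 1
        predictor.update(outcome)
    return correct


print(run_demo())
```

Once the history register fills up, each of the four distinct 16-bit histories in this pattern is always followed by the same outcome, so the sketch predicts the whole second half of the run correctly (500 of 500). A long history gives each recurring pattern its own counter, which is why a big table suffers little interference.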

Instruction pre-decoding: each byte in the instruction caches has 2 bits of pre-decode information.

ESP look-ahead unit: "pre-executes" some operations simultaneously with instruction decoding, long before the instructions enter the out-of-order execution pipeline. It cooperates with a future (register) file that indicates whether an x86 register value is still valid once all preceding instructions still in the pipeline have executed. The ESP look-ahead unit increases instruction-level parallelism: multiple PUSHes and POPs can be executed simultaneously.

Stack sideband optimization: instructions that add an immediate value to the stack pointer, like PUSH, POP and ADD ESP, IMM, can be handled in parallel. So-called "constant generators" determine the constants to be added to the stack pointer for up to six stack instructions per cycle.
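The point of the constant generators is that each stack instruction changes ESP by a known constant, so the offsets for a whole decode group can be computed at once as a prefix sum instead of waiting on each previous ESP update. The sketch below assumes 32-bit mode with 4-byte PUSH/POP operands and uses hypothetical instruction tuples and function names; it illustrates the idea, not AMD's hardware.

```python
from itertools import accumulate
from typing import List, Optional, Tuple

MAX_GROUP = 6  # the comment says up to six stack instructions per cycle
Instr = Tuple[str, int]  # (opcode, immediate); immediate is ignored for PUSH/POP


def esp_delta(instr: Instr) -> int:
    """Constant change to ESP made by one stack instruction (32-bit mode assumed)."""
    op, imm = instr
    if op == "PUSH":
        return -4
    if op == "POP":
        return 4
    if op == "ADD_ESP":
        return imm
    if op == "SUB_ESP":
        return -imm
    raise ValueError(f"not a stack-pointer instruction: {op}")


def esp_offsets(group: List[Instr]) -> List[int]:
    """ESP before each instruction, relative to ESP at the start of the group."""
    if len(group) > MAX_GROUP:
        raise ValueError("group larger than one cycle's worth of stack instructions")
    deltas = [esp_delta(i) for i in group]
    # Exclusive prefix sum: instruction k sees the deltas of instructions 0..k-1.
    return list(accumulate(deltas, initial=0))[:-1]


def stack_addresses(group: List[Instr]) -> List[Optional[int]]:
    """Memory address (relative to the starting ESP) each instruction touches."""
    addresses: List[Optional[int]] = []
    for (op, _), offset in zip(group, esp_offsets(group)):
        if op == "PUSH":
            addresses.append(offset - 4)  # PUSH decrements ESP, then stores at the new ESP
        elif op == "POP":
            addresses.append(offset)  # POP loads from ESP, then increments it
        else:
            addresses.append(None)  # ADD/SUB ESP, imm does not touch memory
    return addresses


group = [("PUSH", 0), ("PUSH", 0), ("PUSH", 0), ("POP", 0), ("ADD_ESP", 8)]
print(esp_offsets(group))      # [0, -4, -8, -12, -8]
print(stack_addresses(group))  # [-4, -8, -12, -12, None]
```

Because all five offsets come out of one prefix sum, the PUSHes and the POP no longer form a serial chain through ESP, which is the kind of parallelism the ESP look-ahead unit is said to provide.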

Memory loads can be performed earlier, i.e. prefetching.

Relaxed load/store ordering: loads can go before stores.
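Letting a load run ahead of earlier stores is safe only when those stores cannot write the bytes the load reads. The conservative check below is a generic sketch of that rule (memory disambiguation), not the K8-1 mechanism; real processors may also speculate past stores whose addresses are still unknown and replay the load if a conflict shows up. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    addr: Optional[int]  # None while the store's address is still being computed
    size: int
    data: Optional[int] = None  # None while the store's data is not yet available


def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if byte ranges [a_addr, a_addr+a_size) and [b_addr, b_addr+b_size) intersect."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def load_action(load_addr: int, load_size: int, older_stores: List[PendingStore]) -> str:
    """Return "issue", "forward" or "wait" for a load, given the stores that
    precede it in program order (oldest first)."""
    for store in reversed(older_stores):  # check the youngest older store first
        if store.addr is None:
            return "wait"  # unknown address might alias the load
        if overlaps(load_addr, load_size, store.addr, store.size):
            exact = store.addr == load_addr and store.size == load_size
            # Forward the store's data only on an exact match with data ready.
            return "forward" if exact and store.data is not None else "wait"
    return "issue"  # no older store touches these bytes: the load may go first


print(load_action(0x100, 4, [PendingStore(0x200, 4, 1)]))  # issue
print(load_action(0x100, 4, [PendingStore(0x100, 4, 7)]))  # forward
print(load_action(0x100, 4, [PendingStore(None, 4)]))      # wait
```

Partial overlaps are treated as "wait" here to keep the sketch simple; handling them would need byte-level merging of store data.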

OoO engines.

The most remarkable feature is that the L0 and L1 seem to be not sequential but somehow parallel... and, though that deserves more discussion, L0 must hold 'hot code' scanned from a pre-decoded L1, because otherwise how could it have:

" [L0]... simultaneously provides the code that has to 'be' executed when a conditional branch is taken as well as the code that has to be executed if the branch is not taken."

IMO it must pre-scan the L1 somehow, because L0 has to hold the code for both the 'taken' and 'not-taken' paths at the same time, and pre-execution makes that design absolutely brilliant... BUT THAT WAS 2001... it could surely be improved in more than a couple of places.

And with the pre-execution (or ahead execution) of the ESP look-ahead unit, there is remarkable branch unrolling and elimination, which is so important for streaming code. It seems the designers wanted a chip that never has its pipelines flushed because of a wrong guess, nor stalled because of a cache miss.

So K8 was a rather pale resemblance of K8-1. Of K8-1's features, K10 is only now introducing the sideband stack optimization (like Core 2) and loads before stores (like Core 2), plus 128-bit SSE units (like Core 2), which K8-1 doesn't have because in 2001 people were only dreaming of them and no one dared to put that on paper.

Taking it roughly as it is, and extrapolating, it seems this beast could surely have more than a 50% advantage over a K10.

RE: hahah
By mmarq on 7/26/2007 10:27:34 PM , Rating: 5
Continued from above, because DailyTech doesn't allow posts that long:

Now, if they also go for clustered speculative multithreading, that is, a mechanism for breaking monolithic workloads into multithreaded ones on the fly, then a Bulldozer could accelerate today's big integer applications by a factor of up to 1.6x. This forced multithreading, like in the "reverse Hyper-Threading" rumor, is what the HardOCP author seems to indicate (he was there asking questions):

" Bulldozer seems to be able to unite its core to work together on a single threaded application "

Now, a Bulldozer along the lines of a clustered speculative multithreading K8-1 could have 80% better performance than a K10 and 2x the performance of a Core 2.

Now, everybody can see signs that CPU manufacturers are heavily invested in helping software developers multithread, parallelize and stream their workloads: CTM, EXOCHI, TBB and other stuff.

And as stated here: with automatic vectorization by the compiler, they could reach over 100% for some benchmarks, a plausible average given that they claim over 1500% for other, hand-tuned benchmarks. That means that in a generic load where 40% of the code is "streamable", if that 40% can be sped up 100x, then by Amdahl's law we get 1/(0.6 + 0.4/100) = 1/0.604 ≈ 1.66x.
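The Amdahl's-law figure above is easy to check. With a fraction p of the work sped up by a factor s, the overall speedup is 1/((1 - p) + p/s). The 0.004 term corresponds to p = 0.4 and s = 100, i.e. the streamable 40% running 100 times faster; a 100% improvement (s = 2) on that 40% would give only 1.25x. The helper name below is illustrative.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is accelerated by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


# 40% "streamable" code, accelerated by various factors (inf = the theoretical limit).
for s in (2, 16, 100, float("inf")):
    print(f"s = {s}: {amdahl_speedup(0.4, s):.3f}x")
```

This prints 1.250x, 1.600x, 1.656x and 1.667x. Even an unbounded speedup of the 40% caps the total at 1/0.6 ≈ 1.67x, and a 1500% gain (s = 16) on that fraction works out to about 1.6x overall.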

A gain of ~66% is, on average, what is expected from *2* microarchitecture upgrades... that is a LOT, considering that schemes like CTM could push that number much higher.

So consider a Fusion chip: a truly integrated fusion, without a separate CPU and GPU, but with the streaming GPU as another pipeline of the CPU, as happened with the x87 FPU in today's CPUs...

" He also described the merging of CPUs and GPUs in detail. His vision sees AMD's GPU technology being totally integrated into the CPU, much like we saw the floating point processor integrated into our current CPUs. "

" As Fusion moves forward we are going to be seeing CPU and GPU sharing transistors and actually becoming “fused” together in a more direct sense or at least that is how Phil Hester has explained his vision to HardOCP."

So a Fusion chip with clustered speculative multithreading along the lines of a K8-1 could have >145% better performance than a K10, and >150% (2.5x) better performance than a Core 2, at the same clock with the same number of cores; that's especially true for 8 cores and not far off for 4 cores.

All in all, that bar graph isn't just pixels, and it doesn't seem far-fetched at all. Depending on the implementation it could even be a little conservative.

So it's not only AMD but Intel too that have always guarded their best designs, trying to squeeze as much money as possible out of the market while enthusiasts go at each other's throats over their preferences, when in theory they could deliver much, much better.

Of course the only excuse they have is time and money, because radical designs require both.

RE: hahah
By crystal clear on 7/27/2007 5:41:22 AM , Rating: 2
Of course the only excuse they have is time and money, because radical designs require both.

Yes, and to the above I would add: ones (radical designs) that really work and are feasible. There is no guarantee of success!

It's a gamble that can backfire.

Even if you take a very optimistic stand, do we have the software for these radical designs?

Software and hardware don't come along at the same time; rather, the software lags far behind the hardware.

Intel and AMD certainly do not talk about their projects that flop and get scrapped altogether.

Great plans are one thing; delivering is another.

To summarize, I would say it's not only time and money, but the ability to deliver on time.

Can AMD deliver ?

RE: hahah
By mmarq on 7/27/2007 1:20:20 PM , Rating: 1
To summarize-I would say its not only time & money, but the ability to deliver in time

Can AMD deliver ?

From a unilateral point of view... YES

In a sense, they could all deliver much more. The decisive factor is not a window of opportunity based on the theoretical maximum performance possible against the competition, but profit.

A leapfrogging design is only introduced when the competition is clearly ahead, because no one wants to trash the value of their current products by introducing a much faster part. So manufacturers only introduce variations that don't do that.

Most of the time they could deliver more, and on time; the problem is that they don't want to.

Enthusiasts care about performance; they care about profit. And in that sense it is absurd to pay more than double for an enhanced part that only gains a few FPS or seconds in some benchmarks.

It's akin to squeezing the gullible.

Big Statement.
By Mitch101 on 7/26/2007 7:15:22 PM , Rating: 2
Did anyone else catch the "Designed to the highest performing single and multi-threaded compute core in history"

I'm also surprised no one else is excited to hear this is a totally new chip, not derived from a previous architecture. I'm not as curious about Phenom as I am about Bulldozer. I heard the name about 2 years ago when an AMD sales agent said they weren't worried about Intel, and when asked why, they had one word: Bulldozer. I think everyone thought this was going to be just 2 AMD quad cores stuck next to each other on a motherboard.

Where is the excitement? This is like Intel pulling the Core2 out of nowhere instead of trying to build off the P4.

RE: Big Statement.
By Proteusza on 7/27/2007 4:45:33 AM , Rating: 2
Intel didn't build off the P4; they scrapped it because it was such a horrible mess.

Conroe-core CPUs are derived from the Pentium M, which itself derives from the Pentium III.

RE: Big Statement.
By zsdersw on 7/27/2007 6:32:16 AM , Rating: 2
That's a pretty ridiculous statement; that the P4 was such a horrible mess. It wasn't.

Netburst provided, among other things, the excellent branch predictor used in Core 2.

RE: Big Statement.
By murphyslabrat on 7/30/2007 12:03:37 PM , Rating: 2
Did anyone else catch the "Designed to the highest performing single and multi-threaded compute core in history"

Oh, sure I did. That's what half the processors in history were designed to be, CPU or CoPU.

Things are starting to make sense
By Justin Case on 7/26/2007 7:01:01 PM , Rating: 3
If AMD is serious about the modular approach (they've been hinting at it for a long time, and they have a couple of intriguing patents, including one that looks like a way to turn multiple cores into a single superscalar CPU), that might explain some of their recent moves.

A "variable configuration" CPU would definitely need some form of code morphing to work transparently (e.g., software tries to run instruction "X", and the CPU decides, based on its modules, whether to run it on a dedicated module or morph it into multiple instructions for the "generic" module). Doing this efficiently is a lot more complex than simple micro-op decoding. Transmeta has a ton of patents in this area, and AMD's acquisition of a (small but relevant) share might mean free (or at least cheap) access to those patents.
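The dispatch decision described above can be sketched as a lookup: if the chip has a dedicated module for an instruction, send it there; otherwise morph it into a sequence the generic module understands. Everything here is hypothetical: the instruction names, the expansion table and the module sets are invented for illustration, and real code-morphing systems (Transmeta's, for example) translate and cache whole blocks of code rather than single instructions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical expansions of "complex" instructions into simple generic operations.
GENERIC_EXPANSIONS: Dict[str, List[str]] = {
    "FMA": ["MUL", "ADD"],  # fused multiply-add -> separate multiply and add
    "DOT4": ["MUL", "MUL", "MUL", "MUL", "ADD", "ADD", "ADD"],  # 4-element dot product
}


def dispatch(instr: str, dedicated: FrozenSet[str]) -> Tuple[str, List[str]]:
    """Pick the execution target for one instruction on a given chip configuration."""
    if instr in dedicated:
        return "dedicated", [instr]  # this chip has a module for it: run it natively
    # Otherwise morph it into simpler operations (or pass it through unchanged).
    return "generic", GENERIC_EXPANSIONS.get(instr, [instr])


server_chip = frozenset({"FMA", "DOT4"})  # hypothetical configuration with extra modules
budget_chip: FrozenSet[str] = frozenset()  # hypothetical configuration without them

print(dispatch("DOT4", server_chip))  # ('dedicated', ['DOT4'])
print(dispatch("FMA", budget_chip))   # ('generic', ['MUL', 'ADD'])
```

The same instruction stream runs on both configurations; only the path it takes changes, which is the transparency the comment says a modular CPU would need.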

As to the on-die PCIe, we've been waiting for it since AMD licensed the patents from RAMBUS. I was kind of hoping that some of those new pins on Socket-F were precisely for that.

There's an increasing feeling that Barcelona is just a K8 "expansion pack" to keep the fans quiet (and pay the bills), and the real "sequel" keeps getting pushed back as they decide to add new features.

RE: Things are starting to make sense
By EarthsDM on 7/26/2007 8:54:48 PM , Rating: 2
AMD claimed that the K8 architecture was modular, IIRC.

By Justin Case on 7/26/2007 8:57:52 PM , Rating: 2
Never heard that, all talk of modular CPUs was well after the K8 had been released. Any links?

By 91TTZ on 7/26/2007 10:17:20 PM , Rating: 3
AMD talks details of "Bulldozer," the first completely new architecture since K8

The K8 was not a completely new architecture. It was an incremental upgrade over the K7.

Going from the K6 to the K7 brought a completely new architecture. Since the K7, they've been making evolutionary changes, not revolutionary changes.

By trunxhml37 on 7/30/2007 1:07:44 AM , Rating: 2
Graphics on the CPU... sounds great on paper. Why not send graphics over the HyperTransport bus? That would get instructions from RAM to the CPU/GPU incredibly fast. It doesn't seem like it would be easy to squeeze high-end graphics features into such a small die that's shared with the main CPU. If I add a beast of a graphics card (i.e. an 8800 Ultra) to a computer using this processor, what would happen to the part of the processor that's meant for graphics? Would those cores just take up space and cost me money without ever being used?
One might also think that this is a great idea for a laptop, which it sort of is. But it is highly unlikely that an eight-core processor has any sort of power efficiency, which would deter most laptop buyers from this processor. I mean, who wants a laptop with 40 minutes of battery life?
It actually sounds like a great idea for server purposes. The on-processor graphics would relieve bandwidth from the northbridge, which is a very high-traffic area for servers anyway. Most servers aren't particularly concerned with super high-end graphics, and multicore processors are always a plus for servers. They're also not as concerned with power consumption as laptops are.
It seems as though this basic design is aimed at server users. A basic desktop user doesn't need an 8-core processor. A high-end desktop user would want to customize the graphics card. A laptop user wouldn't need 8 cores and can still achieve great performance on 2 cores, which would yield better battery life anyway.
If any AMD people are reading this: the list of consumers this kind of processor would benefit is smaller than what I think AMD should be aiming for.

Blah, blah, blah....
By StillPimpin on 7/26/2007 4:53:48 PM , Rating: 1
Just build the frikin thing already!

So we've only got about 18 months . . .
By Denigrate on 7/26/07, Rating: -1
RE: So we've only got about 18 months . . .
By Oregonian2 on 7/26/2007 4:36:55 PM , Rating: 2
Knocking AMD based on experience with Intel?

By JackPack on 7/26/2007 5:42:34 PM , Rating: 2
AMD is claiming a deeper pipeline.

RE: So we've only got about 18 months . . .
By zaki on 7/26/2007 4:38:48 PM , Rating: 2
Off topic: why do some people write their names after their posts, even though their names appear 2 mm above, next to "By"?

RE: So we've only got about 18 months . . .
By Brandon Hill on 7/26/2007 4:41:00 PM , Rating: 6
I have no idea.

-Brandon Hill

RE: So we've only got about 18 months . . .
By 3kliksphilip on 7/26/2007 4:49:53 PM , Rating: 2
How do you get a rating of 6 for a post? I've seen it a couple of times but I'm not sure how it happens. Sorry for being a n00b.

By AmbroseAthan on 7/26/2007 4:58:24 PM , Rating: 2
One of the mods, or possibly only Kristopher, deems your post worthy of "great comment" status.

RE: So we've only got about 18 months . . .
By quiksilv3r on 7/26/2007 5:00:23 PM , Rating: 6
Isn't it obvious? You sign your name.


By KristopherKubicki on 7/26/2007 5:14:46 PM , Rating: 4
Lol you and Brandon get gold stars for today.

RE: So we've only got about 18 months . . .
By Korvon on 7/26/2007 6:23:14 PM , Rating: 2
Kristopher doesn't get one... didn't sign his name. :P

By crystal clear on 7/27/2007 7:38:27 AM , Rating: 2
Because it's a copy-and-paste job, simple.


Copyright 2016 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki