



AMD's K8L diagram

AMD's K8L cache design

Four 16-bit or Eight 8-bit HyperTransport Links

K8L's instruction dispatch diagram
K8L details continue to pour in at AMD's Technology Analyst Day

During the AMD Technology Analyst Day, AMD’s CTO Phil Hester rehashed the majority of the K8L information we discussed on DailyTech several days ago, but disclosed further details on specifics.  Hester was careful to refer to these new technologies simply as “new architecture,” and never used the K8L core name.

Internally, AMD engineers use the codename Greyhound to describe the "new architecture."

A major push for AMD’s K8L design is in “modular” component design – meaning everything from L3 cache to memory controllers is developed as an individual component and linked together with reusable, robust designs.  To some extent, processor design is already modular, with libraries and designs that are developed individually.  However, Hester insists the new design takes this modular approach even further, claiming that the company is working to “better define the interfaces for each of these building blocks.”

Additionally, Hester revealed some more information about the cache specifics on K8L.  Each K8L core will have 64KB of dedicated L1 cache, followed by 512KB of dedicated L2 cache.  The base models of K8L will have 2MB of shared L3 cache, but Hester also went on to claim that adding more L3 cache was in the company’s roadmap.  One thing AMD representatives have not particularly touched on is the cache reduction from 64+64KB (data+instruction) to 32+32KB.  AMD employees have assured us this move is logical with the addition of L3 cache. 
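To put the two competing L1 figures in perspective, the arithmetic below totals the on-die cache of a quad-core part under each reading. This is only a sketch: the function name is hypothetical, and it assumes all four cores carry identical caches, which the slides do not state explicitly.

```python
def total_cache_kb(cores, l1_data_kb, l1_instr_kb, l2_kb, l3_shared_kb):
    """Sum per-core L1 and L2 across all cores, then add the shared L3."""
    per_core = l1_data_kb + l1_instr_kb + l2_kb
    return cores * per_core + l3_shared_kb

# Figures from the article: 512KB L2 per core, 2MB shared L3, four cores.
halved = total_cache_kb(4, 32, 32, 512, 2048)     # 32+32KB L1 reading
unchanged = total_cache_kb(4, 64, 64, 512, 2048)  # 64+64KB L1 reading

print(halved, unchanged, unchanged - halved)
```

Under either reading the L1 is a small share of the total: the two cases differ by 256KB out of more than 4MB of on-die cache.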

A major feature of K8L is DICE, or Dynamic Independent Core Engagement.  Essentially, the ACPI layer will be able to dynamically adjust individual cores and crossbars on the processor.  Each processor core can enter its own power-state, or p-state, allowing a K8L processor to conserve power when the system does not have enough threads to utilize the other processor cores.  Intel’s Core processors can enter c-states on a per-core basis, but the AMD demonstration showed a quad-core K8L processor dip individual cores into full halt.
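As a rough illustration of the idea behind per-core power management, the sketch below keeps only as many cores active as there are runnable threads and halts the rest. The function name, the two state labels and the policy itself are assumptions made for illustration; AMD did not describe the actual DICE policy or its ACPI interface at this level of detail.

```python
def assign_core_states(num_cores, runnable_threads):
    """Return one state per core: 'active' for cores with work, 'halted' otherwise.

    Hypothetical policy: one thread per core, idle cores are halted.
    """
    busy = min(num_cores, max(runnable_threads, 0))
    return ["active"] * busy + ["halted"] * (num_cores - busy)

# A quad-core part running a single-threaded workload.
print(assign_core_states(4, 1))
```

A real implementation would presumably pick among several intermediate p-states rather than a simple active/halted split.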

Opteron servers right now are, for the most part, limited to systems with eight sockets or fewer.  Part of this is due to the fact that each processor has only three HyperTransport links.  Hester announced that the next-generation Opteron core will have four 16-bit HyperTransport-3 links running at 2.6GHz each.  These four links can reconfigure into eight 8-bit HyperTransport links in a process called “un-ganging,” which is a fundamental feature of HyperTransport-3.  Essentially, one could have an eight-socket server with thirty-two fully connected cores.  Each processor will be able to reach any of the eight memory banks within one memory hop.  The HyperTransport-3 specification claims un-ganging mode can work on the fly, meaning that even a fully connected eight-socket server could dynamically change two 8-bit links into a single 16-bit link during operation to increase I/O at critical moments.
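The link arithmetic can be sketched in a few lines. The code assumes HyperTransport's double data rate signaling (two transfers per clock) and counts peak bandwidth in one direction only; the function names are hypothetical and protocol overhead is ignored.

```python
def link_bandwidth_gb_s(clock_ghz, width_bits):
    """Peak one-direction bandwidth of a HyperTransport link in GB/s.

    HyperTransport transfers data on both clock edges, so a 2.6GHz
    clock yields 5.2 giga-transfers per second.
    """
    transfers_per_second = clock_ghz * 2
    return transfers_per_second * width_bits / 8

def links_for_full_mesh(sockets):
    """Direct links each socket needs to reach every other socket in one hop."""
    return sockets - 1

ganged = link_bandwidth_gb_s(2.6, 16)    # one 16-bit link
unganged = link_bandwidth_gb_s(2.6, 8)   # one 8-bit link after un-ganging
print(ganged, unganged, links_for_full_mesh(8))
```

A 16-bit link peaks at roughly 10.4 GB/s per direction and an 8-bit link at half that. A fully connected eight-socket system needs seven links per socket, which is why eight un-ganged links are enough while the three links on current Opterons are not.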

K8L is designed as a native quad-core design, although slides from the Technology Analyst Day also revealed that a dual-core desktop SKU will appear in mid-2007.  So far, DailyTech has touched a little bit on the 65nm quad-core code names announced in AMD roadmaps, but to our knowledge the code names for dual-core K8L processors have not been disclosed.

Update 07/06/2006:
  Please read the update to this article about the K8L L1 cache sizes.


Comments



L1 Cache
By ViRGE on 6/1/2006 2:42:39 PM , Rating: 2
I don't suppose AMD explained why they're cutting the L1 cache in half? I'm assuming they're doing something here (such as increasing the associativity) in order to keep a similar hit rate on a smaller L1 cache.




RE: L1 Cache
By Ecmaster76 on 6/1/2006 2:54:53 PM , Rating: 1
I don't think they changed the L1 cache. The total K8 L1 is 128KB, but it is really 64+64 (data+instruction). It may just be a difference in wording.


RE: L1 Cache
By Von Matrices on 6/1/2006 3:08:25 PM , Rating: 2
One of the slides says "32B Instruction Fetch". If this is the typographical error that I think it is, and AMD meant "32KB" instead of "32B", then the cache is truly halved.


RE: L1 Cache
By KristopherKubicki (blog) on 6/1/2006 3:57:30 PM , Rating: 2
Correct, the L1 cache is now 32+32 instead of 64+64. I have confirmed this with AMD. The slide also has a typo that I confirmed with AMD; it should read 32KB.


RE: L1 Cache
By jmnewton on 6/1/2006 4:19:33 PM , Rating: 2
Umm.. it says "32B instruction *fetch*", not 32B instruction *cache*. This would mean 32 bytes of data are fetched per cycle from the ICache to be decoded (x86 instructions are variable width and so 32B is any number of actual instructions). It may actually be 32KB ICache, but if that is what is meant by that line on the slide - then the wording is wrong too. Otherwise, the wording is completely correct.
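The point about variable-width instructions can be made concrete. The sketch below assumes the architectural x86 length limits of 1 to 15 bytes per instruction and counts how many whole instructions fit in a single 32-byte fetch at each extreme; the constant and function names are illustrative only.

```python
FETCH_BYTES = 32          # fetch width quoted on the slide
MIN_X86_INSTR_BYTES = 1   # shortest x86 instruction
MAX_X86_INSTR_BYTES = 15  # architectural limit on x86 instruction length

def whole_instructions_per_fetch(fetch_bytes, instr_bytes):
    """Whole instructions of a uniform length that fit in one fetch."""
    return fetch_bytes // instr_bytes

most = whole_instructions_per_fetch(FETCH_BYTES, MIN_X86_INSTR_BYTES)
fewest = whole_instructions_per_fetch(FETCH_BYTES, MAX_X86_INSTR_BYTES)
print(fewest, most)
```

So a 32-byte fetch can carry anywhere from 2 to 32 complete instructions, which is why the slide quotes bytes rather than an instruction count.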


RE: L1 Cache
By KristopherKubicki (blog) on 6/1/2006 4:23:28 PM , Rating: 2
Yeah, my apologies. When I contacted AMD originally they referred to it as a typo but have just recanted. It does have 32KB of ICache though.


RE: L1 Cache
By Ecmaster76 on 6/1/2006 4:49:37 PM , Rating: 2
I guess with quad core they cut back all the cache to free up die space. That's a shame.

But then again, maybe they have compensated with a better front end on that thing.


RE: L1 Cache
By saratoga on 6/1/2006 5:10:58 PM , Rating: 2
128k was really large for an L1. It was designed for the days when L2 caches were offdie, not for modern on die L2 (and now L3!), so it was probably outdated.

A smaller L1 allows for faster access, wider data path, and/or better associativity for a given die size.


RE: L1 Cache
By inthell on 6/1/2006 7:01:17 PM , Rating: 2
lower L1 cache means BETTER cache latency


RE: L1 Cache
By anoninsider on 6/2/2006 5:15:50 PM , Rating: 2
No, the cache is 128KB just like in the K8 and K7. See
http://www.realworldtech.com/page.cfm?ArticleID=RW...

Besides, it makes no sense that they would change the L1 caches. That would require them to redo the entire memory subsystem: prefetch, line size, L2, L3, memory bandwidth, etc.

According to the author, an AMD architect confirmed that the caches are 128KB, and the floor plans also do.

Anon


RE: L1 Cache
By Scrogneugneu on 6/4/2006 10:20:04 PM , Rating: 2
If AMD did cut the L1 in half, it might be to save money. Just think about it, L1 cache is made from very low latency memory, and has a high production cost. If you include half as much per core, you save a lot of money, even more if you consider that the total number of cores is going up.

My guess : since you have 2 (or more) cores working, they should all have less work than a single core. Thus, they should all need less L1, as the little time they lose with a L1 cache miss is taken up by the fact that the core has half as much work to do.

One core needs to be really fast to emulate a high number of real-time calculations, but if you have 2, then to offer the same performance, they need to be only half as fast in theory. Therefore, a reduction in L1 size coupled with twice as many cores available enables taking up twice as much workload, without any huge increase in production cost.


RE: L1 Cache
By Tyler 86 on 6/5/2006 12:09:01 AM , Rating: 2
I love how the rumor buster statement is so vague..

"... the contrary at Daily Tech, the L1D and L1I caches remain at 64KB each, according to a senior architect at AMD."
... 64KB? That's half of 128KB.
Does it mean 32KB+32KB = 64KB?
Does it mean 64KB+64KB = 128KB?

Back to square one.


RE: L1 Cache
By Tyler 86 on 6/5/2006 12:09:53 AM , Rating: 2
Nah I guess it's not so vague. 64KB + 64KB it is. I hope.


RE: L1 Cache
By anoninsider on 6/6/2006 2:21:13 PM , Rating: 2
When architects talk about caches, they always talk about data and instruction cache separately. I and D cache are just totally separate beasts, for a variety of reasons.

The sentence is crystal clear, L1I and L1D cache are 64KB each. 64KB+64KB = 128KB.

The K8L will have 64/64 D/I caches. The total is 128K, but how it is split is more important than the total.


As predicted
By mushi799 on 6/1/2006 3:11:55 PM , Rating: 2
AMD releases an announcement a month or so before Intel releases their next cpu cores.

This announcement was designed to make some ppl wait. After reading these comments, it probably worked.




RE: As predicted
By TomZ on 6/1/2006 3:15:43 PM , Rating: 2
AMD fans will always wait for AMD, regardless of what Intel is doing.


RE: As predicted
By proamerica on 6/1/2006 3:35:47 PM , Rating: 2
"Good things come to those who wait."


RE: As predicted
By PT2006 on 6/1/2006 3:42:09 PM , Rating: 1
Like AM2 right?


RE: As predicted
By peternelson on 6/1/2006 5:15:27 PM , Rating: 1
Yes, like AM2 "4x4" dual sockets of dualcore goodness ;-)


RE: As predicted
By Fenixgoon on 6/1/2006 7:04:48 PM , Rating: 2
it's no different than waiting for conroe over AM2.. now you can wait for K8L over conroe over AM2 :P

regardless, i think both K8L and conroe will absolutely rock, which means we win as consumers :)


RE: As predicted
By Torched on 6/2/2006 12:37:06 PM , Rating: 2
quote:
Like AM2 right?


AM2 is not an architecture, it's an interface. What you mean to say is Rev. F. The processor discussed in this article is Rev. G and uses the Socket F interface.


RE: As predicted
By KristopherKubicki (blog) on 6/2/2006 4:32:10 PM , Rating: 2
K8L is something after Revision G, which several websites are already starting to call Revision H. Revision G is for the most part a 65nm SOI revision F chip.

http://dailytech.com/article.aspx?newsid=2489


RE: As predicted
By mlittl3 on 6/1/2006 4:07:26 PM , Rating: 2
This is not about AMD fans. This is not about gamers sitting in their moms basements trying to get a few more fps on Doom3. This is not about showing longer bars on bar graphs at hardware review sites. These things might seem important to those of us who buy a new desktop every 6 months and we only have to worry about supporting that one desktop.

This "new architecture" or whatever you want to call it is about multi-million dollar datacenters, genomic computation clusters, physical chemistry computation centers, etc. These kinds of people need well designed CPUs to handle tasks ranging from mapping the human genome to managing huge databases for financial institutions. They only upgrade when funds allow and they require strict, strict validation of the hardware. These kinds of things take years sometimes.

So leave your mom's basement. Walk down to your local bank and ask them are they an AMD or Intel fanboi or are they interested in extremely stable, high IPC server/workstation platforms that are future proof.


RE: As predicted
By Spoelie on 6/1/2006 4:26:21 PM , Rating: 2
IPC means instructions per clock cycle; I think the final performance is a bit more important than what it can do in 1 clock cycle.


RE: As predicted
By Griswold on 6/1/2006 4:40:40 PM , Rating: 4
You mean like Intel benchmarking a product 6 months before it hits the streets versus a technology that has been on the market for (let's be generous and only talk about dual cores here) 10 months?

You sure sound like somebody who feels like AMD will rain on your parade next year. Then again, this is mainly server stuff.. does that really matter to you?


Bright future
By Griswold on 6/1/2006 2:47:38 PM , Rating: 2
It does indeed look good on the server front for AMD, despite the doomsayers who scream struggle&demise for AMD after each Intel announcement.

Scalability is where it's at and AMD got that nailed well from the beginning.

Though, calling K8L a new architecture is stretching it a bit, I think. They add a slew of architectural improvements but it will still be a K8 - on steroids though.




RE: Bright future
By mountcarlmore on 6/1/2006 3:09:21 PM , Rating: 2
you can say that the k8 is k7 on steroids, you can say conroe is the p6 on steroids. If you expect a company to just outright change every single part of the microprocessor, that's not very realistic with the time frames they're working with.


RE: Bright future
By Griswold on 6/1/2006 4:35:50 PM , Rating: 2
Well yes I agree with you to some extent. But I still don't think calling this a new architecture is really the right thing to do. The changes from K7 to K8 were bigger. IMC, AMD64, HTT to name a few.

K8L on the other hand, seems to mostly make existing things "fatter and wider". Granted, L3 is new.

Anyway, if it delivers, who cares what its called.



RE: Bright future
By saratoga on 6/1/2006 5:15:57 PM , Rating: 2
quote:
The changes from K7 to K8 were bigger. IMC, AMD64, HTT to name a few.


ISA changes are not really arch. Though the address space and int changes were.

But anyway, the K8L will have a new HT and IMC design, not to mention ISA extensions, so that's not really a valid example.

Furthermore, K7 to K8 was mostly just front end changes. AMD left the K7 backend mostly untouched (aside from the wider ALU and other x86-64 changes). The K8L will have upgraded front end (all 3 levels of cache redesigned, IF upgraded, more aggressive OOOE) and backend parts (massively improved execution resources). IMO this is at least as big a change as the K8. Possibly a much larger one, though the details still aren't clear.


RE: Bright future
By Regs on 6/8/2006 9:33:38 AM , Rating: 2
quote:

The changes from K7 to K8 were bigger. IMC, AMD64, HTT to name a few.


Bigger yes, but IPC was just not there. Hopefully the K8L will solve this with lower latency in its L2 cache and a smarter, more accurate prefetch. I have no idea what they did to the core itself yet as it's not as easy to assume and squeeze information about the registry units.


RE: Bright future
By Garreye on 6/1/2006 7:10:08 PM , Rating: 2
it is being called K8L and not K9, so that is a reflection of the idea of K8 on steroids....


RE: Bright future
By lifeblood on 6/2/2006 10:17:06 AM , Rating: 1
It seems the information in this article is more oriented toward the server arena. But what about the K8L core?

Intel was holding the industry back with its lack of innovations and dead NetBurst architecture. AMD came out with a clearly superior line of products and gave Intel a 1-2-3 punch (dual core, 64 bit, lawsuit). Intel, not one to tolerate that kind of slap in the face, went back to the labs and has finally brought forth a winning product (Conroe et al). Now what is AMD going to respond with? The interconnects between cores are critical in the server arena but the desktop is still primarily single threaded. Most office productivity apps and games get very little benefit from multi core (although that is changing more every day). At the heart of the processor is the core. What is AMD doing to the K8L core to make it better than the Conroe core? Or are we going to have a situation where AMD has the better interconnects but Intel has the better core?


RE: Bright future
By Regs on 6/2/2006 12:47:53 PM , Rating: 2
Server and Workstation technology is the most demanding and likely the most sophisticated. Desktop users' most demanding applications are encoding programs and 3D imaging including games. The K8L will offer a much needed performance boost to all mediums including the desktop.


1207
By Discord on 6/1/2006 5:36:59 PM , Rating: 2
I'm just incredibly curious about the 1207 pin socket F. Everyone is saying that it will only support the F (and G) revisions of the Opteron, which would basically be a DDR2 version of present Opteron chips.
It just doesn't add up. To add DDR2 support to the present socket 939 based Athlons, it took one pin. Even if they did something major to the F revision of the Opteron, such as adding a fourth HT channel, it wouldn't explain so many extra pins. Add the facts that they are going to be pinless CPUs and in all the documentation mentioning socket F, AMD calls it its next generation server socket. This is a radical divergence from the AM2 socket.
Many people chalk up the extra pins of the 1207 as supporting 4 cores. Extra cores do not require extra interface pins in the socket. That is unless each core is going to have its own memory or HT link?
We've also been hearing about the 1207 for over a year now. With K8L chips due out in less than a year we haven't heard anything about their new sockets.
Something is fishy here. I'm guessing that either the 1207 will support both standard K8 and K8Ls, or that it is going to support only K8Ls. Maybe they’re not coming out with DDR2 revised Opterons? Maybe they have an Opteron version of the AM2?
What are everyone else's thoughts?




RE: 1207
By PaulDriver on 6/1/2006 7:11:43 PM , Rating: 2
Methinks that the extra 267 pins (pads?) have to do with FB-DIMM support.

p.d.


RE: 1207
By saratoga on 6/1/2006 9:07:53 PM , Rating: 2
quote:
It just doesn't add up. To add DDR2 support to the present socket 939 based Athlons, it took one pin.


It took a lot more than that, it's just that S939 had tons of unused pins for some reason. IIRC it only used something like 700 of them, the rest weren't connected.

Anyway 1207 might be forwards compatible with HT3 or FBDIMMs, which would use up a LOT of extra pins. Or it could just have a lot of useless pins, like 939.


cache
By hwhacker on 6/1/2006 9:35:56 PM , Rating: 2
Background info for impending question:

Ok, so initial K8L's will have 2mb L3. AMD has said that they plan to increase it. The Z-ram rumours that were flying something-fierce proposed AMD planned to embed 5mb Z-ram on their processors. While Greyhound for the desktop socket (let's call it) "AM3" seems to be the first processor for its sector using K8L tech, Cadiz is also on the roadmap with the same main specs, only in the higher-end "workstation" sector. That sector is the one that includes desktop Opterons, something notably absent from the AM2 platform at present.

Here are my questions:

Will the initial K8L's have Z-ram for L3, or will it be SRAM with perhaps a transition to Z-RAM? Perhaps the kicker question is...Is Cadiz planned to have the rumoured 5mb of L3 cache, perhaps used in high-end "AM3" products such as later FX's and/or desktop Opterons? That would be sweet.

Anyone...Kris...Bueller? ;)
TIA.


RE: cache
By SocketF on 6/2/2006 6:36:30 AM , Rating: 2
quote:
Will the initial K8L's have Z-ram for L3, or will it be SRAM with perhaps a transition to Z-RAM?
If you look at the K8L Die photo, together with the information that it is ~2 MB L3, then it must be normal S-RAM, because the L3 area is too big for being 2 MB L3 with Z-RAM.
It might be possible to use the same area for Z-RAM (if it can keep up the same timings), but then it would be around 10 MB L3, as Z-RAM is 5 times denser than S-RAM ;-)

cheers

SocketF


RE: 1207
By Tyler 86 on 6/2/2006 3:41:35 AM , Rating: 2
The first poster hinted at it, but yes, it's Socket F, for "Fishy". Please, continue speculation. Hrm.

After all, more pins must mean more something, right? ...
Maybe they make the chip look prettier, ala S939?

Here's hoping they use the god forsaken things.


RE: 1207
By SocketF on 6/2/2006 6:31:45 AM , Rating: 2
Hi,

have a look at the AMD slides. They mentioned a core with 4 HT and/or some "I/O". I/O in my opinion means PCIe and there were already some (inquirer) rumours long ago, that SocketF will get a PCIe connector. Support for FB-Dimm is likely, too.
Anyways, SocketF is to be announced on July 11th. Furthermore, more information should be available during Computex, next week, so just wait a little bit longer :)

cheers

SocketF

P.S: http://www.theinquirer.net/?article=31787


K8L is not a great performer...
By photoguy99 on 6/1/2006 4:20:02 PM , Rating: 3
...Until they can show some benchmarks.

Go ahead call me an Intel fanboy - But I was cheering AMD to everyone who would listen when they had the best benchmarks. In fact 4 Months ago I bought an AMD system for myself.

Now it looks like the Conroe benchmarks are going to be for real so in a month I'll start using Conroe server/desktops.

AMD gives tech info in advance? Who cares. Intel gives info in advance? Who cares.

Until they can show some benchmarks K8L is damn marketing hype.




By firewolfsm on 6/1/2006 4:33:16 PM , Rating: 2
Maybe, but we know these will bring improvements...just not how much. With everything they've done, I think it'll at least come near. But they should scale higher at the same time so 65nm SOI K8L is looking good.


RE: K8L is not a great performer...
By bob661 on 6/1/2006 7:41:34 PM , Rating: 2
quote:
Until they can show some benchmarks K8L is damn marketing hype.
It's not marketing hype it's called AMD's Technology Analyst Day. Doesn't sound like a press release to me.


By photoguy99 on 6/1/2006 8:54:35 PM , Rating: 2
quote:
It's not marketing hype it's called AMD's Technology Analyst Day. Doesn't sound like a press release to me.


Hello McFly? Analysts don't buy anything - AMD's only use for analysts is to convince people their future is bright.

Believe me, their best marketing people are intimately involved with this event.


How about L1<->L2 bus width?
By YuryMalich on 6/2/2006 1:53:22 AM , Rating: 1
Is there any information about width of L1<->L2 bus in K8L?
Modern K8s have two 64-bit unidirectional buses between L1 and L2 caches (one for reading data from L2 another for writing evicted data to L2). 64+64 seems to be too narrow to feed two 128-bit FPUs.




RE: How about L1<->L2 bus width?
By saratoga on 6/2/2006 4:21:53 AM , Rating: 2
It says 2x 128 bit loads per clock and 2x128 bit L1 buses, which implies a 256 bit L2 link. They also appear to have redone all levels of cache, so if it were ever to happen, now would be the time.

Though I guess it's possible they left the 128 bit L2 bus and are betting that the L1 can keep the SSE units fed, which might be reasonable. Sustaining 2x loads every single clock cycle is pretty unlikely in most code. You'll have other operations besides loads, and the compiler can use them to give the L2 a chance to catch up.
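The bus-width question in this thread comes down to bytes per clock. The sketch below compares the peak demand of two 128-bit loads per clock against a 64-bit, 128-bit and 256-bit L2 read path. The widths are the ones mentioned by the posters, not confirmed K8L specifications, and the helper name is hypothetical.

```python
def bytes_per_clock(width_bits, count=1):
    """Peak bytes moved per clock by `count` paths of `width_bits` each."""
    return width_bits * count // 8

# Peak demand: two 128-bit loads issued every clock.
load_demand = bytes_per_clock(128, count=2)

# Candidate L2 read paths: K8's 64-bit bus from the parent comment,
# plus the 128-bit and 256-bit widths speculated about in this thread.
for l2_bus_bits in (64, 128, 256):
    supply = bytes_per_clock(l2_bus_bits)
    print(l2_bus_bits, supply, load_demand / supply)
```

The last column is how many times peak load demand exceeds what the L2 path can deliver: 4x for a 64-bit bus, 2x for 128-bit and parity at 256-bit. Since real code rarely sustains two loads every clock and most loads hit in the L1, a narrower bus may still be adequate in practice.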


Cool Stuff
By hmurchison on 6/1/2006 2:29:59 PM , Rating: 2
It'll be interesting to see how the tiered caching of the K8L based products do against the larger caches of Core2 product.

L3 ondie and more HT links. Not too shabby. I love the thought of 32 fully connected cores. This has Virtualization written all over it. Datacenters rejoice.




Interesting
By AstroCreep on 6/1/2006 2:32:24 PM , Rating: 2
The new announcements are very interesting and seemingly have promise.
I'm crossing my fingers that AMD doesn't try to rush K8L to combat Intel's ambitious new offerings.




Awesome
By fbrdphreak on 6/1/2006 2:32:55 PM , Rating: 2
Wow, this is exciting stuff. The expandability of the HT links is amazing. Keep it up AMD!




"Die shot"
By ShapeGSX on 6/2/2006 9:16:33 AM , Rating: 2
"AMD's K8L die-shot and main points"

That doesn't look like a die shot to me. More like a screen shot of the processor layout.

A die shot would indicate that it was a picture taken of actual silicon.




Beware ...
By crystal clear on 6/4/06, Rating: 0
By Regs on 6/1/2006 4:01:47 PM , Rating: 1
^ Nice point.


By Regs on 6/1/2006 4:02:47 PM , Rating: 1
Even though I think you're being modest about the 2 years.



Related Articles
Recent AMD Retractions
July 6, 2006, 1:25 PM
Gearing Up For AMD Revision G
May 24, 2006, 5:35 AM
HyperTransport 3.0 Ratified Today
April 24, 2006, 12:45 PM













Copyright 2014 DailyTech LLC.