
AMD's K8L diagram

AMD's K8L cache design

Four 16-bit or Eight 8-bit HyperTransport Links

K8L's instruction dispatch diagram
K8L details continue to pour in at AMD's Technology Analyst Day

During the AMD Technology Analyst Day, AMD’s CTO Phil Hester rehashed the majority of the K8L information we discussed on DailyTech several days ago, but disclosed further specifics. Hester was careful to refer to these new technologies simply as “new architecture,” and never used the K8L core name.

Internally, AMD engineers use the codename Greyhound to describe the "new architecture."

A major push for AMD’s K8L design is “modular” component design: everything from L3 cache to memory controllers is developed as an individual component and linked together with reusable, robust designs. To some extent, processor design is already modular, with libraries and blocks that are developed individually. However, Hester insists the new architecture takes this modular approach even further, claiming that the company is working to “better define the interfaces for each of these building blocks.”

Additionally, Hester revealed more about the cache specifics on K8L. Each K8L core will have 64KB of dedicated L1 cache, followed by 512KB of dedicated L2 cache. The base models of K8L will have 2MB of shared L3 cache, and Hester said that adding more L3 cache is on the company’s roadmap. One thing AMD representatives have not particularly touched on is the L1 cache reduction from 64+64KB (data+instruction) to 32+32KB. AMD employees have assured us this move is logical given the addition of L3 cache.
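As a quick check on those numbers, the sketch below totals the on-die cache for a quad-core part using the sizes reported in this article. The function name and the four-core default are illustrative assumptions, and the L1 figure is left as a parameter because it is the one number in dispute.

```python
# Cache sizes in KB, taken from the figures reported in this article.
L2_PER_CORE_KB = 512
L3_SHARED_KB = 2 * 1024


def total_cache_kb(cores=4, l1_per_core_kb=64):
    """Total on-die cache: private L1 + L2 per core, plus one shared L3.

    l1_per_core_kb is the combined data + instruction L1 for one core.
    The default of 64 (32 + 32) is the figure reported here; pass 128
    to model a 64 + 64 split instead.
    """
    private_kb = cores * (l1_per_core_kb + L2_PER_CORE_KB)
    return private_kb + L3_SHARED_KB


reported = total_cache_kb()                      # 32+32KB L1 per core
alternate = total_cache_kb(l1_per_core_kb=128)   # 64+64KB L1 per core
print(reported, alternate)
```

Either way the L1 is a small share of the total, so the difference between the two L1 configurations is 256KB across four cores.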

A major feature of K8L is DICE, or Dynamic Independent Core Engagement. Essentially, the ACPI layer will be able to dynamically adjust individual cores and crossbars on the processor. Each processor core will be able to enter its own power state, or p-state, allowing a K8L processor to conserve power when the system does not have enough threads to keep every core busy. Intel’s Core processors can enter c-states on a per-core basis, but the AMD demonstration showed a quad-core K8L processor dropping individual cores into full halt.
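The idea of halting cores that have no work can be illustrated with a toy model. This is only a sketch of the policy described above, not the ACPI interface or AMD's implementation; the function name and the "active"/"halted" labels are made up for illustration.

```python
def core_states(runnable_threads, cores=4):
    """Toy model: one runnable thread keeps one core active.

    Cores beyond the number of runnable threads are marked 'halted',
    mirroring the per-core power-down described in the article.
    """
    busy = min(max(runnable_threads, 0), cores)
    return ["active"] * busy + ["halted"] * (cores - busy)


# One runnable thread on a quad-core part: three cores can be halted.
print(core_states(1))
```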

Opteron servers right now are, for the most part, limited to systems with eight sockets or fewer. Part of this is due to the fact that each processor has only three HyperTransport links. Hester announced that the next-generation Opteron core will have four 16-bit HyperTransport-3 links running at 2.6GHz each. These four links can be reconfigured into eight 8-bit HyperTransport links in a process called “un-ganging,” which is a fundamental feature of HyperTransport-3. Essentially, one could have an eight-socket server with thirty-two fully connected cores. Each processor will be able to reach any of the eight memory banks within one memory hop. The HyperTransport-3 specification claims un-ganging can work on the fly, meaning that even a fully connected eight-socket server could dynamically combine two 8-bit links into a single 16-bit link during operation to increase I/O at critical moments.
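The link arithmetic behind the eight-socket claim can be sketched as follows. The helper names are illustrative, and the model makes two simplifying assumptions: no links are reserved for I/O when counting sockets, and each link transfers data on both clock edges (double data rate).

```python
def available_links(wide_links=4, unganged=False):
    """Un-ganging splits each 16-bit link into two 8-bit links."""
    return wide_links * 2 if unganged else wide_links


def max_fully_connected_sockets(links):
    """In a fully connected topology every socket links directly to every
    other socket, so n sockets need n - 1 links per socket."""
    return links + 1


def per_direction_gb_per_s(width_bits, clock_ghz=2.6):
    """Peak one-way bandwidth of a link, assuming double data rate."""
    return width_bits * clock_ghz * 2 / 8


ganged = available_links()                # four 16-bit links
unganged = available_links(unganged=True)  # eight 8-bit links

# Eight sockets need seven links each, which only the un-ganged mode provides.
print(max_fully_connected_sockets(ganged), max_fully_connected_sockets(unganged))
print(per_direction_gb_per_s(16), per_direction_gb_per_s(8))
```

Under these assumptions, un-ganging trades width for fan-out: each 8-bit link carries half the bandwidth of a 16-bit link, while the total across all links stays the same. Seven of the eight narrow links cover a fully connected eight-socket system, which at four cores per socket gives the thirty-two cores mentioned above.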

K8L is a native quad-core design, although slides from the Technology Analyst Day also revealed that a dual-core desktop SKU will appear in mid-2007. So far, DailyTech has touched a little on the 65nm quad-core code names announced in AMD roadmaps, but to our knowledge the code names for dual-core K8L processors have not been disclosed.

Update 07/06/2006:
  Please read the update to this article about the K8L L1 cache sizes.

Comments


RE: L1 Cache
By KristopherKubicki on 6/1/2006 3:57:30 PM , Rating: 2
Correct, the L1 cache is now 32+32 instead of 64+64. I have confirmed this with AMD. The slide also has a typo that I confirmed with AMD; it should read 32KB.

RE: L1 Cache
By jmnewton on 6/1/2006 4:19:33 PM , Rating: 2
Umm.. it says "32B instruction *fetch*", not 32B instruction *cache*. This would mean 32 bytes of data are fetched per cycle from the ICache to be decoded (x86 instructions are variable width and so 32B is any number of actual instructions). It may actually be 32KB ICache, but if that is what is meant by that line on the slide - then the wording is wrong too. Otherwise, the wording is completely correct.

RE: L1 Cache
By KristopherKubicki on 6/1/2006 4:23:28 PM , Rating: 2
Yeah, my apologies. When I contacted AMD originally they referred to it as a typo but just recanted. It does have 32KB of ICache though.

RE: L1 Cache
By Ecmaster76 on 6/1/2006 4:49:37 PM , Rating: 2
I guess with quad core they cut back all the cache to free up die space. That's a shame.

But then again, maybe they have compensated with a better front end on that thing.

RE: L1 Cache
By saratoga on 6/1/2006 5:10:58 PM , Rating: 2
128k was really large for an L1. It was designed for the days when L2 caches were offdie, not for modern on die L2 (and now L3!), so it was probably outdated.

A smaller L1 allows for faster access, wider data path, and/or better associativity for a given die size.

RE: L1 Cache
By inthell on 6/1/2006 7:01:17 PM , Rating: 2
lower L1 cache means BETTER cache latency

RE: L1 Cache
By anoninsider on 6/2/2006 5:15:50 PM , Rating: 2
No, the cache is 128KB just like in the K8 and K7. See

Besides, it makes no sense that they would change the L1 caches. That would require them to redo the entire memory subsystem: prefetch, line size, L2, L3, memory bandwidth, etc.

According to the author, an AMD architect confirmed that the caches are 128KB, and the floor plans also do.


RE: L1 Cache
By Scrogneugneu on 6/4/2006 10:20:04 PM , Rating: 2
If AMD did cut the L1 in half, it might be to save money. Just think about it, L1 cache is made from very low latency memory, and has a high production cost. If you include half as much per core, you save a lot of money, even more if you consider that the total number of cores is going up.

My guess : since you have 2 (or more) cores working, they should all have less work than a single core. Thus, they should all need less L1, as the little time they lose with a L1 cache miss is taken up by the fact that the core has half as much work to do.

One core needs to be really fast to emulate a high number of real-time calculations, but if you have 2, then to offer the same performance, they need to be only half as fast in theory. Therefore, a reduction in L1 size coupled with twice as many cores available enables taking up twice as much workload, without any huge increase in production cost.

RE: L1 Cache
By Tyler 86 on 6/5/2006 12:09:01 AM , Rating: 2
I love how the rumor buster statement is so vague..

"... the contrary at Daily Tech, the L1D and L1I caches remain at 64KB each, according to a senior architect at AMD."
... 64KB? That's half of 128KB.
Does it mean 32KB+32KB = 64KB?
Does it mean 64KB+64KB = 128KB?

Back to square one.

RE: L1 Cache
By Tyler 86 on 6/5/2006 12:09:53 AM , Rating: 2
Nah I guess it's not so vague. 64KB + 64KB it is. I hope.

RE: L1 Cache
By anoninsider on 6/6/2006 2:21:13 PM , Rating: 2
When architects talk about caches, they always talk about data and instruction cache separately. I and D cache are just totally separate beasts, for a variety of reasons.

The sentence is crystal clear, L1I and L1D cache are 64KB each. 64KB+64KB = 128KB.

The K8L will have 64/64 D/I caches. The total is 128K, but how it is split is more important than the total.

Related Articles
Recent AMD Retractions
July 6, 2006, 1:25 PM
Gearing Up For AMD Revision G
May 24, 2006, 5:35 AM
HyperTransport 3.0 Ratified Today
April 24, 2006, 12:45 PM
