



AMD's K8L diagram

AMD's K8L cache design

Four 16-bit or Eight 8-bit HyperTransport Links

K8L's instruction dispatch diagram
K8L details continue to pour in at AMD's Technology Analyst Day

During the AMD Technology Analyst Day, AMD’s CTO Phil Hester rehashed the majority of the K8L information we discussed on DailyTech several days ago, but also disclosed further specifics. Hester was careful to refer to these new technologies simply as “new architecture,” and never used the K8L core name.

Internally, AMD engineers use the codename Greyhound to describe the "new architecture."

A major push for AMD’s K8L design is “modular” component design, meaning everything from L3 cache to memory controllers is developed as an individual component and linked together with reusable, robust interfaces. To some extent, processor design is already modular, with libraries and blocks that are developed individually. However, Hester insists the new architecture takes this modular approach even further, claiming that the company is working to “better define the interfaces for each of these building blocks.”

Additionally, Hester revealed more information about the cache specifics on K8L. Each K8L core will have 64KB of dedicated L1 cache, followed by 512KB of dedicated L2 cache. The base models of K8L will have 2MB of shared L3 cache, and Hester said that adding more L3 cache is on the company’s roadmap. One thing AMD representatives have not particularly touched on is the L1 cache reduction from 64KB+64KB (data+instruction) to 32KB+32KB. AMD employees have assured us this move is logical given the addition of L3 cache.
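As a rough back-of-the-envelope check, the total on-die cache implied by these figures can be tallied as below. This is only a sketch: it takes the numbers quoted above at face value (the L1 figure is the one addressed in the later update to this article), assumes four cores per die, and the constant and function names are purely illustrative.

```python
# Cache sizes in KB, as quoted in the article. All names here are
# illustrative; they do not come from AMD.
L1_PER_CORE_KB = 64      # dedicated L1 per core, as reported
L2_PER_CORE_KB = 512     # dedicated L2 per core
SHARED_L3_KB = 2 * 1024  # one L3 shared by all cores


def total_cache_kb(cores, l1_kb=L1_PER_CORE_KB, l2_kb=L2_PER_CORE_KB,
                   l3_kb=SHARED_L3_KB):
    """Sum the private L1 and L2 of every core, plus the single shared L3."""
    return cores * (l1_kb + l2_kb) + l3_kb


quad_total = total_cache_kb(4)  # assumption: quad-core die
print(quad_total, "KB =", quad_total / 1024, "MB")  # 4352 KB = 4.25 MB
```

Passing a different `l1_kb` value shows how little the L1 question moves the total; the interest in the L1 size is about latency and hit rate rather than raw capacity.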

A major feature of K8L is DICE, or Dynamic Independent Core Engagement. Essentially, the ACPI layer will be able to dynamically adjust individual cores and crossbars on the processor. Each processor core can enter its own performance state, or P-state, allowing a K8L processor to conserve power when the system does not have enough threads to keep the other cores busy. Intel’s Core processors can enter C-states on a per-core basis, but the AMD demonstration showed a quad-core K8L processor dropping individual cores into a full halt.

Opteron servers right now are, for the most part, limited to systems with eight sockets or fewer. Part of this is due to the fact that each processor has only three HyperTransport links. Hester announced that the next-generation Opteron core will have four 16-bit HyperTransport-3 links running at 2.6GHz each. These four links can reconfigure into eight 8-bit HyperTransport links in a process called “un-ganging,” which is a fundamental feature of HyperTransport-3. Essentially, one could have an eight-socket server with thirty-two fully connected cores, since each socket needs only seven links to reach the other seven directly. Each processor will be able to reach any of the eight memory banks within one memory hop. The HyperTransport-3 specification claims un-ganging can work on the fly, meaning that even a fully connected eight-socket server could dynamically combine two 8-bit links into a single 16-bit link during operation to increase I/O at critical moments.
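The link-count reasoning above can be sketched in a few lines. This is a simplified counting model, not AMD's topology rules: it assumes a "fully connected" system means every socket has a direct link to every other socket, and it ignores coherency and board-routing constraints. The function names and the `io_links` parameter are hypothetical.

```python
def links_for_full_mesh(sockets):
    """Links each socket needs to reach every other socket directly."""
    return sockets - 1


def max_full_mesh_sockets(links_per_socket, io_links=0):
    """Largest fully connected system for a given per-socket link count.

    io_links is a hypothetical knob: links set aside for I/O rather than
    for socket-to-socket traffic.
    """
    usable = links_per_socket - io_links
    if usable < 0:
        raise ValueError("cannot reserve more links than the socket has")
    return usable + 1


configs = (
    ("current Opteron, 3 links", 3),
    ("HT3 ganged, 4 x 16-bit", 4),
    ("HT3 un-ganged, 8 x 8-bit", 8),
)
for label, links in configs:
    print(label, "->", max_full_mesh_sockets(links), "sockets at most")

# Eight quad-core sockets, un-ganged: 7 of the 8 links go to the other sockets.
sockets, cores_per_socket = 8, 4
spare = 8 - links_for_full_mesh(sockets)
print(sockets * cores_per_socket, "cores,", spare, "spare link per socket")
```

Under this model, an eight-socket full mesh leaves one 8-bit link per socket for I/O, which is why the on-the-fly re-ganging into a wider link matters for I/O-heavy moments.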

K8L is a native quad-core design, although slides from the Technology Analyst Day also revealed that a dual-core desktop SKU will appear in mid-2007. So far, DailyTech has touched briefly on the 65nm quad-core code names announced in AMD roadmaps, but to our knowledge the code names for dual-core K8L processors have not been disclosed.

Update 07/06/2006:
  Please read the update to this article about the K8L L1 cache sizes.


Comments


RE: L1 Cache
By Scrogneugneu on 6/4/2006 10:20:04 PM , Rating: 2
If AMD did cut the L1 in half, it might be to save money. Just think about it, L1 cache is made from very low latency memory, and has a high production cost. If you include half as much per core, you save a lot of money, even more if you consider that the total number of cores is going up.

My guess : since you have 2 (or more) cores working, they should all have less work than a single core. Thus, they should all need less L1, as the little time they lose with a L1 cache miss is taken up by the fact that the core has half as much work to do.

One core needs to be really fast to emulate a high number of real-time calculations, but if you have 2, then to offer the same performance, they need to be only half as fast in theory. Therefore, a reduction in L1 size coupled with twice as many cores available enables taking up twice as much workload, without any huge increase in production cost.


RE: L1 Cache
By Tyler 86 on 6/5/2006 12:09:01 AM , Rating: 2
I love how the rumor buster statement is so vague..

"... the contrary at Daily Tech, the L1D and L1I caches remain at 64KB each, according to a senior architect at AMD."
... 64KB? That's half of 128KB.
Does it mean 32KB+32KB = 64KB?
Does it mean 64KB+64KB = 128KB?

Back to square one.


RE: L1 Cache
By Tyler 86 on 6/5/2006 12:09:53 AM , Rating: 2
Nah I guess it's not so vague. 64KB + 64KB it is. I hope.


RE: L1 Cache
By anoninsider on 6/6/2006 2:21:13 PM , Rating: 2
When architects talk about caches, they always talk about data and instruction cache separately. I and D cache are just totally separate beasts, for a variety of reasons.

The sentence is crystal clear, L1I and L1D cache are 64KB each. 64KB+64KB = 128KB.

The K8L will have 64/64 D/I caches. The total is 128K, but how it is split is more important than the total.














Copyright 2014 DailyTech LLC.