AMD's K8L diagram

AMD's K8L cache design

Four 16-bit or Eight 8-bit HyperTransport Links

K8L's instruction dispatch diagram
K8L details continue to pour in at AMD's Technology Analyst Day

During the AMD Technology Analyst Day, AMD’s CTO Phil Hester rehashed the majority of the K8L information we discussed on DailyTech several days ago, but also disclosed further specifics. Hester was careful to refer to these new technologies simply as the “new architecture,” and never used the K8L core name.

Internally, AMD engineers use the codename Greyhound to describe the "new architecture."

A major push for AMD’s K8L design is “modular” component design, meaning everything from L3 cache to memory controllers is developed as an individual component and linked together with reusable, robust designs. To some extent, processor design is already modular, with libraries and designs that are developed individually. However, Hester insists the new design takes this modular approach even further, claiming that the company is working to “better define the interfaces for each of these building blocks.”

Additionally, Hester revealed more about the cache specifics of K8L. Each K8L core will have 64KB of dedicated L1 cache, followed by 512KB of dedicated L2 cache. The base models of K8L will have 2MB of shared L3 cache, and Hester added that more L3 cache is on the company’s roadmap. One thing AMD representatives have not really addressed is the L1 cache reduction from 64+64KB (data+instruction) to 32+32KB. AMD employees have assured us this move is logical given the addition of L3 cache.
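As a rough illustration of what those figures add up to, the sketch below totals the on-die cache for a given core count. It assumes the 64KB L1 figure is the combined data and instruction cache, and that the 2MB shared L3 stays the same regardless of core count; neither point was confirmed by AMD, and the function name is ours.

```python
# Cache sizes as quoted in the article, in KB.
L1_PER_CORE_KB = 64       # assumed to be 32KB data + 32KB instruction
L2_PER_CORE_KB = 512      # dedicated per core
L3_SHARED_KB = 2 * 1024   # base-model shared L3 (assumed fixed per die)


def total_cache_kb(cores: int) -> int:
    """Total on-die cache for a part with `cores` cores and one shared L3."""
    if cores < 1:
        raise ValueError("a processor needs at least one core")
    return cores * (L1_PER_CORE_KB + L2_PER_CORE_KB) + L3_SHARED_KB


if __name__ == "__main__":
    print(total_cache_kb(4))  # quad-core: 4 * 576 + 2048 = 4352
```

By this arithmetic a base quad-core part would carry 4,352KB of cache in total, a little under half of it in the shared L3.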

A major feature of K8L is DICE, or Dynamic Independent Core Engagement. Essentially, the ACPI layer will be able to dynamically adjust individual cores and crossbars on the processor. Each processor core will be able to enter its own power state, or P-state, allowing a K8L processor to conserve power when the system does not have enough threads to utilize the other processor cores. Intel’s Core processors can enter C-states on a per-core basis, but the AMD demonstration showed a quad-core K8L processor dropping individual cores into full halt.
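AMD did not describe how DICE actually decides which cores to power down, so the following is only a toy model of the idea: keep as many cores awake as there are runnable threads and halt the rest. The `CoreState` enum and `assign_core_states` function are hypothetical names invented for this sketch, not anything from AMD or the ACPI specification.

```python
from enum import Enum
from typing import List


class CoreState(Enum):
    """Hypothetical per-core states for this sketch only."""
    ACTIVE = "active"
    HALTED = "halted"


def assign_core_states(num_cores: int, runnable_threads: int) -> List[CoreState]:
    """Keep one core active per runnable thread and halt the remainder."""
    if num_cores < 1 or runnable_threads < 0:
        raise ValueError("need at least one core and a non-negative thread count")
    busy = min(num_cores, runnable_threads)
    return [CoreState.ACTIVE] * busy + [CoreState.HALTED] * (num_cores - busy)


if __name__ == "__main__":
    # A quad-core part running a single thread: three cores can be halted.
    for core, state in enumerate(assign_core_states(4, 1)):
        print(core, state.value)
```

A real implementation would also have to weigh wake-up latency and cache contents before halting a core, which this model ignores.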

Opteron servers right now are, for the most part, limited to systems with eight sockets or fewer. Part of this is because each processor has only three HyperTransport links. Hester announced that the next-generation Opteron core will have four 16-bit HyperTransport-3 links running at 2.6GHz each. These four links can reconfigure into eight 8-bit HyperTransport links in a process called “un-ganging,” which is a fundamental feature of HyperTransport-3. Essentially, one could have an eight-socket server with thirty-two fully connected cores. Each processor will be able to reach any of the eight memory banks within one memory hop. The HyperTransport-3 specification claims un-ganging can work on the fly, meaning that even a fully connected eight-socket server could dynamically change two 8-bit links into a single 16-bit link during operation to increase I/O at critical moments.
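The link arithmetic behind that claim can be sketched as follows. A fully connected system needs each socket to link directly to every other socket, so eight sockets need seven links apiece; eight un-ganged links cover that, while three or four do not. The bandwidth figure assumes two transfers per clock (double data rate) and counts one direction only; the function names are ours.

```python
HT3_CLOCK_GHZ = 2.6        # link clock quoted in the article
TRANSFERS_PER_CLOCK = 2    # assumes double-data-rate signalling


def link_bandwidth_gb_s(width_bits: int, clock_ghz: float = HT3_CLOCK_GHZ) -> float:
    """Peak bandwidth of one link in one direction, in GB/s."""
    return clock_ghz * TRANSFERS_PER_CLOCK * width_bits / 8


def can_fully_connect(sockets: int, links_per_socket: int) -> bool:
    """True if every socket can link directly to every other socket."""
    return links_per_socket >= sockets - 1


if __name__ == "__main__":
    print(f"16-bit link: {link_bandwidth_gb_s(16):.1f} GB/s per direction")
    print(f"8-bit link:  {link_bandwidth_gb_s(8):.1f} GB/s per direction")
    for links in (3, 4, 8):
        print(links, "links, 8 sockets fully connected:", can_fully_connect(8, links))
```

The trade-off is visible in the numbers: un-ganging doubles the link count but halves the width of each link, so full connectivity comes at the cost of per-link bandwidth.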

K8L is a native quad-core design, although slides from the Technology Analyst Day also revealed that a dual-core desktop SKU will appear in mid-2007. So far, DailyTech has touched briefly on the 65nm quad-core code names announced in AMD roadmaps, but to our knowledge the code names for dual-core K8L processors have not been disclosed.

Update 07/06/2006:
  Please read the update to this article about the K8L L1 cache sizes.


Comments


RE: L1 Cache
By jmnewton on 6/1/2006 4:19:33 PM , Rating: 2
Umm.. it says "32B instruction *fetch*", not 32B instruction *cache*. This would mean 32 bytes of data are fetched per cycle from the ICache to be decoded (x86 instructions are variable width and so 32B is any number of actual instructions). It may actually be 32KB ICache, but if that is what is meant by that line on the slide - then the wording is wrong too. Otherwise, the wording is completely correct.


RE: L1 Cache
By KristopherKubicki (blog) on 6/1/2006 4:23:28 PM , Rating: 2
Yeah, my apologies. When I contacted AMD originally they referred to it as a typo but just recanted. It does have 32KB of ICache though.


RE: L1 Cache
By Ecmaster76 on 6/1/2006 4:49:37 PM , Rating: 2
I guess with quad core they cut back all the cache to free up die space. That's a shame.

But then again, maybe they have compensated with a better front end on that thing.


RE: L1 Cache
By saratoga on 6/1/2006 5:10:58 PM , Rating: 2
128k was really large for an L1. It was designed for the days when L2 caches were offdie, not for modern on die L2 (and now L3!), so it was probably outdated.

A smaller L1 allows for faster access, wider data path, and/or better associativity for a given die size.


RE: L1 Cache
By inthell on 6/1/2006 7:01:17 PM , Rating: 2
lower L1 cache means BETTER cache latency



Related Articles
Recent AMD Retractions
July 6, 2006, 1:25 PM
Gearing Up For AMD Revision G
May 24, 2006, 5:35 AM
HyperTransport 3.0 Ratified Today
April 24, 2006, 12:45 PM
Copyright 2014 DailyTech LLC.