Paul Otellini holds up a wafer with 6400 cores
Intel promises teraflop chips within the next five years

Intel today announced that it has produced its first teraflop-on-a-chip.  The chip, essentially a prototype, was demonstrated when Intel CEO Paul Otellini showed off the wafer during this week's IDF conference opening keynote.

Each of the 80 processors on the wafer contains a die with 80 cores -- 6,400 cores in total.  Each CPU has more than one terabyte per second of throughput between the CPU cores and the on-die SRAM. Otellini claims that this technology will be available within five years, putting it in line with the previously outlined Gesher family expected to ship in 2010.

To put that into perspective, the fastest public supercomputer in 1996 was the ASCI Red which featured over 4,500 compute nodes using 200MHz Pentium Pro processors and was the first computer to break the 1 teraflops barrier.

Each of the individual CPUs runs at 3.1GHz in a very simple configuration.  These are far from production-ready processors and are mainly for demonstration purposes.  Each processor is also unusual in that its packaging is three-dimensional.  The cache substrate is "stacked" directly underneath the FPUs, saving space and reducing latency.
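As a rough back-of-the-envelope check on those figures, the sketch below works out the per-core throughput implied by one teraflop spread over 80 cores at 3.1GHz. It assumes the work is divided evenly across cores and that a teraflop means exactly 10^12 floating-point operations per second; the variable names are illustrative only and do not come from Intel.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the article.
# Assumption: the teraflop is spread evenly across all cores on a chip.
CHIPS_PER_WAFER = 80
CORES_PER_CHIP = 80
CHIP_FLOPS = 1e12   # one teraflop per chip
CLOCK_HZ = 3.1e9    # 3.1GHz

cores_per_wafer = CHIPS_PER_WAFER * CORES_PER_CHIP
flops_per_core = CHIP_FLOPS / CORES_PER_CHIP
flops_per_core_per_cycle = flops_per_core / CLOCK_HZ

print(f"cores per wafer: {cores_per_wafer}")
print(f"GFLOPS per core: {flops_per_core / 1e9:.1f}")
print(f"FLOPs per core per cycle: {flops_per_core_per_cycle:.2f}")
```

Under these assumptions, each simple core would need to sustain roughly four floating-point operations per clock cycle.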

The processors are just one component of Intel's Tera-Scale initiative -- a set of research projects geared to bringing multi-teraflop systems to the masses by 2010.  More objectives of this project, including software design, will be announced later during the Intel Developer Forum.

Intel also today announced the official name for its quad-core desktop and server CPU: the Core 2 Quad. As its name implies, the processor contains four cores and features a 1066MHz front-side bus. For benchmarks on the Core 2 Quad, you can check out DailyTech’s Kentsfield article from yesterday.

Comments

By FITCamaro on 9/26/2006 4:05:48 PM , Rating: 2
OK, they have the wafer. But do any of them actually work? That is the question. And the heat output would be insane with 80 3.1GHz cores on one processor.

RE: Ok....
By OrSin on 9/26/2006 4:12:22 PM , Rating: 2
Yeah, but considering the total size, I'm sure we could come up with a small nitrogen tank for it. It's about time they start putting memory on the die. You know how much faster chips would be if they never had to access RAM.

RE: Ok....
By Hare on 9/26/2006 4:40:01 PM , Rating: 2
It's about time they start putting memory on the die.
This is nothing new. Chips have "always" had cache built in. The current desktop chips have 1-4MB.

RE: Ok....
By lemonadesoda on 9/26/2006 6:15:52 PM , Rating: 5
For the Intel x86 family, "built in" cache was introduced on the 486. On a 386 system there was no cache, unless the motherboard manufacturer included an Intel® 82385 32-bit cache controller (an external chip) with a socket for the external cache memory.

I've still got a stick or two of them lying around somewhere...

RE: Ok....
By UNHchabo on 9/27/2006 10:50:17 AM , Rating: 2
Maybe that's when they added L2 cache, but I was pretty sure that processors have had cache of some sort for considerably longer than that.

RE: Ok....
By defter on 9/27/2006 12:29:32 PM , Rating: 2
No, 486 was when they added L1 cache. L2 cache was added in 1998 with Celeron-A. Before 486, Intel didn't have any cache within x86 CPUs.

RE: Ok....
By codeThug on 9/29/2006 10:39:11 PM , Rating: 2
They did. They call them REGISTERS.

RE: Ok....
By Tyler 86 on 10/1/2006 1:28:40 AM , Rating: 2
Eh.. Technically, registers are not cache, since they're directly accessible...
Cache is indirectly accessible by accessing prefetched or previously accessed memory locations...

mov al, [0x4000]
inc al

then al no longer equals the value at 0x4000,

whereas, with cache...

you could technically somehow directly set the cache line for 0x4000 without accessing memory, possibly by an out-of-order instruction optimization, e.g.:
mov bl, [0x4000]
mov [0x4000], al
bl -> cached 0x4000 -> al -> 0x4000
but more optimized would be
bl -> al -> 0x4000

... but no intelligent compiler would create such a circumstance... so...

... registers aren't cache...
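To make the distinction concrete, here is a toy Python model, not a description of any real CPU. The `ToyCache` class, its four-line size and the dict-backed "memory" are all made-up assumptions for illustration. It shows a register being named and changed directly by the program, while the cache is only ever reached indirectly through a memory address.

```python
# Toy model only: a tiny direct-mapped cache in front of a dict-backed "memory".
# All names and sizes are illustrative assumptions, not any real CPU's design.

class ToyCache:
    def __init__(self, memory, num_lines=4):
        self.memory = memory
        self.num_lines = num_lines
        self.lines = {}   # line index -> (address, value)
        self.hits = 0
        self.misses = 0

    def load(self, address):
        """Return the value at address, going to memory only on a miss."""
        index = address % self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == address:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.memory[address]
        self.lines[index] = (address, value)
        return value


memory = {0x4000: 7}
cache = ToyCache(memory)

# A register is named directly by the program and can drift away from memory.
registers = {"al": memory[0x4000]}
registers["al"] += 1   # the register changes; memory at 0x4000 does not

# The cache is never named; the program only ever asks for an address.
first = cache.load(0x4000)    # miss: fetched from memory, then kept in the cache
second = cache.load(0x4000)   # hit: same address, served from the cache

print(registers["al"], memory[0x4000], first, second, cache.hits, cache.misses)
```

The register ends up holding a different value from the memory location it was loaded from, whereas the cache keeps returning whatever that address holds.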

RE: Ok....
By Tyler 86 on 10/1/2006 1:29:19 AM , Rating: 2
nowadays, there's a couple of kB worth of L1 cache that aren't registers...

RE: Ok....
By Clauzii on 10/1/2006 7:24:39 AM , Rating: 2
"A cache was first used in a commercial computer in 1968"

From here:

RE: Ok....
By finalfan on 9/26/2006 4:43:04 PM , Rating: 2
I believe there is a working, or sort-of-working, chip; otherwise the claim of 3.1GHz comes from nowhere.

RE: Ok....
By Patito on 9/26/2006 4:48:43 PM , Rating: 2
Be careful there. The text says 80 processors in a wafer, and each processor has 8 cores, NOT 80 cores in one processor.

RE: Ok....
By Ray 69 on 9/26/2006 5:19:45 PM , Rating: 2
Each of the 80 processors on the wafer contain a die with eighty cores -- 6400 cores in total.

Sure sounds like 80 cores/processor but you're probably correct as that seems rather excessive.

RE: Ok....
By swtethan on 9/26/2006 5:21:01 PM , Rating: 2
80 chips per wafer, 80 cores per chip, 6400 in all

80 cores!!!

RE: Ok....
By TuxDave on 9/26/2006 7:16:49 PM , Rating: 2
'Each of the 80 processors on the wafer contain a die with eighty cores...'

It's very possible. Each of those dies is HUGE, and combined with the fact that each core is very simple (no out-of-order blocks, etc.), it's very possible to get 80 cores on a single die.

RE: Ok....
By peternelson on 9/28/2006 2:49:47 AM , Rating: 2
Certainly is possible to put that many cores in a die, Clearspeed are doing and exceeding that already ;-)

RE: Ok....
By AnnihilatorX on 9/26/2006 8:13:43 PM , Rating: 2
Intel showed off a wafer of these teraflop chips, with a target clock speed of 3.1GHz and power consumption of about 1W per 10 gigaflops - or 100W for 1 TFLOP.

It's only 100W, quoted from the Fall IDF article
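The arithmetic behind that figure can be sketched as follows. The only inputs are the numbers quoted above plus the 80-core count from the article; the even split of power across cores is an assumption made purely for illustration.

```python
# Power arithmetic from the quoted figure of 1W per 10 gigaflops.
# Assumption: power scales linearly with throughput and splits evenly over 80 cores.
GFLOPS_PER_WATT = 10
CHIP_GFLOPS = 1000     # 1 TFLOP
CORES_PER_CHIP = 80

chip_watts = CHIP_GFLOPS / GFLOPS_PER_WATT
watts_per_core = chip_watts / CORES_PER_CHIP

print(f"chip power: {chip_watts:.0f}W")
print(f"power per core: {watts_per_core:.2f}W")
```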

RE: Ok....
By Dactyl on 9/26/2006 10:32:23 PM , Rating: 2
the heat output would be insane with 80 3.1GHz cores on one processor.

Not necessarily. Not all cores are the same.

The two cores in Conroe, for instance, are both very complicated and designed to work as fast as possible on a single thread at a time. They have all sorts of optimizations (micro-op fusion, macro-op fusion, branch prediction, etc.).

When you get a system with 80 cores, they are going to be very simple cores. Individually, each will be much slower than a single Conroe core. Together, they're capable of massive performance--if you can make an application or compiler that can take advantage of lots of mini-cores.
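One common way to reason about that caveat is Amdahl's law, which the article does not mention and which is used here only as an illustration. In the sketch below, the relative speed of a simple core (0.25 of a fast core) and the parallel fractions are made-up assumptions, not Intel figures.

```python
def speedup(parallel_fraction, cores, core_speed=1.0):
    """Amdahl's-law speedup relative to one reference core of speed 1.0.

    core_speed is each core's speed relative to that reference core
    (a hypothetical parameter chosen for this sketch).
    """
    serial_time = (1.0 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (core_speed * cores)
    return 1.0 / (serial_time + parallel_time)


for p in (0.5, 0.9, 0.99):
    big = speedup(p, cores=2, core_speed=1.0)      # two fast, complex cores
    small = speedup(p, cores=80, core_speed=0.25)  # eighty slow, simple cores
    print(f"parallel fraction {p:.2f}: 2 fast cores {big:.2f}x, 80 simple cores {small:.2f}x")
```

With these assumed numbers, the 80 simple cores fall behind at a parallel fraction of 0.5 and pull ahead at 0.9 and above.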

RE: Ok....
By Missing Ghost on 9/27/2006 12:33:34 AM , Rating: 2
-if you can make an application or compiler that can take advantage of lots of mini-cores.

a web server?

RE: Ok....
By Hypernova on 9/27/2006 6:05:52 AM , Rating: 2
UltraSPARC T2? These chips suck at anything other than MIMO operations though.

RE: Ok....
By RogueSpear on 9/26/2006 11:00:51 PM , Rating: 3
You could always use the heat to boil water and use the steam for generating electricity. Maybe offset some of the power requirements :P

Related Articles
Intel "Kentsfield" Named Core 2 Quad
September 26, 2006, 1:53 PM
Intel "Kentsfield" Performance Explored
September 25, 2006, 6:35 PM
Intel Life After "Conroe"
June 20, 2006, 12:38 PM

Copyright 2016 DailyTech LLC.