
Paul Otellini holds up a wafer with 6400 cores
Intel promises teraflop chips within the next five years

Intel today announced that it has produced its first teraflop-on-a-chip.  The chip, essentially a prototype, was demonstrated when Intel CEO Paul Otellini showed off the wafer during this week's IDF conference opening keynote.

Each of the 80 processors on the wafer contains a die with 80 cores -- 6400 cores in total.  Each CPU has more than one terabyte per second of throughput between the CPU cores and the on-die SRAM. Otellini claims that this technology will be available within five years, putting it in line with the previously outlined Gesher family expected to ship in 2010.

To put that into perspective, the fastest public supercomputer in 1996 was ASCI Red, which featured over 4,500 compute nodes using 200MHz Pentium Pro processors and was the first computer to break the 1-teraflop barrier.

Each of the individual CPUs runs at 3.1GHz in a very simple configuration.  These are far from production-ready processors and are mainly for demonstration purposes.  Each processor is also unusual in that its packaging is three-dimensional.  The cache substrate is "stacked" directly underneath the FPUs, saving space and reducing latency.

The processors are just one component of Intel's Tera-Scale initiative -- a set of research projects geared to bringing multi-teraflop systems to the masses by 2010.  More objectives of this project, including software design, will be announced later during the Intel Developer Forum.

Intel also today announced the official name for its quad-core desktop and server CPU: the Core 2 Quad. As its name implies, the processor contains four cores and features a 1066MHz front-side bus. For benchmarks on the Core 2 Quad, you can check out DailyTech’s Kentsfield article from yesterday.

Comments

Ok....
By FITCamaro on 9/26/2006 4:05:48 PM , Rating: 2
OK, they have the wafer. But do any of them actually work? That is the question. And the heat output would be insane with 80 3.1GHz cores on one processor.

RE: Ok....
By OrSin on 9/26/2006 4:12:22 PM , Rating: 2
Yeah, but considering the total size, I'm sure we could come up with a small nitrogen tank for it. It's about time they start putting memory on the die. You know how much faster chips would be if they never had to access RAM?

RE: Ok....
By Hare on 9/26/2006 4:40:01 PM , Rating: 2
"It's about time they start putting memory on the die."
This is nothing new. Chips have "always" had cache built in. The current desktop chips have 1-4MB.

RE: Ok....
By lemonadesoda on 9/26/2006 6:15:52 PM , Rating: 5
For the Intel x86 family, "built in" cache was introduced on the 486. On a 386 system there was no cache, unless the motherboard manufacturer included an Intel® 82385 32-bit cache controller (an external chip) with a socket for the external cache memory.

I've still got a stick or two of those lying around somewhere...

RE: Ok....
By UNHchabo on 9/27/2006 10:50:17 AM , Rating: 2
Maybe that's when they added L2 cache, but I was pretty sure that processors have had cache of some sort for considerably longer than that.

RE: Ok....
By defter on 9/27/2006 12:29:32 PM , Rating: 2
No, 486 was when they added L1 cache. L2 cache was added in 1998 with Celeron-A. Before 486, Intel didn't have any cache within x86 CPUs.

RE: Ok....
By codeThug on 9/29/2006 10:39:11 PM , Rating: 2
They did. They call them REGISTERS.

RE: Ok....
By Tyler 86 on 10/1/2006 1:28:40 AM , Rating: 2
Eh.. Technically, registers are not cache, since they're directly accessible...
Cache is indirectly accessible by accessing prefetched or previously accessed memory locations...

mov ax, 0x4000
inc ax

then ax no longer equals 0x4000,

whereas, with cache...

you could technically somehow directly set the cache line for 0x4000 w/o accessing memory, possibly by an out-of-order instruction optimization, e.g.:
mov bl, 0x4000
mov 0x4000, al
bl -> cached 0x4000 -> al -> 0x4000
but more optimized would be
bl -> al -> 0x4000

... but no intelligent compiler would create such a circumstance... so...

... registers aren't cache...
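
The distinction drawn in this comment can be illustrated with a minimal Python sketch. It is only a toy model: the `DirectMappedCache` class, the four-line cache size, the value stored at address 0x4000, and the variable `reg` standing in for a register are all assumptions made for illustration, not a description of any real x86 part. The point is that a register is named and changed explicitly by the program, while a cache is filled and consulted as a side effect of accessing a memory address.

```python
class DirectMappedCache:
    """Toy direct-mapped cache in front of a dict that stands in for RAM."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory        # backing store: address -> value
        self.num_lines = num_lines  # hypothetical cache size, in lines
        self.lines = {}             # line index -> (address, value)
        self.hits = 0
        self.misses = 0

    def load(self, address):
        """Return the value at `address`, filling the cache on a miss."""
        index = address % self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == address:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.memory[address]
        self.lines[index] = (address, value)
        return value


memory = {0x4000: 7}  # hypothetical contents of address 0x4000
cache = DirectMappedCache(memory)

# A register is addressed by name and holds whatever the program puts in it.
reg = 0x4000
reg += 1  # the register changed; nothing at address 0x4000 was touched

# The cache is never named by the program; it only sees memory addresses.
first = cache.load(0x4000)   # miss: fetched from memory, then cached
second = cache.load(0x4000)  # hit: served from the cached line
```

After the two loads, `cache.misses` and `cache.hits` each count one access, even though the program never referred to the cache directly.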

RE: Ok....
By Tyler 86 on 10/1/2006 1:29:19 AM , Rating: 2
nowadays, there's a couple of kB worth of L1 cache that aren't registers...

RE: Ok....
By Clauzii on 10/1/2006 7:24:39 AM , Rating: 2
"A cache was first used in a commercial computer in 1968"

From here:

RE: Ok....
By finalfan on 9/26/2006 4:43:04 PM , Rating: 2
I believe there is a working or sort of working chip otherwise the claim of 3.1G comes from nowhere.

RE: Ok....
By Patito on 9/26/2006 4:48:43 PM , Rating: 2
Be careful there. The text says 80 processors in a wafer, and each processor has 8 cores, NOT 80 cores in one processor.

RE: Ok....
By Ray 69 on 9/26/2006 5:19:45 PM , Rating: 2
"Each of the 80 processors on the wafer contain a die with eighty cores -- 6400 cores in total."

Sure sounds like 80 cores/processor but you're probably correct as that seems rather excessive.

RE: Ok....
By swtethan on 9/26/2006 5:21:01 PM , Rating: 2
80 chips per wafer, 80 cores per chip, 6400 in all

80 cores!!!

RE: Ok....
By TuxDave on 9/26/2006 7:16:49 PM , Rating: 2
'Each of the 80 processors on the wafer contain a die with eighty cores...'

It's very possible. Each of those dies is HUGE, and combined with the fact that each core is very simple (no out-of-order blocks, etc.), it's very possible to get 80 cores on a single die.

RE: Ok....
By peternelson on 9/28/2006 2:49:47 AM , Rating: 2
Certainly is possible to put that many cores in a die, Clearspeed are doing and exceeding that already ;-)

RE: Ok....
By AnnihilatorX on 9/26/2006 8:13:43 PM , Rating: 2
Intel showed off a wafer of these teraflop chips, with a target clock speed of 3.1GHz and power consumption of about 1W per 10 gigaflops - or 100W for 1 TFLOP.

It's only 100W, quoted from the Fall IDF article
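
The quoted efficiency figure can be checked with a short Python sketch. The helper name `watts_for` is made up for illustration, and the per-core number assumes the power is spread evenly across all 80 cores, which the article does not state.

```python
GFLOPS_PER_WATT = 10.0  # from the quoted figure: about 1 W per 10 gigaflops


def watts_for(gflops, gflops_per_watt=GFLOPS_PER_WATT):
    """Power needed to sustain `gflops` at the given efficiency."""
    return gflops / gflops_per_watt


power_for_one_teraflop = watts_for(1000.0)     # 1 TFLOP = 1000 GFLOPS
watts_per_core = power_for_one_teraflop / 80   # assumes an even split over 80 cores
print(power_for_one_teraflop, watts_per_core)
```

Under those assumptions the whole chip lands at the 100W mentioned above, or a little over one watt per core.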

RE: Ok....
By Dactyl on 9/26/2006 10:32:23 PM , Rating: 2
"the heat output would be insane with 80 3.1GHz cores on one processor."

Not necessarily. Not all cores are the same.

The two cores in Conroe, for instance, are both very complicated and designed to work as fast as possible on a single thread at a time. They have all sorts of optimizations (micro-op fusion, macro-op fusion, branch prediction, etc.)

When you get a system with 80 cores, they are going to be very simple cores. Individually, each will be much slower than a single Conroe core. Together, they're capable of massive performance--if you can make an application or compiler that can take advantage of lots of mini-cores.

RE: Ok....
By Missing Ghost on 9/27/2006 12:33:34 AM , Rating: 2
"--if you can make an application or compiler that can take advantage of lots of mini-cores."

a web server?

RE: Ok....
By Hypernova on 9/27/2006 6:05:52 AM , Rating: 2
UltraSPARC T2? These chips suck at anything other than MIMO operations though.

RE: Ok....
By RogueSpear on 9/26/2006 11:00:51 PM , Rating: 3
You could always use the heat to boil water and use the steam for generating electricity. Maybe offset some of the power requirements :P

Sounds nice
By jazzboy on 9/26/2006 4:27:03 PM , Rating: 5
All sounds good, although I can't help but feel how similar this sounds to Intel's famous claim of "10GHz Netburst CPUs by 2005".

Of course if things do go to plan then this'll be one hell of a chip when it comes out - assuming software has caught up by then.

RE: Sounds nice
By therealnickdanger on 9/26/2006 4:39:43 PM , Rating: 2
Didn't Intel O/C a Netburst in their lab to that speed? I'm not sure if it was 10GHz, but it was crazy fast...

RE: Sounds nice
By PedroDaGr8 on 9/26/2006 4:47:51 PM , Rating: 2
I LOVE when people quote things that aren't true. Hell things that have been known to be not true for over a year. That statement of 10Ghz in 2005 was debunked on THIS site a while back. If I remember correctly they even discussed the origins some.

RE: Sounds nice
By Clauzii on 9/26/2006 7:22:00 PM , Rating: 2
The right answer is this:

Intel had an ALU part running at 10GHz. NOT a whole CPU.

I also don't think this will be 80 equal cores. Probably 4 or 8 normal ones, and some "CELL"-like stuff in the rest for fast FPU etc.

RE: Sounds nice
By Tyler 86 on 9/26/2006 8:07:14 PM , Rating: 2

I wonder what the architecture is like in each of these 80 cores...

Today on x86 architecture, we work with 8 GP registers, 8 FPU/MMX registers, and 8 SSE/XMM, (or, with amd64, 16/16/16) while IA-64 architectures have 128 GP registers, and then some...

They used tera '-flops', which indicates floating point operations per second... so I'm assuming each of those simple processors are low precision (maybe less than 32-bit precision, e.g. fp24, fp16, maybe less, & perhaps non-IEEE?) with maybe 2 internal registers and direct memory access? ... but at 3.1GHz...

If it can do a teraflop at 3.1ghz... that's ridiculous...
The "PS3 has been announced as having a theoretical 2.18 TFLOPS", but most probably believe that's crap...

ATi claims the X1900 architecture is capable of 554 GFLOPS (0.554 TFLOPS), and crossfire isn't exactly unheard of ... they run at ~600Mhz... so...

Still, extra dimensionally stacked wafers makes my mouth water... that's an easy gateway to high performance.

RE: Sounds nice
By JeffDM on 9/26/2006 9:30:19 PM , Rating: 2
A theoretical teraflop on an 80 core SIMD CPU running at 3.1GHz doesn't sound too ridiculous, especially if each core has two floating point SIMD units with two pieces of data per instruction.

RE: Sounds nice
By Tyler 86 on 9/26/2006 9:36:16 PM , Rating: 2
Sorry, not ridiculous in that it's possible... It's ridiculous that it doesn't do more, considering, theoreticly, that a 'single core' X1900 architecture GPU at ~1200Ghz, with appropriate memory bandwidth, can do over 1 teraflop...

RE: Sounds nice
By Clauzii on 9/28/2006 2:49:54 PM , Rating: 2
Yeah, my thought too.

Considering CELL w. 8 SPEs @ 3.2Ghz ~ theoreticly 240 GFlops - One TFlop for 80 cores seems like not so much :( (Per core anyway).

RE: Sounds nice
By Tyler 86 on 10/3/2006 1:16:20 PM , Rating: 2
Ah crap, I just realized, I said 1200 G hz, I meant 1200Mhz... but you guys knew that, right?
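
With the corrected clock of 1200MHz, the claim can be checked against the numbers given earlier in this thread (554 GFLOPS at roughly 600MHz). The Python sketch below assumes throughput scales linearly with clock speed and that memory bandwidth keeps up, as the comment itself stipulates; `scale_with_clock` is a hypothetical helper name.

```python
def scale_with_clock(gflops, base_mhz, target_mhz):
    """Scale a peak-throughput figure linearly with clock frequency."""
    return gflops * target_mhz / base_mhz


# Figures from the thread: 554 GFLOPS claimed at roughly 600MHz.
scaled = scale_with_clock(554.0, 600.0, 1200.0)
print(scaled > 1000.0)  # True if the doubled clock clears 1 TFLOP
```

Doubling the clock doubles the figure, which puts it a little above one teraflop under this idealized scaling.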

RE: Sounds nice
By trabpukcip on 10/2/2006 12:12:33 AM , Rating: 3
Doesn't an ALU in a P4 run at double the clockspeed?
A P4 has been overclocked to 6GHz before hence the ALU would be at 12GHz?

RE: Sounds nice
By Clauzii on 10/2/2006 3:16:38 PM , Rating: 2
Yes! The ALU has been doubleclocked the whole P4 line through. Which also means that the 10GHz ALU was no theory at all :)

Quake III @ 1.8million FPS!!!
By therealnickdanger on 9/26/2006 4:09:56 PM , Rating: 2
Seriously though, I can't wait to see it happen. We just need the rest of our computers to go faster...

RE: Quake III @ 1.8million FPS!!!
By Hare on 9/26/2006 4:46:44 PM , Rating: 2
The problem is that most everyday applications won't benefit much from multiple cores. Apps aren't simply multi-threaded, and there are lots of tasks that cannot be run in parallel. Scientific calculations, media encoding, etc., however, will benefit greatly.

Example: Quake (games in general) are even more troublesome since they need a lot of synchronization of the tasks. Really effective multi-threading is difficult to create.
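
The limit described in this comment is usually expressed with Amdahl's law: if only a fraction of a program can run in parallel, the serial remainder caps the speedup no matter how many cores are added. The Python sketch below is a minimal illustration; the two parallel fractions (50% for a typical desktop app, 99% for a media encoder) are made-up values for illustration, not measurements.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when only part of the work is parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads on an 80-core chip.
for label, fraction in [("desktop app", 0.50), ("media encoder", 0.99)]:
    print(f"{label}: {amdahl_speedup(fraction, 80):.1f}x on 80 cores")
```

With these assumed fractions, the half-serial program gains less than 2x from 80 cores, while the almost fully parallel one gets most of the way to 45x.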

By therealnickdanger on 9/26/2006 5:07:57 PM , Rating: 2
"The problem is that most everyday applications won't benefit much from multiple cores..."

I'm very aware of that, but since it is planned "for the masses" by 2010, I'm sure we'll have plenty of multi-threaded apps by then.

"wouldnt it be like.... quake 5 or 6 by then?"

Yeah, probably, but I'm just reminiscing about the days where Q3 was the de facto benchmark for CPUs...

RE: Quake III @ 1.8million FPS!!!
By swtethan on 9/26/2006 4:47:52 PM , Rating: 2
Wouldn't it be like.... Quake 5 or 6 by then?

RE: Quake III @ 1.8million FPS!!!
By fic2 on 9/26/2006 4:52:56 PM , Rating: 5
By then it will be Duke Nukem Forever! Oh, wait, probably not.

Okay so....
By archcommus on 9/26/2006 9:18:58 PM , Rating: 2
If Intel didn't exist, would the world of computing power be stagnant? Seems that way recently.

Kind of scary to think that the basic advances in computing power technology rest on one company...

RE: Okay so....
By Dactyl on 9/27/2006 1:24:52 AM , Rating: 2
If you think Intel is the only company innovating, you obviously aren't paying attention.

Recent Past:

AMD's relatively recent intro of native dual core
Sony/Toshiba/IBM's Cell processor
Sun's multithreaded Niagara CPUs
Blade servers from Sun, HP, IBM
Ageia PhysX card
FPGA accelerators that plug into AMD's architecture


AMD's Torrenza (includes aforementioned FPGAs and much, much more)
AMD's native quad core
attempts to tap the power of graphics cards for physics, folding@home, other floating point intensive operations
Things we don't yet know about


Faster memory
Increased capacity in hard drives
Cheaper flash memory
Better graphics cards
Faster connections (hypertransport, PCI express, etc. etc. etc.)

And of course this only touches on the hardware side of things. New computers are only useful if we can put them to good use. For innovation there, see Google, Sun, IBM, etc. etc. etc.

No innovation outside of Intel? That's the kind of thing a troll would say.

RE: Okay so....
By traxcore on 9/27/2006 5:13:14 AM , Rating: 2
Don't forget ATI, NVIDIA, etc. All those firms develop "CPUs" too, and their technology can be transferred to other kinds of high-performance chips as well.

RE: Okay so....
By Tyler 86 on 9/27/2006 6:02:25 AM , Rating: 2
... huh? CPUs? You mean just some form of processor?

Well, don't forget, ATI == AMD now... Rock.

RE: Okay so....
By bobdeer1965 on 9/27/2006 4:01:38 PM , Rating: 2
"If Intel didn't exist, would the world of computing power be stagnant? Seems that way recently."

8 months ago you probably said the same thing about AMD and now you jump the fence and take the Intel side as soon as they leapfrog AMD. The fact is AMD still has the better technology. That doesn't mean faster "yet" but it is superior. You need to read a little more and don't take sides so fast and don't wear blinders. I use AMD exclusively but I still give Intel credit for having the fastest processor right now. They just have the problem of resisting ALL technology that they didn't invent themselves. Like on die memory controller. And Hypertransport. Both superior technology. BUT they will change because AMD put them on notice for the last 3 years that the market won't stand for this kind of thinking.

RE: Okay so....
By MrDiSante on 9/27/2006 6:05:28 PM , Rating: 1
And AMD can have loads of fun scaling HyperTransport to support 4, 8 and eventually 16 cores. To say that someone has better technology but is losing in terms of performance is silly. On paper NetBurst may have looked wonderful, but practice has shown it didn't work all that well. Intel messed up the last 5 years; now it's AMD's turn to look like the screw-up. It's all a perfectly natural cycle, and there's no reason whatsoever that one shouldn't switch sides to whichever company has the better solution at the moment.

No one will ever need a teraflop-on-a-chip
By ts3nigma on 10/2/2006 1:39:35 AM , Rating: 3
No one will ever have a use for a teraflop-on-a-chip

By rushfan2006 on 10/2/2006 8:45:12 AM , Rating: 2

Quoted on 10/2/2006 : "No one will ever have a use for a teraflop-on-a-chip "

Fast forward to 10/2/2036: LOL remember when someone said no one will ever have a use for a teraflop-on-a-chip? :)

heh
By ksherman on 9/26/2006 5:22:53 PM , Rating: 2
Maybe I'm just being a nitpicky jerk... but the "little" picture used in the top left is the same pic from the Micron article posted yesterday.


RE: heh
By AndreasM on 9/27/2006 12:28:40 PM , Rating: 2
That's probably the icon for news about manufacturing tech.

Multi Threaded
By wrack on 9/26/2006 7:28:59 PM , Rating: 2
Not sure about others, but I am a developer and we have instructions to give any major app we develop multi-processor capability using multi-threading.

Many companies I know are doing this and by the time these processors are very common most of the apps will be able to use them.

RE: Multi Threaded
By Spivonious on 9/27/2006 3:07:00 PM , Rating: 2
And even if the app itself is not multithreaded, you always have more than one thread running on Windows. Right now Task Manager tells me I'm running 554 threads in 43 processes. I imagine the system would be quite fast if each of those threads got 100% processor time in a 554 core system running a 554 core supporting OS. :)

By psychobriggsy on 9/27/2006 11:53:18 AM , Rating: 2
This is a similar development to Cell, however it is 4 or 5 years off. Certainly it seems a reaction to Cell, rather than a reaction to dual/quad x86 cores. It'll probably be used for graphics and physics processing in a system instead of a GPU.

Cell has 8 dual-pipelined SIMD cores (alongside the PowerPC core) and gets 1/4 of the speed of this Intel design. That suggests that the Intel core is single pipelined SIMD, or quad-pipeline single-precision. 80 * 3.1GHz * 4 = 992GFLOPS, near enough 1TFLOP.

Give Cell a process shrink to 65nm, and you can fit on 16 SPUs on a chip, running at 4GHz, for 16 * 4GHz * 8 = 512GFLOPS. However Intel's inter-core technology will probably scale better than Cell's ring topology.
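
The back-of-the-envelope figures in this comment all follow the same formula: peak throughput = cores x clock x floating point operations per core per cycle. The Python sketch below reproduces them. The per-cycle counts (4 for the Intel prototype, 8 for a Cell SPU) are the values assumed in this comment rather than published specifications, the 3.2GHz Cell clock is taken from another comment in this thread, and `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core retires `flops_per_cycle` each cycle."""
    return cores * clock_ghz * flops_per_cycle


# Per-cycle figures below are the comment's assumptions, not vendor data.
intel_prototype = peak_gflops(80, 3.1, 4)  # 80 cores at 3.1GHz
cell_today = peak_gflops(8, 3.2, 8)        # 8 SPUs at 3.2GHz
cell_shrunk = peak_gflops(16, 4.0, 8)      # hypothetical 65nm part, 16 SPUs at 4GHz

print(f"Intel prototype: {intel_prototype:.0f} GFLOPS")
print(f"Cell today:      {cell_today:.0f} GFLOPS")
print(f"Cell at 65nm:    {cell_shrunk:.0f} GFLOPS")
print(f"Cell today / Intel prototype: {cell_today / intel_prototype:.2f}")
```

These are theoretical peaks only; sustained throughput depends on memory bandwidth and on how well the workload keeps every core busy.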

Intel should stick a HyperTransport interface on this chip and sell it for use with AMD's Torrenza technology, it seems like an ideal match! :)

Needless to say the future of CPUs and GPUs is interesting again.

How much better is this?
By Assimilator87 on 9/27/2006 2:04:27 PM , Rating: 2
How many FLOPS can Core 2 and K8 do?

Related Articles
Intel "Kentsfield" Named Core 2 Quad
September 26, 2006, 1:53 PM
Intel "Kentsfield" Performance Explored
September 25, 2006, 6:35 PM
Intel Life After "Conroe"
June 20, 2006, 12:38 PM
