

IBM's new POWER6 chip is a 64-bit, dual-core processor with 790 million transistors and 8 MB of L2 cache, running at up to 4.7 GHz

Cross section of an IBM POWER6, photographed using a scanning electron microscope, shows two transistors in gold
IBM claims to be launching the world's fastest chip for UNIX servers

IBM just launched the dual-core 64-bit POWER6 processor running at 4.7 GHz, which doubles the speed of the previous generation POWER5 while using nearly the same amount of electricity to run and cool it.

The POWER6 processor is the result of a five-year R&D period, is composed of 790 million transistors and is built using IBM’s 65nm process technology. IBM scientists targeted the way instructions are executed inside the chip to improve performance. For example, in the POWER6, the number of pipeline stages – the chunks of operations that must be completed in a single cycle of clock time – is kept static, but each stage is made faster by removing unnecessary work and doing more in parallel. As a result, execution time is reduced.

Earlier this year, IBM hinted that its new POWER6 architecture may hit frequencies higher than 5 GHz.

The POWER6 chip has a total cache size of 8 MB per chip – four times the POWER5 chip – to keep pace with the processor bandwidth. With 300 GB/s on tap, IBM boasts that its processor has so much bandwidth that the POWER6 chip could download the entire iTunes catalog in about 60 seconds. IBM believes that it has designed the POWER6 chip with a balanced amount of bandwidth and processing power.
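The iTunes comparison can be sanity-checked with simple arithmetic. The sketch below only multiplies the two figures quoted above (300 GB/s and 60 seconds); the resulting catalog size is an inference from those numbers, not something IBM states here, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "iTunes catalog in 60 seconds" claim.
# Both inputs come from the article; the catalog size is only implied.
BANDWIDTH_GB_PER_S = 300  # chip bandwidth quoted by IBM
SECONDS = 60              # transfer time quoted by IBM


def data_moved_tb(bandwidth_gb_per_s, seconds):
    """Return the data volume in terabytes (decimal units, 1 TB = 1000 GB)."""
    return bandwidth_gb_per_s * seconds / 1000


implied_catalog_tb = data_moved_tb(BANDWIDTH_GB_PER_S, SECONDS)
print(f"Implied catalog size: {implied_catalog_tb:.0f} TB")  # prints 18 TB
```

In other words, the claim implies a catalog of roughly 18 TB moved entirely at the chip's internal bandwidth, with no allowance for network or storage limits.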

“Like the victory of IBM’s Deep Blue chess-playing supercomputer 10 years ago this month, the debut of POWER6 processor-based systems proves that relentless innovation brings ‘impossible’ goals within reach,” said Bill Zeitler, senior vice president, IBM Systems and Technology Group. “The POWER6 processor forges blazing performance and energy conservation technologies into a single piece of silicon, driving unprecedented business value for our customers.”

To facilitate the lower energy demands of the new chip, the POWER6 designers separated circuits that can’t support low voltage operation onto their own power supply “rails,” allowing IBM to dramatically reduce power for the rest of the chip. IBM engineers also used a new method of chip design that enables POWER6 to operate at low voltages, allowing the same chip to be used in low power blade environments as well as large, high-performance symmetric multiprocessing machines.

In another design to reduce energy consumption and heat production, processor clocks can be dynamically turned off when there is no useful work to be done and turned back on when there are instructions to be executed. Also, the chip has configurable bandwidth, enabling customers to choose maximum performance or minimal cost.

Parts of the memory not being utilized are dynamically turned off and then turned back on when needed. In cases where an over-temperature condition is detected, the POWER6 chip can reduce the rate of instruction execution to remain within an acceptable, user-defined temperature envelope.

The chip is fast, too: a server built by IBM using the POWER6 architecture is the first ever to hold all four major benchmark speed records for business and technical performance. IBM says that its new 2- to 16-core server is multiple times faster than the HP Superdome or Itanium machines.

The POWER6 chip is also aimed at midrange server consolidation, with special hardware and software that allow it to create many virtual servers on a single box. IBM calculates that 30 SunFire v890s can be consolidated into a single rack of the new IBM machine, saving more than $100,000 per year on energy costs.

IBM plans to introduce the POWER6 chip throughout the System p and System i server lines. The POWER6 chip in the new IBM System p 570 server is the first UNIX microprocessor able to perform decimal floating-point arithmetic in hardware. Until now, calculations involving decimal numbers with floating decimal points were done in software. The built-in decimal floating-point capability gives a tremendous advantage to enterprises running complex tax, financial and ERP programs.
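To see why decimal arithmetic matters for financial code, here is a minimal sketch using Python's standard decimal module. Note that this module is itself a software implementation, so it illustrates the correctness issue rather than how the POWER6 hardware unit works; the price and tax rate are made-up values.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum picks up a tiny error.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)  # False

# Decimal floating point stores the digits exactly as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True

# Hypothetical tax calculation: price and rate are illustrative only.
price = Decimal("19.99")
tax_rate = Decimal("0.0825")
tax = (price * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(tax)  # 1.65
```

Software decimal libraries like this one are typically much slower than native binary floating point, which is the gap a hardware decimal unit is meant to close.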



Comments

Power6 and C2D Benchmark comparison
By ralith on 5/22/2007 9:44:23 AM , Rating: 5
Man some people are so uptight, jeez. So C2D and Power6 won't compete in the same space. So what. They can still run the same benchmarks to see how badly the C2Ds and C2Qs get crushed by the Power6. If for no other reason than the amusement factor.




RE: Power6 and C2D Benchmark comparison
By Miggle on 5/22/2007 10:53:42 AM , Rating: 2
Exactly


RE: Power6 and C2D Benchmark comparison
By Ringold on 5/22/2007 6:50:56 PM , Rating: 2
Plus, if it's the number cruncher (FP?) beast some seem to think it is, it'd be an awesome Linux F@H chip...


RE: Power6 and C2D Benchmark comparison
By InsaneScientist on 5/22/2007 10:05:39 PM , Rating: 2
F@H is mainly Floating Point Ops, yes, but number crunching generally is integer ops, which are handled by a completely different execution portion of the CPU (and usually a much narrower path.)

This page might be of some help in differentiating between the two... it's not the best, though... I can't find anything really good. :-S

http://softwarecommunity.intel.com/isn/Community/e...


By InsaneScientist on 5/24/2007 3:58:21 AM , Rating: 2
My brain was obviously on vacation this morning... I guess this is what happens when you finish your finals. :D

Anyhoo... it's the FPops specific portion of the CPU that's usually the narrower path... not the integer part like my last post implies (and I'm not entirely sure where I was going with that line of thought, either...)


RE: Power6 and C2D Benchmark comparison
By AMDfreak on 5/22/2007 11:12:00 AM , Rating: 5
Actually, they could compete in the server market. We currently have some beefy p570 boxes that we're seriously considering replacing with clustered Dells running Prescott Xeons and Linux. There's no question that Power5 and Power6 are computing monsters, but we can replace the whole solution for what it would cost us to upgrade the memory in a p570. Think I'm exaggerating? The quote we got to go to 32GB RAM would cost as much as a small house here.....


By Tsuwamono on 5/22/2007 7:04:27 PM , Rating: 2
Alcatel has boards that run 40 CPUs each... WITHOUT the chips in their BGA slots, the board sells for roughly $35,000 US... That's a board for industrial use. Believe me... your C2D or Prescott rig costs are chump change


Speed over Cores
By crystal clear on 5/22/07, Rating: 0


RE: Speed over Cores
By Goty on 5/22/2007 12:10:28 PM , Rating: 2
Each individual CPU only has two cores and can only execute four threads, but you have to consider that these are meant to run in parallel with anywhere from 1-15 additional CPUs in the same system.


RE: Speed over Cores
By crystal clear on 5/23/2007 2:39:17 AM , Rating: 1
Hi Goty! I made the above comment in a big hurry (minutes before boarding my flight).

I think the text below will clear up some issues:

quote:
The trend to multicore processors is clearly the future but "not all customer workloads are multithreaded yet," said McCredie, explaining why the company pushed speed over cores with Power6. The CPU supports two threads on each of its two cores.


http://eetimes.eu/semi/199700606;jsessionid=ALUL0Y...

also-

quote:
SAN JOSE, Calif. — IBM Corp. will go back to the future with its next-generation Power6 design by pushing raw speed rather than trying to pack more cores on a die.


The CPU will run at speeds between 4-5GHz with a total of 8Mbytes L2 cache and a 75Gbyte/second link to external memory.

The Power6 doubles the frequency and bandwidth of the existing Power5 without increasing its power consumption or the depth of its execution pipeline. The move lets IBM ship the chip as a mid-2007 refresh for its existing p-series server line.

"We needed to scale the whole system. When you just pack on more cores and don't scale the cache and memory bandwidth you can't really scale CPU performance as well," said Brad McCredie, a fellow in IBM's Systems and Technology Group.

The Power6 will essentially follow the pattern set by IBM with the Power4 and 5 CPUs. The Power4 was among the first computer CPUs to put two cores on a single die. The company packed two dice on a single module for high-end versions of the chip. Intel Corp. likewise plans to use multi-chip modules to pack two dual-core dice on a family of quad-core chip modules it will start introducing in November.

IBM may surpass Intel in the speed race, although it has not determined exact speeds for shipping parts yet. Intel currently ships versions of its single-core Pentium running at up to 3.8GHz, but it slows its dual-core CPUs down to 2.93GHz or less to keep power and heat in check.

Thus the big news for IBM is how it can double frequency while holding the line on power consumption and pipeline depth. New circuit designs and process technology improvements plow the way for the advances. The chip uses "new and highly complex latch and static gate circuits," said McCredie.

The processor is built in a 65-nm process using IBM's silicon-on-insulator (SOI) and strained silicon technology. IBM applied new techniques in variable gate lengths and variable threshold voltages to squeeze maximum performance per Watt at the transistor level. The chip can be fully operated at as little as 0.8V.

"That enables us to take this chip into many low-end environments" said McCredie.

In addition, IBM will link its Power CPU for the first time to an external embedded controller. The controller will monitor and adjust power and performance parameters on the CPU based on set power management policies.

IBM is now in a systems test and debug phase using the Power6 in high-end, midrange and cluster computers for its p-series servers.

(10/10/2006 12:01 AM EDT)

http://www.eetimes.com/showArticle.jhtml;?articleI...


RE: Speed over Cores
By psychobriggsy on 5/23/2007 9:06:34 AM , Rating: 3
"Intel currently ships versions of its single-core Pentium running at up to 3.8GHz, but it slows its dual-core CPUs down to 2.93GHz or less to keep power and heat in check."

Well done eetimes for failing to differentiate between Intel's speed-racer but low-IPC Pentium 4 design and their high-IPC but slower Core 2 architecture. Intel did in fact sell dual-core Pentium Ds at 3.4GHz at least; they failed to mention that.

It seems from the benchmark results that POWER6 retains a large amount of IPC whilst also being extremely fast. It's by far the speediest CPU available on the market (and I'm sure POWER6 will be made in a >5GHz version in due course as well). Single thread IPC is lower than Core 2 it appears from the SPEC results (4.7GHz POWER6 ≈ 3.3GHz Core 2), but run two threads on the core and it will compare very nicely to a Core 2 core overall.


RE: Speed over Cores
By psychobriggsy on 5/23/2007 9:18:19 AM , Rating: 3
Actually from rythie's post, a 4.7 GHz POWER6 running a single thread seems to be ~= 3.5 GHz Core 2 in integer, and ~= 4.2 GHz Core 2 in floating point. That's pretty good IPC.

Add in the second thread on that core and the per-core IPC might even be higher than Core 2 in floating point, although this is purely speculative and assuming that there is an overall improvement of 20% when running the second thread (no way do I think IBM's SMP implementation is going to suck like HyperThreading).
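The per-clock comparison in this comment can be worked through explicitly. The sketch below uses only the commenter's own estimates (a 4.7 GHz POWER6 matching a 3.5 GHz Core 2 in integer and a 4.2 GHz Core 2 in floating point, plus the speculative 20% gain from a second thread); none of these are measured values, and the function name is hypothetical.

```python
# Relative per-clock performance of POWER6 versus Core 2,
# based on the clock-equivalence estimates given in the comment above.
POWER6_GHZ = 4.7
SECOND_THREAD_GAIN = 0.20  # the commenter's speculative 20% improvement


def per_clock_ratio(equivalent_core2_ghz, power6_ghz=POWER6_GHZ):
    """POWER6 work per clock as a fraction of Core 2 work per clock."""
    return equivalent_core2_ghz / power6_ghz


int_ratio = per_clock_ratio(3.5)  # integer estimate
fp_ratio = per_clock_ratio(4.2)   # floating-point estimate
fp_two_threads = fp_ratio * (1 + SECOND_THREAD_GAIN)

print(f"Integer, one thread:         {int_ratio:.2f}x Core 2 per clock")
print(f"Floating point, one thread:  {fp_ratio:.2f}x Core 2 per clock")
print(f"Floating point, two threads: {fp_two_threads:.2f}x Core 2 per clock")
```

Under those assumptions the floating-point figure only edges above 1.0 once the second thread is counted, which matches the "might even be higher" wording.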


RE: Speed over Cores
By IntelUser2000 on 5/24/2007 10:50:37 PM , Rating: 2
quote:
Actually from rythie's post, a 4.7 GHz POWER6 running a single thread seems to be ~= 3.5 GHz Core 2 in integer, and ~= 4.2 GHz Core 2 in floating point. That's pretty good IPC.


Power 6 performs like a Core Duo(Yonah) in integer except it runs at 4.7GHz. It is really impressive. I estimate Intel needs a 3.7GHz Core 2 in SpecInt2006 and 5.2GHz Core 2 in SpecFP2006 to equal the 4.7GHz Power 6. Power 6 has better performance per clock in FP than Core 2. Sure, Power 6 does have massive bandwidth advantage so if we assume the same happens for Core 2, Core 2 should gain performance advantage per clock in SpecFP2006.

quote:
Add in the second thread on that core and the per-core IPC might even be higher than Core 2 in floating point, although this is purely speculative and assuming that there is an overall improvement of 20% when running the second thread (no way do I think IBM's SMP implementation is going to suck like HyperThreading).


Actually, Intel's HT isn't bad for the amount of resources and time it took for Intel. IBM's SMT in Power 5 and Power 6 is vastly more complex. Intel's HT is probably more efficient at same die size space than the IBM's version.


RE: Speed over Cores
By crystal clear on 5/23/2007 8:32:46 AM , Rating: 1
quote:
"The transition to multicore is happening even faster than the move to 64-bit, and Windows Server 2008 is multicore-ready. Microsoft is also licensing by socket, not cores, on a chip," said Bill Laing, general manager for Microsoft's Windows Server division, told attendees here at WinHEC on May 16.



Point to note-
"Microsoft is also licensing by socket, not cores, on a chip,"


Power6 wasn't built for desktops
By UNCjigga on 5/22/2007 2:57:49 PM , Rating: 2
I don't believe this technology was ever meant to power a desktop. Since Apple announced the switch to Intel, IBM knew this architecture would be for professional server/workstation use only, so they designed/packaged the new chip accordingly. Also, it may crush C2E/C2Q in performance-per-watt, but might be more evenly matched in performance-per-dollar.




RE: Power6 wasn't built for desktops
By minasbeede on 5/22/2007 5:02:47 PM , Rating: 2
I don't know. We (former employer - I'm retired) purchased two of the original Power processor systems. One was deskside, one was desktop. At that time we got the most bang for the buck from the Power systems (we also bought some MIPS-based DEC systems and a Stardent.) The bang for the buck was enhanced because we got the developer price on the IBM systems: 50% discount.

Apple used the Power architecture for years, when IBM and Motorola both produced power chips.

There surely are a number of ways IBM could use the Power 6 processor for a desktop machine, if they chose to do so. If the software vendors for computationally-intense desktop applications were to supply Power-processor versions of their products then the IBM desktop would be the system of choice for those applications, or at least that would seem to be so. Why wait 2 minutes for something to be done if you can get it in 1 minute on a faster system? There's a reason to seek faster processors and that reason isn't solely confined to the server market.


RE: Power6 wasn't built for desktops
By Zandros on 5/22/2007 5:56:39 PM , Rating: 2
While PowerPC is POWER-derived, they are not the same thing, and Apple has never used a Power µ-arch processor in their computers.


By Hoser McMoose on 5/22/2007 9:19:47 PM , Rating: 3
quote:
While PowerPC is POWER-derived

While that was true in 1993, it's not exactly meaningful anymore. The PowerPC instruction set was derived from the original POWER instruction set. However POWER-branded chips haven't used the POWER instruction set for a decade. IBM killed off that ISA in favor of PowerPC from their POWER3 processor onwards.

Like the POWER3, 4 and 5, this new POWER6 is actually a PowerPC chip and NOT a POWER chip. Yup, thanks IBM for making things clear as mud!

While Apple never used any POWER-branded PowerPC chips from IBM, the chips are (mostly) instruction-set compatible. The reason they didn't use one wasn't because they weren't the same, but rather because the POWER line of chips is IBM's REALLY expensive PowerPC chips. Since Apple didn't want to sell $10,000 desktops they instead stuck to IBM's cheaper offerings like the PPC 970 (aka G5).


wha?
By ixelion on 5/22/2007 10:08:05 AM , Rating: 2
quote:
IBM boasts that its processor has so much bandwidth that the POWER6 chip could download the entire iTunes catalog in about 60 seconds.


Anyone else think this seems a little silly? This assumes there would be nothing bottlenecking the performance, e.g. hard disk write speed.