
InfiniPath scales incredibly well with CPU core count

Utilizing AMD's HyperTransport, InfiniPath outperforms other systems

PathScale's new PCI Express InfiniPath adapter
High-speed clustering on 10Gbit InfiniBand proves superior on AMD platforms

During IDF, we had the chance to speak with PathScale, a company whose products are shaping the landscape of high-speed networking and communications for clustering environments. Applications for these clusters include biotech research, space research, large-scale 3D rendering and other compute-intensive workloads.

Greg Lindahl, PathScale's Distinguished Engineer, spoke to us about some of the company's controllers, specifically its InfiniPath controller, a 10Gbit/sec InfiniBand interconnect whose performance scales with the number of CPU cores in a system. PathScale says that while other controllers are limited to 2 to 4 million messages (small packets of data) per second, a single PathScale controller is capable of transmitting up to 10 million messages a second, providing the industry’s best InfiniBand performance.

Quick specifications:
  • 1.29 Microseconds MPI Latency (0-byte, 1/2 round trip)
  • 88 byte message size to achieve 1/2 of peak bandwidth (streaming, 8 CPU cores)
  • 954 MB/s peak uni-directional bandwidth (streaming)
  • 1884 MB/s peak bi-directional bandwidth (streaming)
  • 583 MB/s TCP/IP throughput with 6.7 microsecond latency with a standard Linux stack
  • 8.2 million messages/sec (four CPU cores per adapter)
  • 3.3 million messages/sec (one CPU core per adapter)
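
The figures above can be cross-checked with a little arithmetic. The sketch below is not from PathScale: it assumes 1 MB = 10^6 bytes and a simple streaming model in which each message costs a fixed per-message overhead plus its time on the wire. The function names are hypothetical and chosen only for illustration. Under that model, bandwidth reaches half of peak when the wire time equals the overhead, so the 88-byte half-bandwidth point implies a per-message overhead and a small-message rate.

```python
# Figures quoted in the specification list above.
PEAK_BW_BYTES_PER_S = 954e6   # 954 MB/s, assuming 1 MB = 10**6 bytes
MSG_RATE_1_CORE = 3.3e6       # messages/sec, one CPU core per adapter
MSG_RATE_4_CORES = 8.2e6      # messages/sec, four CPU cores per adapter
HALF_BW_MSG_BYTES = 88        # message size reaching 1/2 of peak bandwidth


def scaling_efficiency(rate_one_core, rate_n_cores, n_cores):
    """Fraction of ideal linear speedup achieved going from 1 to n cores."""
    return (rate_n_cores / rate_one_core) / n_cores


def streaming_bandwidth(msg_bytes, peak_bw, per_msg_overhead_s):
    """Assumed model: each message costs a fixed overhead plus wire time."""
    return msg_bytes / (per_msg_overhead_s + msg_bytes / peak_bw)


def overhead_from_half_point(half_bw_msg_bytes, peak_bw):
    """In the assumed model, bandwidth is peak/2 when wire time == overhead."""
    return half_bw_msg_bytes / peak_bw


speedup = MSG_RATE_4_CORES / MSG_RATE_1_CORE
efficiency = scaling_efficiency(MSG_RATE_1_CORE, MSG_RATE_4_CORES, 4)
overhead = overhead_from_half_point(HALF_BW_MSG_BYTES, PEAK_BW_BYTES_PER_S)
implied_rate = 1.0 / overhead

print(f"4-core speedup over 1 core: {speedup:.2f}x ({efficiency:.0%} of linear)")
print(f"Implied per-message overhead: {overhead * 1e9:.0f} ns")
print(f"Implied small-message rate: {implied_rate / 1e6:.1f} million/s")
```

With the quoted numbers, four cores deliver roughly 2.5 times the single-core message rate, and the 88-byte half-bandwidth point works out to an overhead of roughly 92 ns per message, or a little under 11 million messages per second. That is in the same range as the "up to 10 million messages a second" figure, but it is only a rough consistency check under the assumed model, not a vendor number.
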
Currently, PathScale delivers its controllers in two formats: a PCI Express InfiniBand adapter and another that interfaces with AMD's HyperTransport. With PCI Express, PathScale says that customers will get better performance than the competition, but performance will still be capped by the limits of the PCI Express interface. PathScale tells us that if enterprise board developers use its InfiniPath controller on a HyperTransport interface, performance will be at least double that of the best PCI Express systems. Better yet, multi-core Opterons coupled with HyperTransport will give the best performance available.

Lloyd Dickman, PathScale’s Distinguished Architect, tells us that the benefits are not limited to InfiniBand performance: high-speed clustering applications in general will realize their best performance on an AMD platform. This claim comes as no surprise, since Google and several well-established universities are already utilizing AMD platforms for their most demanding applications.

Lindahl left us with a few details about how PathScale's products scale well with more cores. While he said that the technique was conceptualized more than a decade ago, utilizing AMD's HyperTransport allows PathScale to fully capitalize on multi-core systems. Lindahl says that even on an 8-way Xeon system, a PCI Express InfiniPath adapter is unable to tap the full potential of all 8 cores simply because it must communicate through a single Northbridge chip. Lindahl said that while Intel has an answer to AMD's HyperTransport, it is not yet available and there is no set announcement date.

More about PathScale's InfiniPath products can be found here. The company also links to independent benchmark sources that utilize systems containing up to 512 physical CPUs (Opterons).

Comments


Whoever mentions
By Regs on 3/20/2006 4:49:49 PM , Rating: 2
The C word will be slanged.

RE: Whoever mentions
By 9748904947 on 3/20/2006 4:53:44 PM , Rating: 2
What? CSI?

RE: Whoever mentions
By killerroach on 3/21/2006 12:03:37 AM , Rating: 2
Who cares about Conroe? It's not able to scale with the number of CPUs we're talking about in Opteron setups. It may be a great desktop and workstation chip (although that still remains to be seen), but it will be no server powerhouse. After all, without a HT-like interface, Conroes can't scale well.

apples to oranges comparisons...
By josmala on 3/21/2006 6:17:20 AM , Rating: 2
Why is the one Itanium that scales well shown only with a 128-CPU score, when there are 512+ CPU implementations available? Perhaps the problem for them is that it's the one that got improved Cray interconnects instead of what the others have, and its older generation is faster than the latest InfiniBand. [SGI is starting to ship its latest generation this month.]

Secondly, they are comparing different CPUs on different interconnection fabrics, with different bus interfaces. And the graph doesn't even say what version of each interconnect is used in the benchmarks.

Something tells me that they are comparing their latest generation to previous-generation Myrinet. Perhaps it's the need for PCI Express on the new generation cards, which is not used in (at least most) Itanium systems right now. PCI-X is what they use most of the time. [64-bit 133 MHz PCI bus. 8x faster than normal PCI.]

Anyway, THAT'S probably the limiting factor for Itanium in these benchmarks.
And the vendors probably want to sell you 128->512 processor machines that are NOT clusters, which scale MUCH better than these, so it isn't really hurting Itanium in these things since they have more scalable systems. They just are not included in this graph.

RE: apples to oranges comparisons...
By Topweasel on 3/22/2006 5:02:58 PM , Rating: 2
No sir, you're wrong, it's apples to apples.

2 reasons

1. PCIe 4x (what's in the picture) is twice as fast (actually faster) than the PCI-X configuration you mention.

2. This was brought up because of the connection issue.

HT provides between 2.4-6.4 GB/s, which none of these can hit, and each board these connect to still has to deal with communication between CPUs on each board, cross talk and overall bus saturation (which includes the Itanium, because like the P4 it uses the same basic FSB used way back with the original Pentium).

The limiting factor is bus saturation, which HT doesn't have, because for both interboard and cross-board communication HT is point to point.

RE: apples to oranges comparisons...
By yanman on 3/23/2006 8:02:47 PM , Rating: 2
I recall seeing some blade servers that had dual Opterons plus an integrated InfiniBand HBA. I wonder if that particular one was connected over HT?

The new IBM Opteron blades (LS20) look very nice - 2 Opteron CPUs, 16GB RAM per blade, 14 blades per chassis!

Only if ...
By finalfan on 3/20/2006 8:13:18 PM , Rating: 2
True, but only if the CPUs have to do the sending and receiving of data. That is, if the CPU is no longer needed for transmitting data, then the outcome will be totally different.

RE: Only if ...
By Griswold on 3/21/2006 6:11:11 AM , Rating: 2
Brilliant idea, now invent something and become filthy rich!

PCI-E + HTT + Opteron?
By Kryptonite on 3/21/2006 1:46:29 AM , Rating: 2
Wow, can't wait to see what will happen when AMD has PCI-E built into the Opterons....

Take that Intel.....
By aguilpa1 on 3/20/2006 4:17:05 PM , Rating: 1
and your crusty old Northbridge....


Copyright 2016 DailyTech LLC.