Print 68 comment(s) - last by eachus.. on May 6 at 8:33 PM

  (Source: LucasFilm)

Gordon Moore's prediction of doubling transistor counts every 2 years revolutionized the computer industry and his company, Intel.  (Source: New York Times)

An NVIDIA VP is declaring Moore's Law dead and GPUs the only hope for the industry.  (Source: TechCrunch)
In NVIDIA's eye the parallelism of the GPU is the only future for computing

NVIDIA has struggled this time around in the GPU war.  Its first DirectX 11 products were delivered a full seven months after AMD's.  While its new units are at last trickling onto the market and are very powerful, they're also hot, loud, and power hogs.  However, NVIDIA is staking much on the prediction that the computer industry will be ditching traditional architectures and moving towards parallel designs; a movement which it sees its CUDA GPU computing as an ideal solution for.

Intel and NVIDIA have long traded jabs, and Intel's recent failed GPU bid,
Larrabee, does little to warm to the ice.  In a recent op-ed entitled "Life After Moore's Law", published in Forbes, NVIDIA VP Bill Dally attacks the very foundation of Intel's business -- Moore's Law -- declaring it dead.

Moore's Law stemmed from a paper [PDF] published by Gordon Moore 45 years ago this month.  Moore, co-founder of Intel, predicted in the paper that the number of transistors per area on a circuit would double every 2 years (later revised to 18 months).  This prediction was later extend to predict that computing power would roughly double every 18 months, a prediction that became known as Moore's Law.

Now with die shrinks becoming more problematic, NVIDIA is convinced the end is nigh for Moore's Law (and Intel).  Writes Dally:

Moore's paper also contained another prediction that has received far less attention over the years. He projected that the amount of energy consumed by each unit of computing would decrease as the number of transistors increased. This enabled computing performance to scale up while the electrical power consumed remained constant. This power scaling, in addition to transistor scaling, is needed to scale CPU performance.
But in a development that's been largely overlooked, this power scaling has ended. And as a result, the CPU scaling predicted by Moore's Law is now dead. CPU performance no longer doubles every 18 months. And that poses a grave threat to the many industries that rely on the historic growth in computing performance.

Dally says that the only near-term hope for the computer industry now that Moore's Law is "over" is parallel computing -- splitting workloads up among a variety of processors.  However, he derides multi-core efforts by AMD and Intel, stating, "Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work. This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy (consume too much energy per instruction) to fly on parallel programs and to continue historic scaling of performance."

He concludes, "Let's enable the future of computing to fly--not rumble along on trains with wings."

In other words, he hopes you will buy NVIDIA GPUs and join the "Moore's Law is dead" party.

Comments     Threshold

This article is over a month old, voting and posting comments is disabled

RE: Correct, but...
By Jaybus on 5/3/2010 2:39:39 PM , Rating: 2
Computer science has been working on extracting parallelism from sequential code for years and years without an answer. It isn't clear if it is even possible. However, that could be a good thing. Translated code is rarely better than native. Now that parallel hardware is common, engineers are becoming more and more comfortable "thinking in parallel". It only seems quite tricky when viewed with a sequential (and imperative) mindset. As multi-threading becomes the norm, it becomes less tricky, and functional programming languages like Haskell even begin to make sense.

RE: Correct, but...
By The0ne on 5/3/2010 3:02:07 PM , Rating: 2
Parallel programming from sequential code(s) has been going on over 15 years now. We were taught a semester on parallel programming but it just never took off so the class ended. The industry was not ready and did not want it at the time.

Having said that, parallel programming is not easy. It's definitely easier said than done. It's still new in the sense that many are not familiar with it, less begin to imagine how it is done. I really hope the interest in parallel programming and REAL multitasking will bring about new ideas and tools. We need them in this day and age; we seriously can't be stuck with today's tool IMO.

RE: Correct, but...
By The0ne on 5/3/2010 3:08:05 PM , Rating: 2
Ah, remember my professor's name. Professor Kavianpor. I'm pretty sure it's spelled different as I'm going off of the pronunciation :)

RE: Correct, but...
By Jaybus on 5/4/2010 2:24:37 PM , Rating: 3
An example in C:
for (i=2; i < n; i++) {
x[i] = x[i-1] * 2;
This cannot be auto-paralellized by any compiler or pre-processor that I'm aware of. It appears at first to be sequential by nature because x[i] is dependent on the previous iteration. However, the same problem can be solved with:
for (i=2; i < n; i++) {
x[i] = x[1] * (2 ** (i-1));
which can be auto-parallelized. The auto-parallelization is not smart enough to find a different solution to the same problem so that it may be parallelized. Thus, it is either up to the programmer to write sequential code that can be auto-parallelized, or it is up to the programmer to write parallel code to begin with. Is one more difficult than the other? I believe it depends on training and mindset.

We can't wait around on some clever computer scientist to solve this sort of AI problem, just as physicists can't wait around for the next Einstein to reconcile quantum mechanics and general relativity. In the meanwhile, it makes more sense to struggle through the learning curve and make the best use of what we have, rather than continuing to train students solely in sequential methods in the hope that someone will soon devise a really smart auto-parallelization tool.

RE: Correct, but...
By eachus on 5/6/2010 8:33:31 PM , Rating: 2
for (i=2; i < n; i++) {
x[i] = x[1] * (2 ** (i-1));

Actually that is worse than the original, even on a parallel machine, since you still have an implicit loop in the power of two. Using shifts, what you really want to do is evaluate:

for (i=2; i < n; i++) {
x[i] = x[1]<<(i-1);

Of course, you really should put an overflow check in there, since it will overflow pretty quickly if x[1] /= 0.

A better (and eminently practical) example is to try to implement the Simplex algorithm for solving linear programming problems efficiently. Even if you have a very good BLAS, such as the goto BLAS, or a version of the ATLAS BLAS tuned to your system, a good deal of optimization in Simplex code concerns itself with representing the basis as a (square) matrix times a series of vectors.* Every so often you need to "regenerate" the basis by inverting a matrix, and how often you need to do that depends on how stiff the current basis is, so you want to keep the determinant of the basis around from every pivot since the last regeneration. (Or some growing function thereof.)

You cannot, unfortunately, combine multiple changes to the basis or do them in parallel. You can, and must extract some of the parallelism from the problem for any serious sized LP problem, but the best way to do that may be a property of the actual LP problem being solved. For example, transportation problems are a special case of LP problems which allow for faster solutions, and oil companies typically have huge models including both transportation and refinery operations. In that case the current solution is used as a starting point for the next iteration.

In other words, the major job of the programmer is not to produce some correct implementation of the Simplex algorithm, but to produce a valid implementation which is well suited to the actual hardware and problem domain. Writing the final code is probably the easiest part of the job. Or you can get "canned" programs, and play with the optimization settings. It is often easier to write the code yourself, since you need to know it almost by heart to do a decent job on the tuning.

* If you are not familiar with linear programming, it is the problem of finding a maximum value of the objective function over a--usually large--set of variables, subject to a set of linear constraints. It was discovered in the 1940's that the optimal solution to a linear programming problem can will always have at most one non-zero variable for each constraint. Eliminating all the other (zero) variables, you now have a simple linear algebra problem to find the correct values for the "free" variables. The problem, of course, is first determining what that set is.

RE: Correct, but...
By gamerk2 on 5/3/2010 4:08:59 PM , Rating: 3
There are three main issues:

1: Threading
2: Multi-processing
3: OS limitations

Threading itself is simple; its only when combined with the next two factors that you get code that doesn't run well in a parallel environment.

Mutli-processing is much tougher, patley because of the underlying OS. In Windows, only one copy of a .DLL exists for every process that runs. [Every process inherits the same Windows.DLL, etc] As such, since every process inherits very low level system DLL's, you have code that at some point will no longer be able to run perfectly parrallel, regardless of how it is coded. [Hence, why I am a proponent of static linking and the return to static DLL's].

You also need to factor in the Windows Scheduler, which tends to keep most processes on one core, simply in order to share some of those repeated low level resources (which in theory, would cut down execution time if the individual tasks were independent of eachother).

Nevermind teh worst thing you can do is start putting lots of threads on lots of cores; taking the GPU as an example, a GPU would be slower in general computing tasks, as each singluar core is far slower then a standard CPU. Its only through parrallelization that GPU's are efficent. [Hence why they do rasterization, which is independent to each indvidual pixel].

Trying to parallelize serial code on an OS that was not designed for a multi-processor environment is a job that is doomed to failure. At some point, M$ is either going to have to re-work Windows with multi-processors in mind, or some other company will have to release a competiting OS.

RE: Correct, but...
By Bcnguy on 5/4/2010 1:01:34 PM , Rating: 2
On the third point. As far as I know MS is working on it at least since 2008 when started a collaborative project for Parallel Computing with the Barcelona Supercomputing Center (BSC-CNS).

"I modded down, down, down, and the flames went higher." -- Sven Olsen

Copyright 2016 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki