Print 68 comment(s) - last by eachus.. on May 6 at 8:33 PM

  (Source: LucasFilm)

Gordon Moore's prediction of doubling transistor counts every 2 years revolutionized the computer industry and his company, Intel.  (Source: New York Times)

An NVIDIA VP is declaring Moore's Law dead and GPUs the only hope for the industry.  (Source: TechCrunch)
In NVIDIA's eye the parallelism of the GPU is the only future for computing

NVIDIA has struggled this time around in the GPU war.  Its first DirectX 11 products were delivered a full seven months after AMD's.  While its new units are at last trickling onto the market and are very powerful, they're also hot, loud, and power hogs.  However, NVIDIA is staking much on the prediction that the computer industry will be ditching traditional architectures and moving towards parallel designs; a movement which it sees its CUDA GPU computing as an ideal solution for.

Intel and NVIDIA have long traded jabs, and Intel's recent failed GPU bid,
Larrabee, does little to warm to the ice.  In a recent op-ed entitled "Life After Moore's Law", published in Forbes, NVIDIA VP Bill Dally attacks the very foundation of Intel's business -- Moore's Law -- declaring it dead.

Moore's Law stemmed from a paper [PDF] published by Gordon Moore 45 years ago this month.  Moore, co-founder of Intel, predicted in the paper that the number of transistors per area on a circuit would double every 2 years (later revised to 18 months).  This prediction was later extend to predict that computing power would roughly double every 18 months, a prediction that became known as Moore's Law.

Now with die shrinks becoming more problematic, NVIDIA is convinced the end is nigh for Moore's Law (and Intel).  Writes Dally:

Moore's paper also contained another prediction that has received far less attention over the years. He projected that the amount of energy consumed by each unit of computing would decrease as the number of transistors increased. This enabled computing performance to scale up while the electrical power consumed remained constant. This power scaling, in addition to transistor scaling, is needed to scale CPU performance.
But in a development that's been largely overlooked, this power scaling has ended. And as a result, the CPU scaling predicted by Moore's Law is now dead. CPU performance no longer doubles every 18 months. And that poses a grave threat to the many industries that rely on the historic growth in computing performance.

Dally says that the only near-term hope for the computer industry now that Moore's Law is "over" is parallel computing -- splitting workloads up among a variety of processors.  However, he derides multi-core efforts by AMD and Intel, stating, "Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work. This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy (consume too much energy per instruction) to fly on parallel programs and to continue historic scaling of performance."

He concludes, "Let's enable the future of computing to fly--not rumble along on trains with wings."

In other words, he hopes you will buy NVIDIA GPUs and join the "Moore's Law is dead" party.

Comments     Threshold

This article is over a month old, voting and posting comments is disabled

RE: Correct, but...
By The0ne on 5/3/2010 3:02:07 PM , Rating: 2
Parallel programming from sequential code(s) has been going on over 15 years now. We were taught a semester on parallel programming but it just never took off so the class ended. The industry was not ready and did not want it at the time.

Having said that, parallel programming is not easy. It's definitely easier said than done. It's still new in the sense that many are not familiar with it, less begin to imagine how it is done. I really hope the interest in parallel programming and REAL multitasking will bring about new ideas and tools. We need them in this day and age; we seriously can't be stuck with today's tool IMO.

RE: Correct, but...
By The0ne on 5/3/2010 3:08:05 PM , Rating: 2
Ah, remember my professor's name. Professor Kavianpor. I'm pretty sure it's spelled different as I'm going off of the pronunciation :)

RE: Correct, but...
By Jaybus on 5/4/2010 2:24:37 PM , Rating: 3
An example in C:
for (i=2; i < n; i++) {
x[i] = x[i-1] * 2;
This cannot be auto-paralellized by any compiler or pre-processor that I'm aware of. It appears at first to be sequential by nature because x[i] is dependent on the previous iteration. However, the same problem can be solved with:
for (i=2; i < n; i++) {
x[i] = x[1] * (2 ** (i-1));
which can be auto-parallelized. The auto-parallelization is not smart enough to find a different solution to the same problem so that it may be parallelized. Thus, it is either up to the programmer to write sequential code that can be auto-parallelized, or it is up to the programmer to write parallel code to begin with. Is one more difficult than the other? I believe it depends on training and mindset.

We can't wait around on some clever computer scientist to solve this sort of AI problem, just as physicists can't wait around for the next Einstein to reconcile quantum mechanics and general relativity. In the meanwhile, it makes more sense to struggle through the learning curve and make the best use of what we have, rather than continuing to train students solely in sequential methods in the hope that someone will soon devise a really smart auto-parallelization tool.

RE: Correct, but...
By eachus on 5/6/2010 8:33:31 PM , Rating: 2
for (i=2; i < n; i++) {
x[i] = x[1] * (2 ** (i-1));

Actually that is worse than the original, even on a parallel machine, since you still have an implicit loop in the power of two. Using shifts, what you really want to do is evaluate:

for (i=2; i < n; i++) {
x[i] = x[1]<<(i-1);

Of course, you really should put an overflow check in there, since it will overflow pretty quickly if x[1] /= 0.

A better (and eminently practical) example is to try to implement the Simplex algorithm for solving linear programming problems efficiently. Even if you have a very good BLAS, such as the goto BLAS, or a version of the ATLAS BLAS tuned to your system, a good deal of optimization in Simplex code concerns itself with representing the basis as a (square) matrix times a series of vectors.* Every so often you need to "regenerate" the basis by inverting a matrix, and how often you need to do that depends on how stiff the current basis is, so you want to keep the determinant of the basis around from every pivot since the last regeneration. (Or some growing function thereof.)

You cannot, unfortunately, combine multiple changes to the basis or do them in parallel. You can, and must extract some of the parallelism from the problem for any serious sized LP problem, but the best way to do that may be a property of the actual LP problem being solved. For example, transportation problems are a special case of LP problems which allow for faster solutions, and oil companies typically have huge models including both transportation and refinery operations. In that case the current solution is used as a starting point for the next iteration.

In other words, the major job of the programmer is not to produce some correct implementation of the Simplex algorithm, but to produce a valid implementation which is well suited to the actual hardware and problem domain. Writing the final code is probably the easiest part of the job. Or you can get "canned" programs, and play with the optimization settings. It is often easier to write the code yourself, since you need to know it almost by heart to do a decent job on the tuning.

* If you are not familiar with linear programming, it is the problem of finding a maximum value of the objective function over a--usually large--set of variables, subject to a set of linear constraints. It was discovered in the 1940's that the optimal solution to a linear programming problem can will always have at most one non-zero variable for each constraint. Eliminating all the other (zero) variables, you now have a simple linear algebra problem to find the correct values for the "free" variables. The problem, of course, is first determining what that set is.

"Paying an extra $500 for a computer in this environment -- same piece of hardware -- paying $500 more to get a logo on it? I think that's a more challenging proposition for the average person than it used to be." -- Steve Ballmer

Copyright 2016 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki