Mix of Kepler and Maxwell chips outdoes NVIDIA's performance claims in independent benchmarks

A new release has us asking the question -- is NVIDIA being boastful, accurate, or modest?  Let's find out.

I. Mobile Firepower

While Advanced Micro Devices Inc. (AMD) may be struggling to bring Volcanic Islands into the mobile space, NVIDIA Corp. (NVDA) isn't content to rest on its laurels with the highly successful GeForce 700M series.  As if scenting a wounded animal, NVIDIA has just announced an aggressive update to its mobile offerings -- the NVIDIA GeForce 800M Series.

With the launch, NVIDIA showed off benchmarks which indicate that its four gaming-geared laptop parts -- the GeForce GTX 850M, GTX 860M, GTX 870M, and GTX 880M -- would best its predecessors by 60, 40, 30, and 15 percent, respectively.

First-party benchmarks from hardware makers are kind of like politicians -- they're often wrong, they're useless without fact checking, and even when they're telling the truth, they're sometimes doing so in a misleading way.

But when it comes to the GeForce 800M series -- in some cases at least -- it appears that NVIDIA was being a bit modest.
GeForce GTX 880M
Consisting of a mixture of Kepler and Maxwell parts, the new mobile GPUs offer an across-the-board bump in processing power, which in some cases is much higher than NVIDIA claims.  And we're not talking synthetic benchmarks for bragging rights -- we're talking about squeezing out frames to make titles that were unplayable playable.

II. The Kepler End of the Lineup

The new GPUs are all produced on the 28 nm node by Taiwan Semiconductor Manufacturing Comp., Ltd. (TPE:2330) (TSMC).

NVIDIA is getting great yields on its relatively bulky 294 mm^2 Kepler dies on that process, so it's sticking with the generation a bit longer.  Thus the high end is composed of Kepler parts and can be viewed as a tuned-up version of last summer's GeForce 700M series.

That's not necessarily a bad thing; the Kepler chips deliver some monstrous performance (more on that in a minute), particularly in the flagship 880M, which is a fully functional GK104 chip with all of its streaming multiprocessors active.

The GTX 880M is a fully functional GK104 (Kepler) chip. [Image Source: AnandTech]

Also note that NVIDIA is executing an identical strategy on the desktop front.  For example, the GTX 770 is basically a tuned-up GTX 680 Kepler chip.  That comparison is important to the flagship of the GeForce 800M lineup -- the GTX 880M.  The GTX 880M is basically a GTX 770 with new battery-saving features and a roughly 10 percent lower clock (954 MHz vs. 1046 MHz in the GTX 770).

A Kepler SM [Image Source: AnandTech]

The GTX 870M and 860M cut the wide 256-bit memory bus on the GTX 880M down to 192-bit and 128-bit respectively -- corresponding to successive 1 GB reductions to the GDDR5 bank (down to 3 GB and 2 GB, respectively).  They also disable 1 (GTX 870M) or 2 (GTX 860M) of the eight processing blocks (streaming multiprocessors, or SMs for short).  The GTX 860M also takes a pretty big hit in clock speed, dipping to 797 MHz.
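As a back-of-envelope sketch, each Kepler SM (SMX) carries 192 CUDA cores, so core counts and rough peak FP32 throughput fall out of the SM counts and clocks stated above (the GTX 870M's clock isn't quoted here, so it's left out):

```python
# Rough Kepler math: 192 CUDA cores per SMX, ~2 FP32 FLOPs per core per
# clock (fused multiply-add). Clocks are the ones quoted in the article.
def kepler_specs(sms, mhz):
    cores = sms * 192
    gflops = round(cores * 2 * mhz / 1000)  # peak FP32, in GFLOPS
    return cores, gflops

print(kepler_specs(8, 954))  # GTX 880M: (1536, 2931)
print(kepler_specs(6, 797))  # Kepler GTX 860M: (1152, 1836)
```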

Returning to the GTX 880M, there's a big picture message here -- laptop GPUs are creeping closer than ever before to parity with desktop GPUs in processing resources (e.g. shaders), memory, and clock speeds.  In fact, NVIDIA is getting so close that it is simply recycling desktop chips with mobile features within a few months of their launch.

The GTX 880M is basically a mobile-ready version of the GTX 770 card, with 10 percent lower clock, twice the GDDR5 memory, and half the TDP (125 watts, versus 250 watts TDP for the GTX 770).  You could say it's the pinnacle of Kepler designs.
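A rough efficiency comparison falls out of simple division.  Real power draw doesn't scale this cleanly (voltage, memory, and chip binning all matter), so treat this as a loose ceiling rather than a measured figure:

```python
# Same GK104 silicon: ~10% lower clock at half the TDP.
clock_ratio = 954 / 1046   # GTX 880M clock vs. GTX 770 clock
tdp_ratio = 125 / 250      # GTX 880M TDP vs. GTX 770 TDP

print(f"~{clock_ratio / tdp_ratio:.2f}x performance per watt vs. the GTX 770")
```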

Mobile, desktop, and ultramobile all on the same architecture! (Sort of...)
[Image Source: NVIDIA]

Continuing the unified-architecture strategy, recall that the upcoming Tegra K1 mobile chip has an on-die, single-SM Kepler graphics processor.  So mid-range desktop and laptop now both get an 8 SM Kepler chip, while the Tegra K1 scales the same architecture down to a single-SM integrated GPU.

III. Why Maxwell Matters

That trend -- and a uniform release strategy -- continues on the low end of NVIDIA's new lineup, which is populated with Maxwell chips (GM107).  Maxwell is an entirely new architecture -- NVIDIA's second generation design on the 28 nm node.

GPUs have been a boon to computing in recent years outside of just gaming, accelerating everything from browsing to the math behind blending layers in Adobe Systems Inc.'s (ADBE) Photoshop.

Three critical problems for GPU computing's model of identical workers are:
  1. Not enough material to work on.

    This is a problem due to memory access.  Global memory access is slow -- taking up to hundreds of cycles.  Smart programming can reduce this, but GPU makers like NVIDIA have gotten aggressive by baking registers, caches and texture memory into their chips.
  2. Workers don't know what to do.

    It's kind of hard to work if you don't know what to do.  It's the job of a scheduler to tell the workers (the GPU's tiny cores, also known as unified shaders) what to do, but in Kepler the design of the streaming multiprocessor was relatively monolithic, so some workers were left idle when a task that only used a few workers at a time was called.
  3. Workers don't have the right tools.

    Whether it's graphics or compute, shaders often need a way to do fast, complicated (transcendental) math, such as taking the sine of an angle or the square root of a floating point number.  In Kepler, 6 workers had to share a special function unit (SFU) that handles these kinds of operations.  Likewise, LD/ST (load/store) units were needed to grab sets of data from the global graphics memory or put data into it.  Aside from the inherent latencies, this adds to the slowness described in #1.

Maxwell is a mobile-minded design, and it starts with the GM107.  Given that it's still on the same 28 nm process node as the GK107, the growth of the die size -- from 118 mm^2 to 148 mm^2 -- and transistor count -- 1.27 billion to 1.87 billion -- suggests we have some new goodies.  These goodies are found in the new SMMs, also known as Maxwell Streaming Multiprocessors, which replace Kepler's monolithic SMXs.

A GM107 has 3 SMMs which share an L2 cache. [Image Source: AnandTech]

The first goodie is the addition of a much beefier L2 cache -- a buffer between the global memory and the streaming multiprocessors (SMs) that's shared among all SMs.  It has been expanded to 2 MB, versus the GK107's 256 KB.  By growing the top level of the cache (registers and L1 cache stay the same), global memory accesses -- that is, reads from the GDDR5 chips somewhere on your laptop or graphics card's printed circuit board -- can be drastically reduced.
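To see why the bigger L2 matters, consider a toy average-memory-access-time model.  The latencies and hit rates below are hypothetical round numbers chosen for illustration, not GM107 measurements:

```python
# Toy average-memory-access-time (AMAT) model: a bigger L2 raises the hit
# rate, so fewer accesses pay the full trip out to GDDR5.
def amat(l2_hit_rate, l2_cycles=40, dram_cycles=300):
    return l2_hit_rate * l2_cycles + (1 - l2_hit_rate) * dram_cycles

print(f"small L2, 60% hit rate: {amat(0.60):.0f} cycles")  # 144 cycles
print(f"big L2,   90% hit rate: {amat(0.90):.0f} cycles")  # 66 cycles
```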

And the number of tools per worker has been increased.

Maxwell SMs have more tools per worker and take a load off the schedulers.
[Image Source: AnandTech]

Kepler made 6 workers share an LD/ST unit and an SFU.  Maxwell cuts this to 4, so workloads that need a lot of special functions or memory accesses will see major gains.  Another increase to the toolset comes in the ALUs: the Kepler SMX had 1 ALU per core (192 total); the Maxwell SMM has 2 ALUs per core (256 total).
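Those ratios imply the per-SM unit counts stayed put while the core count shrank.  A quick sanity check, assuming 32 shared units per streaming multiprocessor (an assumed figure -- the text above only gives the ratios):

```python
# Cores contending for each shared unit (SFU or LD/ST), assuming 32 such
# units per streaming multiprocessor -- an assumption, not a stated spec.
kepler_cores_per_smx = 192
maxwell_cores_per_smm = 128
units_per_sm = 32  # assumption

print("Kepler cores per SFU: ", kepler_cores_per_smx // units_per_sm)   # 6
print("Maxwell cores per SFU:", maxwell_cores_per_smm // units_per_sm)  # 4
```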
Another little reward improves thread scheduling.  In Kepler an SMX was composed of 192 CUDA cores in a single domain shared by 4 warp schedulers.  In Maxwell the SMM is divided into 4 separate domains, each "owned" by a warp scheduler and with 32 CUDA cores (for a total of 128 CUDA cores per SMM).
This is an improvement for a couple of reasons.  In Kepler a warp scheduler had to try to juggle blocks using a mix of firmware and hardware -- a daunting task.  If other schedulers were using resources, it would have to wait its turn.  With the new Maxwell SMMs, the warp scheduler doesn't have as much to worry about as it owns all its resources and just has to figure out the optimal order to use them in.
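The scheduling change can be pictured as moving from one shared pool to four owned partitions -- a sketch of the bookkeeping, not NVIDIA's actual scheduler logic:

```python
# Kepler: 4 warp schedulers juggle one shared 192-core pool; Maxwell: each
# scheduler owns a private 32-core partition (4 x 32 = 128 cores per SMM).
kepler_shared_pool = 192
maxwell_partitions = [32] * 4

print("Kepler:  4 schedulers share", kepler_shared_pool, "cores")
print("Maxwell: 4 schedulers own", maxwell_partitions[0],
      "cores each,", sum(maxwell_partitions), "total")
```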
The final goodie to be had is the unification of the 64 KB texture and L1 blocks into a single 128 KB block, which can be split into a cache and a shared block (in 32/96, 64/64, or 96/32 splits).
In short, a Maxwell core is smaller and includes a new level of granularity versus a Kepler core.  It has more tools for the threads.  And it offers a major revamp to the fast memory (the shared L1 cache, the slightly slower L2 cache, etc.) offering more fast storage and more flexibility.


These improvements lead to what NVIDIA claims is roughly a 50 percent speedup over the GK107.  But more importantly, they shave about a quarter of the power consumption off.  That leads to NVIDIA's claim that the GM107 is 2x faster per watt.
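The 2x figure follows directly from those two claims:

```python
# ~50 percent faster at ~75 percent of the power, per NVIDIA's claims:
speedup = 1.5
power_ratio = 0.75
perf_per_watt_gain = speedup / power_ratio
print(f"{perf_per_watt_gain:.1f}x performance per watt")  # 2.0x
```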

AnandTech's benchmarking showed this was indeed true.  Ryan Smith and Ganesh T S wrote of the GM107 versus the GK106 GeForce GTX 650 Ti:

By doubling their performance-per-watt NVIDIA has significantly shifted their performance both with respect to their own product lineup and AMD’s lineup. The fact that the GTX 750 Ti is nearly 2x as fast as the GTX 650 is a significant victory for NVIDIA, and the fact that it’s nearly 3x faster than the GT 640 – officially NVIDIA’s fastest 600 series card without a PCIe power plug requirement – completely changes the sub-75W market. NVIDIA wants to leverage GM107 and the GTX 750 series to capture this market for HTPC use and OEM system upgrades alike, and they’re in a very good position to do so. Plus it goes without saying that compared to last-generation cards such as the GeForce GTX 550 Ti, NVIDIA has finally doubled their performance (and halved their power consumption!), for existing NVIDIA customers looking for a significant upgrade from older GF106/GF116 cards.

In other words, NVIDIA promised, NVIDIA delivered.
IV. The Rest of the Family, Including the Fermi Black Sheep
Returning to the new mobile chips, the 860M is basically the GTX 750 Ti in mobile form, launched a month after its desktop identical twin.  Like most identical twins, there are some differences if you look close enough, but for most the two chips will be indistinguishable.
The GeForce GTX 750 Ti rocked the plugless market thanks to its sub-75 watt TDP -- the most a PCIe slot can supply without an auxiliary power plug.  This comes doubly in handy for the GTX 860M, which often might be called upon to run off a battery.

The GTX 850M should resemble the GTX 860M scaled down in performance.  It has the same spec but a lower clock speed (876 MHz vs. 1029 MHz for the GTX 860M).
GeForce 800M Series
NVIDIA does deserve criticism for one dangerous decision -- its decision to muddle the lineup with an identically named GTX 860M based on Kepler.  That's right, there are two identically named GeForce GTX 860M GPUs, one of which is Maxwell, one of which is Kepler.
The logic on NVIDIA's end is apparent.  It needed a 6 SM Kepler mobile chip to dump parts with dead SMs.  And with 1152 cores clocked at 797 MHz, it's possible that power and performance will stack up similarly to the Maxwell GTX 860M, which has roughly 55 percent of the cores (640), but a new architecture and a 29.1 percent higher clock speed (1029 MHz).
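Those headline numbers check out arithmetically:

```python
# Comparing the two GTX 860M variants' headline figures:
kepler_cores, kepler_mhz = 1152, 797
maxwell_cores, maxwell_mhz = 640, 1029

print(f"Maxwell core share: {maxwell_cores / kepler_cores:.1%}")  # 55.6%
print(f"Maxwell clock gain: {maxwell_mhz / kepler_mhz - 1:.1%}")  # 29.1%
```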
Unfortunately, early OEM designs have scooped up the Maxwell variant almost exclusively (or so testers say), so there's no telling how similar "almost similar" really is.  If it turns out that the Kepler variant really is about the same in performance and power, then we can shut off the alarm and praise NVIDIA for its cleverness.  If it's not, then you could say NVIDIA is deceiving consumers, something that might trigger a backlash against its chips.
There's one other part of the family that also launched -- three budget chips (the 840M, 830M, and 820M).  The good news is that the 840M and 830M are budget Maxwell parts, with a cut-down SMM count and likely slower clocks.

NVIDIA 800M low end
[Image Source: AnandTech]
While some will be turned off by their (likely) slow 64-bit bus, they make sense as they're a step up from an integrated GPU, such as the Iris iGPU found inside some of Intel Corp.'s (INTC) Haswell chips.


The head scratcher is the 820M, which is a two-generations-old Fermi part (Fermi was first introduced with the GeForce 400 Series back in 2010).  It only has 96 cores -- a worse chip than most iGPUs by today's standards.  Maybe it will find a home in some budget laptops built with last generation Intel chips?  Who knows, but it's a strange entry, and perhaps the only other potential point of criticism in the 7 (or 8, if you count each GTX 860M variant) chip family.
Wrapping up, the new chips also include support for NVIDIA's proprietary GameStream technology, for use with the NVIDIA Shield portable.

GameStream Shield
And they also include a clever little firmware+hardware tweak called "NVIDIA Battery Boost", which caps the framerate at a playable level (around 30 fps), preventing unnecessarily high framerates when running on battery.
NVIDIA GeForce Battery Boost
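The idea behind Battery Boost can be sketched as a simple frame limiter -- our own illustration of the concept, not NVIDIA's driver-level implementation:

```python
import time

# A minimal frame limiter: render, then sleep away the rest of the frame
# budget so the GPU can idle instead of racing ahead at uncapped framerates.
def run_capped(render_frame, target_fps=30, frames=5):
    budget = 1.0 / target_fps
    for _ in range(frames):
        start = time.perf_counter()
        render_frame()
        elapsed = time.perf_counter() - start
        if elapsed < budget:
            time.sleep(budget - elapsed)  # cap the loop at target_fps

run_capped(lambda: None)  # 5 trivially cheap "frames", paced to ~30 fps
```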

The tweak is found in both the Kepler and the Maxwell components of the lineup.

Battery Boost chart
NVIDIA claims this feature can boost battery life two-fold, a gain that comes on top of the doubling of performance-per-watt in the Maxwell chips.  Of course this only applies to games like Batman: Arkham Origins, which get reasonably good framerates to start.  For games that are marginally playable like Crysis 3, don't expect any gains if you try to play them on battery.
V. Will It Blend ... err Live Up to the Benchmarks?
So the last question is simple.  Does the performance live up to NVIDIA's claims?
We've already hinted the answer is yes.
Here are some benchmarks to prove it:

GTX 880M vs GTX 780M (@ 1920 x 1080 pixels)

From Hardware Heaven:
Battlefield 4: +10.5% fps (42 vs. 38)
Batman Arkham Origins: +28.4% fps (95 vs. 74)
F1 2013: +43.3% fps (96 vs. 67)
DOTA 2: +8.2% fps (92 vs. 85)

From CNET:
Bioshock Infinite: +15.0% fps (81.67 vs. 71)
Metro: +64.3% fps (30.67 vs. 18.67)

From Laptop Magazine (arguably the best benchmark review):
World of Warcraft: +8.5% fps (114 vs. 105)
Bioshock Infinite (low): +14.5% fps (142 vs. 124*)
Bioshock Infinite (high): +33.3% fps (64 vs. 48)
Metro (low): -3.6% fps (81 vs. 84) **
Metro (high): +9% fps (24 vs. 22) **

*= middle result of several tested laptops with a GTX 780M
**= best case
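For the curious, the percentage gains above are just the ratio of new to old framerates; a tiny helper reproduces a few of them:

```python
# The percentage deltas above are just new/old ratios:
def pct(new_fps, old_fps):
    return round((new_fps / old_fps - 1) * 100, 1)

print(pct(42, 38))   # Battlefield 4: 10.5
print(pct(96, 67))   # F1 2013: 43.3
print(pct(81, 84))   # Metro (low): -3.6
```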

Geforce 880M laptops
GeForce GTX 880M laptops [Image Source: Laptop Magazine]

GTX 860M vs GTX 760M (@ 1920 x 1080 pixels)

From CNET:
Bioshock Infinite: +60% fps (52.83 vs. 33)
Metro: +37.5% fps (14.67 vs. 10.67)

Notebook Check:
Crysis 3: 19 fps (3% slower than the GTX 770M, 32% faster than the GTX 760M)
Far Cry 3: 23 fps (10% faster than the GTX 770M)

As an aside, if you want an interesting, if at times chuckle-inducing, nontechnical/layman's review check out Forbes' writeup.  While it presented no benchmarks for its "test" it comments:

The GT60 2PE Dominator Pro is so powerful, sporting a quad core, hyper threaded Intel Core i7-4800MQ, that it can actually match many high-performance desktop CPUs in terms of performance. This means it can actually replace your desktop for the simple reason it’s likely to be just as powerful.
Gaming performance was something of a complicated area. Due to the crazy amount of pixels the graphics processor has to deal with, the GT60 2PE Dominator Pro struggled in demanding games such as Crysis 3 and Battlefield 4, and I had to drop the resolution to 1080p to get things smooth enough to be playable.

"Crazy amount of pixels", huh?  Well, to Forbes' credit, it does confirm something that others weren't bold enough to try -- playing Crysis 3 on a GTX 880M laptop with the native 2,880x1620 pixel resolution on the MSI GT70 Dominator.
Returning to the benchmarks above, recall that NVIDIA promised us a 15 percent boost, and we've seen it overdeliver in most cases and miss in a handful of others.  The discrepancy between Laptop Magazine and CNET -- both of which compared the GTX 880M-equipped ASUSTek G750JZ against the GTX 780M Alienware 17 -- is rather bizarre, to say the least.  Clearly someone screwed up, and we're guessing it's CNET, given that Laptop Magazine documented its testing procedure more rigorously.
It's possible CNET was running different settings, or made a different CPU choice, but I'm guessing it was just a data entry error.  This hypothesis is supported by the fact that the Bioshock results are similar (likely ruling out different amounts of DRAM or a different processor).
We shall see.
VI. Conclusions
The overall picture for NVIDIA is a positive one.  Some titles (e.g. Far Cry 3, Crysis 3, and Metro) should be playable -- or almost playable -- on a laptop, getting roughly 24-30 fps with a GTX 880M and 19-23 fps with a GTX 860M.  That's pretty exciting.
The cynic expects real-world results to consistently underdeliver versus first-party benchmarks.  Instead, NVIDIA overdelivers in some cases and underdelivers in others.  Some of this can be attributed to varying hardware configurations (different processors and amounts of memory), but a lot of it likely comes down to driver support on a game-by-game basis among triple-A titles.

NVIDIA GeForce 800M
Overall that makes the GeForce 800M Series a pretty solid showing from NVIDIA, particularly given its power savings over last generation.  Looks like these mobile GPUs will be worth buying.

For those who aren't quite ready for that, looking ahead at NVIDIA's roadmap, the next major development is expected to be the stacking of DRAM on the GPU chip itself -- forming a 3D chip.


That architecture will be dubbed Volta and should arrive around 2016.

Before that happens one or possibly two refreshes to Maxwell should occur.  The GeForce 800 Series should bring a die shrink to TSMC's new 20 nm process, plus whatever little tweaks NVIDIA has devised based on the real world performance of its first generation Maxwell models.

Sources: NVIDIA, Laptop Magazine, AnandTech [1], [2]

"If a man really wants to make a million dollars, the best way would be to start his own religion." -- Scientology founder L. Ron. Hubbard

Copyright 2017 DailyTech LLC.