NVIDIA GeForce 800M Series Mixes it up, Bests Promised Results
March 12, 2014 10:20 PM
Mix of Kepler and Maxwell chips outdo NVIDIA's performance claims in independent benchmarks
A new release has us asking the question -- is NVIDIA being boastful, accurate, or modest? Let's find out.
I. Mobile Firepower
While Advanced Micro Devices Inc. (AMD) may be struggling to bring its latest graphics hardware into the mobile space, NVIDIA Corp. (NVDA) isn't content to just rest on its laurels with the highly successful GeForce 700M series. As if sensing the scent of a wounded animal, NVIDIA has just announced an aggressive update to its mobile offerings -- the NVIDIA GeForce 800M Series.
With the launch, NVIDIA showed off benchmarks which indicate that its four gaming-geared laptop parts -- the GeForce GTX 850M, GTX 860M, GTX 870M, and GTX 880M -- would best its predecessors by 60, 40, 30, and 15 percent, respectively.
First-party benchmarks from hardware makers are kind of like politicians -- they're often wrong, they're useless without fact checking, and even when they're telling the truth, they're sometimes doing so in a misleading way.
But when it comes to the GeForce 800M series -- in some cases at least -- it appears that NVIDIA was being a bit modest.
Consisting of a mixture of Kepler and Maxwell parts, the new mobile GPUs offer an across-the-board bump in processing power, which in some cases is much higher than NVIDIA claims. And we're not talking synthetic benchmarks for bragging rights; we're talking about squeezing out enough frames to make titles that were unplayable playable.
II. Kepler: End of the Lineup
The new GPUs are all produced on the 28 nm node by Taiwan Semiconductor Manufacturing Co., Ltd. (TSMC). NVIDIA is getting great yields on its relatively bulky 294 mm^2 dies on that process, so it's sticking with the generation a bit longer. The high end is thus composed of Kepler parts, and can be viewed as a tuned-up version of last summer's GeForce 700M series.
That's not necessarily a bad thing; the Kepler chips deliver some monstrous performance (more on that in a minute), particularly the flagship 880M, which is a fully functional GK104 chip with all the streaming multiprocessors active.
The GTX 880M is a fully functional GK104 (Kepler) chip. [Image Source: AnandTech]
Also note that NVIDIA is executing an identical strategy on the desktop front. For example, the GTX 770 is basically a tuned-up GTX 680. That comparison is important to the flagship of the GeForce 800M lineup -- the GTX 880M. The GTX 880M is basically a GTX 770 with new battery-saving features and a roughly 10 percent lower clock (954 MHz vs. 1046 MHz in the GTX 770).
A Kepler SM [Image Source: AnandTech]
The GTX 870M and 860M cut the wide 256-bit memory bus on the GTX 880M down to 192-bit and 128-bit respectively -- corresponding to successive 1 GB reductions to the GDDR5 bank (down to 3 GB and 2 GB, respectively). They also disable 1 (GTX 870M) or 2 (GTX 860M) of the eight processing blocks (streaming multiprocessors, or SMs for short). The GTX 860M also takes a pretty big hit in clock speed, dipping to 797 MHz.
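Those bus cuts translate linearly into peak memory bandwidth. As a quick sketch (the 5 GT/s effective GDDR5 data rate below is an assumption for illustration; actual memory clocks vary by SKU and OEM):

```python
def peak_bandwidth_gbs(bus_width_bits, effective_rate_gtps):
    """Peak memory bandwidth in GB/s: bus width in bytes x billions of transfers/sec."""
    return bus_width_bits / 8 * effective_rate_gtps

# At an assumed 5 GT/s effective GDDR5 rate:
gtx_880m = peak_bandwidth_gbs(256, 5.0)  # 160.0 GB/s
gtx_870m = peak_bandwidth_gbs(192, 5.0)  # 120.0 GB/s
gtx_860m = peak_bandwidth_gbs(128, 5.0)  # 80.0 GB/s
```

Narrower bus, proportionally less bandwidth -- which is why the memory cuts matter as much as the disabled SMs.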
Returning to the GTX 880M, there's a big picture message here -- laptop GPUs are creeping closer than ever before to parity with desktop GPUs in processing resources (e.g. shaders), memory, and clock speeds. In fact, NVIDIA is getting so close that it is simply recycling desktop chips with mobile features within a few months of their launch.
The GTX 880M is basically a mobile-ready version of the GTX 770 card, with a 10 percent lower clock, twice the GDDR5 memory, and half the TDP (125 watts, versus 250 watts for the GTX 770). You could say it's the pinnacle of Kepler.
Mobile, desktop, and ultramobile all on the same architecture! (Sort of...)
[Image Source: NVIDIA]
Continuing the desktop-to-mobile strategy transition, recall that the Tegra K1 smartphone chip has an on-die, single-SM Kepler graphics processor. So mid-range desktop and laptop parts now both get the 8-SM GK104 chip, while the upcoming Tegra K1 makes do with a single Kepler SM.
That trend -- and a uniform release strategy -- continues on the low end of NVIDIA's new lineup, which is populated with Maxwell parts. Maxwell is an entirely new architecture -- NVIDIA's second-generation design on the 28 nm node.
III. Meet Maxwell
GPUs have been a boon to computing in recent years outside of just gaming, accelerating everything from browsing to the math behind blending layers in Adobe Systems Inc.'s (ADBE) Photoshop.
Three critical problems with GPU computing's model of identical workers are:
1. Not enough material to work on.
This is a problem due to memory access. Global memory access is slow -- taking up to hundreds of cycles. Smart programming can reduce this, but GPU makers like NVIDIA have gotten aggressive by baking registers, caches, and texture memory into their chips.
2. Workers don't know what to do.
It's kind of hard to work if you don't know what to do. It's the job of a scheduler to tell the workers (the GPU's tiny cores, also known as unified shaders) what to do, but with Kepler the design of the streaming multiprocessor was relatively monolithic, so some workers were left idle when a task that only used a few workers at a time was called.
3. Workers don't have the right tools.
Whether it's graphics or compute, shaders often need a way to do fast, complicated (transcendental) math operations, such as taking the sine of an angle or the square root of a floating point number. In Kepler, 6 workers had to share a special function unit (SFU) that handles these kinds of math operations. Likewise, LD/ST (load/store) units are needed to grab sets of data from the global graphics memory or put data into it. Aside from the inherent latencies, this adds to the slowness described in #1.
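Problem #2 can be made concrete with a toy Python model (purely illustrative; real GPU scheduling is far more involved): if a small task lands on one big monolithic block, every unused core in that block idles, whereas smaller independently scheduled partitions leave the untouched partitions free for other work.

```python
def idle_cores(block_size, active_threads):
    """Cores sitting idle in one scheduling block during a cycle."""
    return max(block_size - active_threads, 0)

def total_idle(partition_sizes, active_per_partition):
    """Toy model: each partition runs its own task; sum the idle cores."""
    idle = 0
    for i, size in enumerate(partition_sizes):
        active = active_per_partition[i] if i < len(active_per_partition) else 0
        idle += idle_cores(size, active)
    return idle

# A 32-thread task on one monolithic 192-core block idles 160 cores,
# and no other task can claim them that cycle:
monolithic = idle_cores(192, 32)
# Four 48-core partitions, each running its own task, waste far less:
partitioned = total_idle([48, 48, 48, 48], [32, 40, 48, 16])
```

The partition sizes and thread counts here are made up for illustration; the point is that finer-grained scheduling domains keep more workers busy.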
Maxwell is a mobile-minded design, and the family starts with the GM107. Given that it's still on the same process node as the GK107 (28 nm), the growth in die size -- from 118 mm^2 to 148 mm^2 -- and in transistor count -- from 1.27 billion to 1.87 billion -- suggests that we have some new goodies.
These goodies are found in the new SMMs, also known as Maxwell Streaming Multiprocessors, which replace Kepler's monolithic SMXs.
A GM107 has 3 SMMs which share an L2 cache. [Image Source: AnandTech]
The first goodie is a much beefier L2 cache, a buffer between the global memory and the streaming multiprocessors (SMs) that's shared among all SMs. It has been expanded to 2 MB, versus the GK107's 256 KB. By growing the top level of the cache (registers and L1 cache stay the same), global memory accesses -- reads from the GDDR5 chips that sit somewhere on your laptop's or graphics card's printed circuit board -- can be drastically reduced.
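The benefit of the bigger L2 can be sketched with the textbook average-memory-access-time formula. The latencies and hit rates below are illustrative assumptions (the article only says global memory costs "hundreds of cycles"), not NVIDIA's published figures:

```python
def avg_access_cycles(l2_hit_rate, l2_cycles, global_cycles):
    """Expected cost per access: hits served by L2, misses go out to DRAM."""
    return l2_hit_rate * l2_cycles + (1 - l2_hit_rate) * global_cycles

# Assumed: ~40 cycles for an L2 hit, ~400 cycles for a trip to GDDR5.
# If the 8x larger L2 lifts the hit rate from 50% to 80%:
before = avg_access_cycles(0.50, 40, 400)  # 220 cycles on average
after = avg_access_cycles(0.80, 40, 400)   # ~112 cycles on average
```

Even a modest bump in hit rate roughly halves the average trip to memory under these assumptions, which is exactly the kind of win a bandwidth-starved mobile part needs.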
And the number of tools per worker has been increased.
Maxwell SMs have more tools per worker, which takes a load off the schedulers.
[Image Source: AnandTech]
Kepler made 6 workers share an LD/ST unit and an SFU. Maxwell cuts this to 4, so workloads that need a lot of special functions or memory accesses will see major gains. Another increase to the toolset comes in the ALUs. The Kepler SMX had 1 ALU per core (192 total); the Maxwell SMM has 2 ALUs per core (256 total).
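Those sharing ratios fall straight out of the unit counts: taking 32 LD/ST units and 32 SFUs per multiprocessor as given (an assumption drawn from standard breakdowns of these designs), 192 vs. 128 CUDA cores means contention drops from 6:1 to 4:1. A quick Python sanity check:

```python
def cores_per_unit(cuda_cores, shared_units):
    """How many CUDA cores contend for each shared LD/ST or SFU unit."""
    return cuda_cores // shared_units

# Kepler SMX: 192 cores, assumed 32 LD/ST units and 32 SFUs
kepler_ratio = cores_per_unit(192, 32)   # 6 workers per shared unit
# Maxwell SMM: 128 cores, same assumed 32 LD/ST units and 32 SFUs
maxwell_ratio = cores_per_unit(128, 32)  # 4 workers per shared unit
```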
Another little reward improves thread scheduling. In Kepler, an SMX was composed of 192 CUDA cores in a single domain shared by 4 warp schedulers. In Maxwell, the SMM is divided into 4 separate domains, each "owned" by a warp scheduler and containing 32 CUDA cores (for a total of 128 CUDA cores per SMM).
This is an improvement for a couple of reasons. In Kepler, a warp scheduler had to try to juggle blocks using a mix of firmware and hardware -- a daunting task. If other schedulers were using resources, it would have to wait its turn. With the new Maxwell SMMs, the warp scheduler doesn't have as much to worry about: it owns all its resources and just has to figure out the optimal order to use them in.
The final goodie to be had is the unification of the 64 KB texture and L1 blocks into a single 128 KB block, which can be split into a cache and a shared block (in 32/96, 64/64, or 96/32 splits).
In short, a Maxwell core is smaller and includes a new level of granularity versus a Kepler core. It has more tools for its threads. And it offers a major revamp of the fast memory (the shared L1 cache, the slightly slower L2 cache, etc.), offering more fast storage and more flexibility.
These improvements lead to what NVIDIA claims is roughly a 50 percent speedup over the GK107. More importantly, they shave about a quarter off the power consumption. Those two numbers combine into NVIDIA's claim that the GM107 is 2x faster per watt.
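The 2x perf-per-watt claim follows directly from those two figures: roughly 1.5x the speed at roughly 0.75x the power.

```python
def perf_per_watt_gain(speedup, power_ratio):
    """Relative perf/watt: (new_perf / old_perf) / (new_power / old_power)."""
    return speedup / power_ratio

# ~50% faster while shaving ~25% off power consumption:
gain = perf_per_watt_gain(1.5, 0.75)  # 2.0, i.e. "2x faster per watt"
```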
AnandTech's benchmarking showed this was indeed true. Ryan Smith and Ganesh T S wrote in their review of the GM107 versus the Kepler-based GK106 GeForce GTX 650 Ti:
By doubling their performance-per-watt NVIDIA has significantly shifted their performance both with respect to their own product lineup and AMD’s lineup. The fact that the GTX 750 Ti is nearly 2x as fast as the GTX 650 is a significant victory for NVIDIA, and the fact that it’s nearly 3x faster than the GT 640 – officially NVIDIA’s fastest 600 series card without a PCIe power plug requirement – completely changes the sub-75W market. NVIDIA wants to leverage GM107 and the GTX 750 series to capture this market for HTPC use and OEM system upgrades alike, and they’re in a very good position to do so. Plus it goes without saying that compared to last-generation cards such as the GeForce GTX 550 Ti, NVIDIA has finally doubled their performance (and halved their power consumption!), for existing NVIDIA customers looking for a significant upgrade from older GF106/GF116 cards.
In other words, NVIDIA promised, NVIDIA delivered.
IV. The Rest of the Family, Including the GTX 860M Twins
Returning to the new mobile chips, the 860M is basically the GTX 750 Ti in mobile form, launched a month after its desktop identical twin. Like most identical twins, there are some differences if you look closely enough, but to most people the two chips will be indistinguishable.
The GeForce GTX 750 Ti rocked the plugless market thanks to drawing under 75 watts, low enough to skip the PCIe power connector. That comes doubly in handy for the GTX 860M, which often might be called upon to run off a battery.
The GTX 850M should resemble the GTX 860M scaled down in performance. It has the same spec but a lower clock speed (876 MHz vs. 1029 MHz for the GTX 860M).
NVIDIA does deserve criticism for one dangerous decision -- its decision to muddle the lineup with an identically named GTX 860M based on Kepler. That's right, there are two identically named GeForce GTX 860M GPUs, one of which is Maxwell, one of which is Kepler.
The logic on NVIDIA's end is apparent. It needed a 6-SM Kepler mobile chip to dump parts with dead SMs. And with 1152 cores clocked at 797 MHz, it's possible that power and performance will stack up similarly to the Maxwell GTX 860M, which has roughly 55 percent of the cores (640), but a new architecture and a 29.1 percent higher clock speed (1029 MHz).
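A crude way to compare the two GTX 860M variants is the raw core-clock product (shader count x clock), ignoring Maxwell's per-core efficiency gains; the sketch below just multiplies the figures quoted above:

```python
def raw_shader_throughput(cores, clock_mhz):
    """Core-clock product, in millions of shader-cycles per second."""
    return cores * clock_mhz

kepler_860m = raw_shader_throughput(1152, 797)   # 918,144
maxwell_860m = raw_shader_throughput(640, 1029)  # 658,560

# The Maxwell variant has only ~72% of the raw core-clock product, so its
# per-core architectural improvements have to make up the difference.
ratio = maxwell_860m / kepler_860m
```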
Unfortunately, early OEM designs have scooped up the Kepler variant almost exclusively (or so testers say), so there's no telling how similar "almost similar" is. If it turns out that it really is about the same in performance and power, then we can shut off the alarm and all praise NVIDIA for its cleverness. If it's not, then you could say NVIDIA is deceiving consumers, something that might trigger a backlash against its chips.
There's also one other part of the family that launched -- a trio of budget chips (the GTX 840M, 830M, and 820M). The good news is that the GTX 840M and 830M are budget Maxwell parts, with cut-down SMM counts and likely slower clocks.
[Image Source: AnandTech]
While some will be turned off by their (likely) slow 64-bit bus, they make sense as a step up from an integrated GPU, such as the Iris iGPU found inside some of Intel Corp.'s (INTC) processors.
The head scratcher is the GTX 820M, which is a two-generations-old Fermi part (Fermi was first introduced with the GeForce 400 Series back in 2010). It only has 96 cores. That's a worse chip than most iGPUs by today's standards. Maybe it will find a home in some budget laptops built with last-generation Intel chips? Who knows, but it's a strange entry, and perhaps the only other potential point of criticism in the 7 (or 8, if you count each GTX 860M variant) chip family.
Wrapping up, the new chips also include support for NVIDIA's proprietary GameStream technology for use with the NVIDIA Shield portable console. And they include a clever little firmware-plus-hardware tweak called "NVIDIA Battery Boost", which caps the framerate at a playable level (around 30 fps), preventing unnecessarily high framerates -- and the wasted power they burn -- when running on battery.
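Conceptually, Battery Boost behaves like a frame limiter. The sketch below is a hedged illustration of that general idea in Python, not NVIDIA's driver/firmware implementation: render a frame, then sleep away the rest of the ~33 ms budget instead of burning power on extra frames.

```python
import time

TARGET_FPS = 30
FRAME_BUDGET = 1.0 / TARGET_FPS  # ~33.3 ms per frame

def capped_frame(render):
    """Run one frame's work, then idle out the rest of the frame budget."""
    start = time.monotonic()
    render()  # the game's actual per-frame work would go here
    elapsed = time.monotonic() - start
    if elapsed < FRAME_BUDGET:
        # Idling instead of racing ahead is where the power savings come from.
        time.sleep(FRAME_BUDGET - elapsed)
    return time.monotonic() - start
```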
The tweak is found in both the Kepler and Maxwell components of the lineup.
NVIDIA claims this feature can boost battery life two-fold, a gain that comes on top of the doubling of performance-per-watt in the Maxwell chips. Of course, this only applies to games like Batman: Arkham Origins, which get reasonably good framerates to start with. For games that are marginally playable, like Crysis 3, don't expect any gains if you try to play them on battery.
V. Will It Blend ... err Live Up to the Benchmarks?
So the last question is simple. Does the performance live up to NVIDIA's claims?
We've already hinted the answer is yes.
Here are some benchmarks to prove it:
GTX 880M vs GTX 780M (@ 1920 x 1080 pixels)
From Hardware Heaven
Battlefield 4: +10.5% fps (42 vs. 38)
Batman Arkham Origins: +28.4% fps (95 vs. 74)
F1 2013: +43.3% fps (96 vs. 67)
DOTA 2: +8.2% fps (92 vs. 85)
Bioshock Infinite: +15.0% fps (81.67 vs. 71)
Metro: +64.3% fps (30.67 vs. 18.67)
From Laptop Magazine
(arguably the best benchmark review):
World of Warcraft: +8.5% fps (114 vs. 105)
Bioshock Infinite (low): +14.5% fps (142 vs. 124*)
Bioshock Infinite (high): +33.3% fps (64 vs. 48)
Metro (low): -3.6% fps (81 vs. 84) **
Metro (high): +9% fps (24 vs. 22) **
*= middle result of several tested laptops with a GTX 780M
**= best case
GeForce GTX 880M laptops [Image Source: Laptop Magazine]
GTX 860M vs GTX 760M (@ 1920 x 1080 pixels)
Bioshock Infinite: +60% fps (52.83 vs. 33)
Metro: +37.5% fps (14.67 vs. 10.67)
Crysis 3: 19 fps (3% slower than GTX 770M, 32 percent faster than the GTX 760M)
Farcry 3: 23 fps (10% faster than GTX 770M)
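The percentage figures in the lists above are just ratios of new to old framerates; a small helper makes them easy to double-check:

```python
def fps_gain_pct(new_fps, old_fps):
    """Percent change in framerate; positive means the new GPU is faster."""
    return round((new_fps / old_fps - 1) * 100, 1)

# Battlefield 4 on the GTX 880M vs. the GTX 780M: 42 vs. 38 fps
bf4 = fps_gain_pct(42, 38)        # +10.5
# Metro (low) from Laptop Magazine: 81 vs. 84 fps
metro_low = fps_gain_pct(81, 84)  # -3.6
```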
As an aside, if you want an interesting, if at times chuckle-inducing, nontechnical/layman's review, check out one of the early hands-on writeups. While it presented no benchmarks for its "test", it comments:
The GT60 2PE Dominator Pro is so powerful, sporting a quad core, hyper threaded Intel Core i7-4800MQ, that it can actually match many high-performance desktop CPUs in terms of performance. This means it can actually replace your desktop for the simple reason it’s likely to be just as powerful.
Gaming performance was something of a complicated area. Due to the crazy amount of pixels the graphics processor has to deal with, the GT60 2PE Dominator Pro struggled in demanding games such as Crysis 3 and Battlefield 4, and I had to drop the resolution to 1080p to get things smooth enough to be playable.
"Crazy amount of pixels", huh? Well, to the reviewer's credit, the piece does confirm something that others weren't bold enough to try -- playing Crysis 3 on a GTX 880M laptop at the native 2,880x1,620 pixel resolution of the MSI GT60 2PE Dominator Pro.
Back to the benchmarks above: we recall that NVIDIA promised us a 15 percent boost on the GTX 880M, and we've seen it overdeliver in most cases and miss in a handful of others. The discrepancy between Hardware Heaven's and Laptop Magazine's Metro results -- both of which were comparing the GTX 880M-equipped ASUSTek G750JZ versus the GTX 780M-equipped Alienware 17 -- is rather bizarre, to say the least. Clearly someone screwed up, and we're guessing it's Hardware Heaven, as Laptop Magazine documented its testing procedure more rigorously.
It's possible one site was running with different settings, or that a different CPU choice was made, but I'm guessing it was just a data entry error. This hypothesis is supported by the fact that the Bioshock results are similar (likely ruling out different amounts of DRAM or a different processor).
We shall see.
The overall picture for NVIDIA is a positive one. Some titles (e.g. Farcry 3, Crysis 3, and Metro) should be playable -- or almost playable -- on a laptop, getting roughly 24-30 fps with a GTX 880M and 19-23 fps with a GTX 860M. That's pretty exciting.
The cynic expects first-party benchmarks to consistently underdeliver versus real-world ones. Instead, NVIDIA overdelivers in some cases and underdelivers in others. Some of this can be attributed to varying hardware configurations (different processors and amounts of memory), but a lot of it is likely simply driver support on a game-by-game basis among triple-A titles.
Overall that makes the GeForce 800M Series a pretty solid showing from NVIDIA, particularly given its power savings over last generation. Looks like these mobile GPUs will be worth buying.
For those who aren't quite ready for that, looking ahead at NVIDIA's roadmap, the next major development is expected to be the stacking of DRAM on the GPU chip itself -- forming a 3D chip.
That architecture will be dubbed Volta and should arrive around 2016.
Before that happens, one or possibly two refreshes of Maxwell should occur. The desktop GeForce 800 series should bring a die shrink to TSMC's new 20 nm process, plus whatever little tweaks NVIDIA has devised based on the real-world performance of its first-generation Maxwell chips.