



  (Source: HMCC)
FPGAs, ASICs, and other high-performance commercial applications will be targeted at volume by next year

Today, vertical NAND (v-NAND) flash memory is in production and bumping storage densities in mobile devices to new highs.  Yet while many players -- including Samsung Electronics Co., Ltd. (KSC:005930) -- are actively producing v-NAND, vertical DRAM -- stacked volatile memory -- remained unsampled until now.

I. HMC -- 70 Percent Less Power, 8x the Transfer Rate of DDR4

Micron Technology, Inc. (MU) this week announced the industry's first stacked DRAM.  While Micron describes the stacked chips as a "hybrid memory cube" (HMC), also known as vertical DRAM (v-DRAM), the structure appears more like little stacks of poster board on a circuit board.

Micron HMC
Micron's hybrid memory cube is finally sampling. [Image Source: Micron]

Each layer features a 4 Gb (Gigabit) die, and there are four layers for a total capacity of 2 GB (Gigabytes) for the stack.  

Micron is claiming to get 160 GB/s (Gigabytes per second) of bandwidth for the chip.  That's an incredible data transfer rate, compared to the approximately 11 GB/s DDR3 gets and the 21-24 GB/s DDR4 is expected to get.  Moreover, the packaging cuts power consumption by 70 percent by reducing the distance signals have to travel between chips.
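
For scale, the capacity and bandwidth figures above can be sanity-checked with some quick arithmetic (a rough sketch; the DDR3 and DDR4 numbers are the approximate per-module figures quoted in this article, not any specific part):

```python
# Capacity: four stacked 4 Gb dies (figures from the article).
layers, gbit_per_die = 4, 4
capacity_gbytes = layers * gbit_per_die / 8  # 8 bits per byte
print(capacity_gbytes)  # 2.0 GB per stack

# Bandwidth multiples, using the article's approximate module figures.
hmc_gbps, ddr3_gbps, ddr4_gbps = 160.0, 11.0, 21.0
print(round(hmc_gbps / ddr3_gbps, 1))  # ~14.5x a DDR3 module
print(round(hmc_gbps / ddr4_gbps, 1))  # ~7.6x the low end of DDR4
```

The "8x DDR4" headline falls out of the last line: 160 GB/s against the low end of the expected DDR4 range.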

The Boise, Idaho-based chipmaker sees potential demand for HMC chips in "data packet processing, data packet buffering or storage, and computing applications such as processor accelerators."  The latter sounds like the new DRAM could be targeted at graphics processing units (GPUs), among the most memory-hungry components of a modern PC.

But HMC stock for consumer devices such as GPUs or smartphones won't be available for "three to five years", according to Micron.

II. Commercial Clients Get First Dibs on Smoking Fast HMC

In the meantime Micron plans to work its way up to volume production, selling the chips at higher prices to performance-sensitive commercial clients that are willing to pay more for the fresh technology.  To that end the chipmaker's plan is straightforward: following its 2 GB sampling, Micron plans to sample 4 GB stacks in early 2014, followed by volume production of the 2 GB and 4 GB stacks in late 2014 for enterprise clients.

Micron's DRAM Solutions VP Brian Shaw boasts, "System designers are looking for new memory system designs to support increased demand for bandwidth, density, and power efficiency.  HMC represents the new standard in memory performance; it's the breakthrough our customers have been waiting for."

In a press release distributed by Micron, Jim Handy, an analyst for Objective Analysis, says that it's natural for DRAM to evolve a similar 3D stacking approach to flash, despite inherent technical difficulties that have stymied top chipmakers like Samsung.  He comments, "The Hybrid Memory Cube is a smart fix that breaks with the industry's past approaches and opens up new possibilities.  Although DRAM internal bandwidth has been increasing exponentially, along with logic's thirst for data, current options offer limited processor-to-memory bandwidth and consume significant power. HMC is an exciting alternative."

hybrid memory cube

Samsung should eventually have an HMC of its own, as it's been working with Micron to develop some of the specifications and technologies underlying the process.  Both companies are part of the HMC Consortium (HMCC).  

Xilinx, Inc. (XLNX), the top designer of field programmable gate array (FPGA) chips, has also been collaborating on the project and has a keen interest in becoming an early adopter of the faster, more power-efficient v-DRAM.  FPGAs are a key application for Micron's HMC chips to target next year, as their performance is highly sensitive to the DRAM that feeds the devices' many reprogrammable logic cells.

Micron is also ramping up toward its launch of DDR4 memory, which may be made available at volume to commercial clients later this year.

Source: Micron



Comments



How about the latency?
By retrospooty on 9/26/13, Rating: 0
RE: How about the latency?
By Shig on 9/26/2013 3:33:44 PM , Rating: 2
They kept avoiding that statistic, so I'm going to guess that it's not as good. But it still beats other memory in three core categories.


RE: How about the latency?
By menting on 9/26/2013 3:41:37 PM , Rating: 1
latency measured from the balls of the package? or latency measured from the CPU?

Latency is also only part of the equation, as there's a huge difference between the latency of 1 (One) Act-Read/Write-Precharge, vs bank interleave latencies. So depending on application, you are talking about very different things.

So no, latency DOES NOT negate the performance increase of the bandwidth.


RE: How about the latency?
By retrospooty on 9/26/13, Rating: 0
RE: How about the latency?
By menting on 9/26/2013 3:51:39 PM , Rating: 2
No it DOESN'T.
It only matters IF all other specs stay the same, which would be the case if you're talking about the same generation of memory (DUH).


RE: How about the latency?
By retrospooty on 9/26/13, Rating: 0
RE: How about the latency?
By menting on 9/26/2013 4:00:43 PM , Rating: 2
tell me how latency can stay the same when tRA, tWR, etc specs are limited by physics, and clock frequency goes up?

Latency is NOT what negates a lot of the performance, the physics of the transistors and wire lengths are. That is why you try to hide it using bank interleave and other tricks.

Not going to bother trying to explain to you more. I have a HMC launch event to go to in a while.


RE: How about the latency?
By menting on 9/26/2013 4:05:52 PM , Rating: 2
and there was a typo..not tRA, but tRAS


RE: How about the latency?
By retrospooty on 9/26/2013 4:09:46 PM , Rating: 2
Let me try to re-phrase this. The way we have done this over the past decade, going from DDR-400 to DDR3-1866+, is by dramatically increasing the latency to go along with the timings. In the end, the speed difference isn't all that much. The way our chipsets/CPUs/memory work is not efficient, due to the required latency of the platform in general.

So if this is 8x faster than DDR4 (let's say, for the sake of argument, DDR4-2000 at CAS10 becomes the common standard), and this comes out with the same substructure and is basically 8x the bandwidth, let's call it HCR-16000. If the latency goes up on the same scale we won't see much of an improvement, just like DDR400 to DDR1866 isn't much of an improvement.

If this stuff comes out with a whole new memory structure that can actually utilize the bandwidth (this means new CPU and chipset designs), it could be very interesting.


RE: How about the latency?
By inighthawki on 9/26/2013 4:13:54 PM , Rating: 2
That's not how memory works. Timings are in clock ticks, so the scaling in latency numbers results in (at worst) the same wall-clock latency but significantly greater throughput. Increased clock frequencies can actually improve latency in wall time as well, because the cycle time shrinks faster than the tick counts grow.


RE: How about the latency?
By inighthawki on 9/26/2013 4:17:03 PM , Rating: 2
Sorry, I see from your other reply I misunderstood your post. Please ignore.


RE: How about the latency?
By menting on 9/26/2013 4:14:59 PM , Rating: 2
You need to define "improvement".
If you're talking about the time it takes for one and only one access, then you're not going to see much improvement with DDRX (where X is any value, though 5 is most likely the max), as you are limited by the transistors and the speed of electrons. Any improvement will come from process improvements.
If you're talking about the ability to store and spit out data as fast as possible, it's a huge improvement.


RE: How about the latency?
By retrospooty on 9/26/2013 5:15:09 PM , Rating: 2
Now that we have some details from Micron, it seems Micron would agree there is a need.

"Reduced Latency – Enables significantly lower system latency as a result of HMC’s massive parallelism

Reduced Latency – With vastly more responders built into HMC, we expect lower queue delays and higher bank availability, which will provide a substantial system latency reduction."


http://www.dailytech.com/CommentUser.aspx?user=240...


RE: How about the latency?
By retrospooty on 9/26/2013 5:16:15 PM , Rating: 2
Derp ... Posted wrong link. Meant to go to amanojaku's comment with his link.

http://www.dailytech.com/Article.aspx?newsid=33446...


RE: How about the latency?
By menting on 9/26/2013 5:22:41 PM , Rating: 2
There's a need yes, but like I said, it's limited by transistors.

notice the "system latency"
it's not DRAM latency.


RE: How about the latency?
By Lord 666 on 9/26/2013 9:00:10 PM , Rating: 1
I'm thinking bitcoin mining via gpu's


RE: How about the latency?
By CaedenV on 9/26/2013 10:31:16 PM , Rating: 2
Reducing latency in the system is the single largest real-world improvement we could see in the way of making computers faster for everyday tasks. Who cares if you can move things at 10MB/s or 10,000PB/s if most of your workload is serial, requiring very small packets of data to be processed, and the latency between each step takes virtually the same amount of time?
Higher bandwidth chips are needed for editing 4K videos and other large data tasks, but it is not going to make your computer boot any faster, or open programs any quicker, or make games that much more responsive. All of that stuff gets improved by the response time (latency) of the RAM (because just about everything is cached in RAM before you need it these days).


RE: How about the latency?
By retrospooty on 9/26/2013 11:02:40 PM , Rating: 2
RE: How about the latency?
By inighthawki on 9/26/2013 4:07:44 PM , Rating: 5
I hope you realize CAS latency on memory is not in "real time", it is in clock cycles. I suggest you read the Wikipedia article. Modern memory with CAS latencies of 10 is much faster and lower latency than older memory with CAS latencies of 2.

http://en.wikipedia.org/wiki/CAS_latency
quote:
Because modern DRAM modules' CAS latencies are specified in clock ticks instead of time, when comparing latencies at different clock speeds, latencies must be translated into actual times to make a fair comparison; a higher numerical CAS latency may still be a shorter real-time latency if the clock is faster


Please also refer to the chart directly below it showing bandwidth speeds and corresponding real times.

One particular example, first word lookups:
DDR3-1600, CL9 --> 11.25 ns
DDR3-2000, CL10 --> 10 ns
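
The cycles-to-wall-time conversion behind those numbers can be made explicit. A minimal sketch, assuming standard DDR conventions (the I/O clock is half the MT/s transfer rate, and CAS latency counts I/O clock cycles):

```python
def first_word_latency_ns(transfer_rate_mts, cas_cycles):
    """CAS latency expressed as wall time. DDR transfers twice per
    clock, so the I/O clock in MHz is half the MT/s rate; one cycle
    lasts 1000/clock_mhz nanoseconds."""
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles * 1000 / clock_mhz

print(first_word_latency_ns(1600, 9))   # DDR3-1600 CL9  -> 11.25 ns
print(first_word_latency_ns(2000, 10))  # DDR3-2000 CL10 -> 10.0 ns
print(first_word_latency_ns(400, 2))    # DDR-400  CL2   -> 10.0 ns
```

Note the last line: by this accounting, old DDR-400 CL2 lands at the same wall-clock latency as DDR3-2000 CL10, which is the thread's point that rising CL numbers do not mean rising real-time latency.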


RE: How about the latency?
By retrospooty on 9/26/2013 4:12:30 PM , Rating: 2
Yes, I get it. Of course the newer high-speed, high-latency memory is faster; it's just that the latency negates a lot of the bandwidth gains. Today's DDR3-1866 isn't much faster than the old DDR400 CAS2 we used to use. It's faster, but not linearly faster. It is not 4+x the speed... not even close to that.


RE: How about the latency?
By inighthawki on 9/26/2013 4:15:39 PM , Rating: 2
That depends far more on what the application does. Large sequential memory accesses will be significantly faster because the latency is hidden behind the constant throughput, while smaller random accesses will only improve slightly.


RE: How about the latency?
By menting on 9/26/2013 4:27:59 PM , Rating: 2
how much faster it'll be will depend on what kind of memory access your application is doing. it can go from only like 10%-15% increase on the worst case, to the maximum 1866/400 at the best case.


RE: How about the latency?
By bug77 on 9/26/2013 5:27:14 PM , Rating: 2
Ha, you fell for it!
A latency of 2 cycles at 400MHz translates to a delay of 5ns. The same 5ns at 2000MHz is 10 cycles. Lo and behold, you can buy DDR3@2000MHz with CL9 or 10. So yes, the number of cycles has increased, but the actual time between the request and the response has remained exactly the same. They could have probably gone down if the voltage hadn't dropped from 2.6V to 1.5-1.65V (but that in turn would have generated too much heat).


RE: How about the latency?
By retrospooty on 9/26/13, Rating: 0
RE: How about the latency?
By menting on 9/26/2013 8:59:34 PM , Rating: 4
added latency?
there is 0 (ZERO, ZIP, NIL) added latency
5ns is STILL 5ns

and "speed" depends on how you define it. DRAM clock speed? Access time? Bandwidth?
It's like a single core processor that can do 10GHz with a 1 stage pipeline, vs a quad core processor that can do 5G with a 2 stage pipeline. Which one is "faster"? Well, that depends on how you define it.

God, you are hopeless. It'll be an embarrassment if you work in this industry.


RE: How about the latency?
By retrospooty on 9/26/13, Rating: -1
RE: How about the latency?
By menting on 9/26/2013 9:19:36 PM , Rating: 2
No, I don't know what you meant, because you weren't clear at all.
Nobody said latency isn't a problem, but it's not negating the gains in memory. You can't overcome device limitations, so you work around them.
And I think it's pretty clear that it was talking about system latency, not DRAM latency.


RE: How about the latency?
By retrospooty on 9/26/2013 9:32:41 PM , Rating: 2
Seriously? Must you be so hard headed about everything?

It's like we are in a room, and you are on the couch sitting next to a football and I say throw me the ball , and you refuse to because I didn't specify "football" and you say didn't know what I meant.

I simply was stating that I hope the increase didn't add too much latency that would somewhat negate the real world gains. The human answer would have been, "No, I work with HMC and have some info. Here is how its being addressed"


RE: How about the latency?
By menting on 9/26/2013 9:41:22 PM , Rating: 2
I apologize.
But it's more like I had a football and a soccer ball beside me, and you asked me to throw you the ball, and I asked, which ball?

As for adding latency for HMC...
There's places where you add the DRAM latency, and places where you reduce the DRAM latency, due to the 3D stacking, design of the DRAM, the addition of a logic chip, etc. You can't really use the same conventional method of measuring DRAM latencies anymore, it's not an apples to apples comparison. But the end result is that the system latency is greatly reduced, and that's pretty much what the target market cares about.


RE: How about the latency?
By retrospooty on 9/26/2013 9:56:20 PM , Rating: 1
"I apologize. But it's more like I had a football and a soccer ball beside me, and you asked me to throw you the ball, and I asked, which ball?"

No worries, I was crappy too. Ok, the 2 ball analogy works, but only if I am standing there in a football jersey and a football helmet. ;)

Thanks for the clarifications though.


RE: How about the latency?
By menting on 9/26/2013 10:02:00 PM , Rating: 2
no hard feelings I hope? :)


RE: How about the latency?
By retrospooty on 9/26/2013 10:06:51 PM , Rating: 2
absolutely not. Just a heated discussion. And I mean it thank you for the clarification. Looks like some great stuff coming soon.


RE: How about the latency?
By Kiffberet on 9/27/2013 10:01:13 AM , Rating: 2
I think I'm gonna cry.


RE: How about the latency?
By retrospooty on 9/27/2013 12:41:39 PM , Rating: 2
It's emotional stuff, these memories they are. :P


RE: How about the latency?
By kwrzesien on 9/27/2013 1:03:28 PM , Rating: 2
OMG, I've never seen a comment-war turn out with all smiles before...are we evolving?


RE: How about the latency?
By retrospooty on 9/27/2013 2:11:38 PM , Rating: 2
If we can evolve from apes, we can get past a difference in semantics. Next step, we pull Congress's heads out of their arses.

We may need a bigger forum.


RE: How about the latency?
By talonvor on 9/27/2013 5:51:00 PM , Rating: 2
Yeah I don't have much hope for that, if congress was even the slightest bit worth what we pay them, then we wouldn't be sitting on $17 trillion in debt while looking to make it larger.


RE: How about the latency?
By BRB29 on 9/27/2013 2:18:11 PM , Rating: 1
hope for human kind restored


RE: How about the latency?
By inighthawki on 9/26/2013 9:14:56 PM , Rating: 2
Either you don't understand the concept or you're speaking in a way that is confusing others. Just to clarify, having a higher CAS latency doesn't actually mean there is higher actual latency.

At best you can say "It's barely faster for random access" because the latency stays the same and you are not harnessing the extra throughput from increased bandwidth.


RE: How about the latency?
By menting on 9/26/2013 9:16:56 PM , Rating: 2
I agree with inighthawki
If you say it's barely faster for random access, then you will be correct, and nobody will need to come out and correct you.


RE: How about the latency?
By retrospooty on 9/27/2013 10:43:35 AM , Rating: 2
Here is what I am talking about... Anandtech actually just did a pretty thorough analysis...

http://www.anandtech.com/show/7364/memory-scaling-...

In a lot of these tests, DDR3-1600 CL7 equals and sometimes even beats DDR3-3000 CL12. It doesn't go back beyond DDR3-1333, but these scores, for memory alone, really aren't much better than the old DDR400 CL2 sticks. It's hard to tell with that, because CPUs are so much faster now, as is everything else. But the tests from DDR3-1600 CL7 to DDR3-3000 CL12 show the point quite well.


RE: How about the latency?
By bug77 on 9/27/2013 11:13:15 AM , Rating: 2
There are additional things to consider. Latency is hidden pretty well by modern chipsets (A64 with their embedded memory controllers were very sensitive to latency; today, the large amounts of level 3 cache negate much of that need). Extra bandwidth is simply not needed in many scenarios.
The upside? I could just pick any DDR3-1600 and never look back. I don't even remember what I have; I think they're Patriot sticks.


RE: How about the latency?
By retrospooty on 9/27/2013 11:29:15 AM , Rating: 2
Those benchmarks from AnandTech illustrate exactly what the issue is regarding latency, and why it could use architectural improvements that are not simply increasing bandwidth and adding latency at the same time.


RE: How about the latency?
By inighthawki on 9/27/2013 11:22:08 AM , Rating: 2
For random access related workloads, this shows exactly what has been said: not much difference. In most of the benchmarks, the difference is so tiny I'm betting you could attribute the variation to factors other than memory. It probably doesn't even have anything to do with the latency. CPUs are too complex to attribute 1-2% differences in performance as statistically significant.

As for the sequential benchmarks, they show clear improvements based on bandwidth, just as expected.


RE: How about the latency?
By retrospooty on 9/27/2013 11:27:18 AM , Rating: 2
I didn't say it wasn't complex and there weren't different permutations... I was just saying I hope they don't just increase bandwidth without improving the current latency conundrum we have seen for the past decade. Little of what most people do on normal PC's involves high memory bandwidth.

I am talking about the "half empty" part of the glass. The 1/2 full part isn't in question, especially when even higher bandwidth stuff comes out. Why does everyone need to argue such simple points?


RE: How about the latency?
By inighthawki on 9/27/2013 12:06:07 PM , Rating: 2
Sorry I didn't mean to come off that way. Not trying to argue, I was agreeing with your post mostly. Was just pointing out for others that the benchmarks implied that random access had no benefit, and that the differences between scores were statistically insignificant.


RE: How about the latency?
By bug77 on 9/27/2013 3:19:46 AM , Rating: 2
Then again, not everything is small-burst random access, either.


RE: How about the latency?
By HoosierEngineer5 on 9/26/2013 3:50:04 PM , Rating: 2
Probably from row/column strobe to data-out. Latency can cause the CPU to hang (especially for sequential/non threaded instruction sequences), and will slow down execution. How much depends on the application being executed and how well the memory management unit can compensate for latency.


RE: How about the latency?
By FaaR on 9/26/2013 6:39:52 PM , Rating: 2
quote:
Latency can cause the CPU to hang (especially for sequential/non threaded instruction sequences), and will slow down execution.

It doesn't "hang", but whatever. Anyhow, memory latency is why CPUs have had speculative execution, pre-fetching, hyperthreading and similar measures for many years now. It works pretty well actually.


RE: How about the latency?
By Monkey's Uncle on 9/26/2013 4:10:20 PM , Rating: 2
Bandwidth is measured between the memory and the CPU (how much data can be pushed across the memory bus in a limited time frame). Latency is measured at the memory package itself (how long a single data request takes from the time it is received by the memory package until the data word is ready in the package's bus read buffer).

All the bandwidth in the world is not going to help if the CPU has to wait a lot of cycles for a data request to be presented on the memory bus. Bandwidth is great for streaming sequential data, but it is defeated when you have long latencies holding up random data requests.

The flip side of this is those cycles are at the memory bus clock speed rather than the CPU clock speed. If the memory has a really high clock speed - preferably in multiples of the CPU clock speed - the memory subsystem can sustain higher latencies without losing as much random access performance.

I for one would love to see how this memory performs, both for sequential and for random data access.
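
The sequential-versus-random tradeoff described above can be sketched with a simple first-order model (hypothetical round numbers: a fixed 50 ns access latency, and the approximate peak bandwidths discussed in the article; real controllers pipeline and interleave requests, so this is only illustrative):

```python
def transfer_time_ns(n_bytes, latency_ns, bandwidth_gbps):
    # First-order model: one fixed access latency, then streaming
    # at peak bandwidth. 1 GB/s moves 1 byte per nanosecond.
    return latency_ns + n_bytes / bandwidth_gbps

# A 64-byte cache line (random access): latency dominates,
# so ~14x the bandwidth barely helps.
print(transfer_time_ns(64, 50, 11))   # ~55.8 ns at DDR3-like 11 GB/s
print(transfer_time_ns(64, 50, 160))  # ~50.4 ns at HMC-like 160 GB/s

# A 1 MB sequential burst: bandwidth dominates,
# so the same ratio is nearly a 14x speedup.
print(round(transfer_time_ns(2**20, 50, 11)))   # ~95375 ns
print(round(transfer_time_ns(2**20, 50, 160)))  # ~6604 ns
```

This is exactly why the thread splits along workload lines: small random accesses see the latency term, streaming workloads see the bandwidth term.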


RE: How about the latency?
By menting on 9/26/2013 4:35:41 PM , Rating: 2
HMC can currently run at 15G clk.


RE: How about the latency?
By amanojaku on 9/26/2013 5:02:28 PM , Rating: 2
quote:
latency measured from the balls of the package?
Balls? Package? How did you miss this, Jason?

In all seriousness, Micron's marketing claims lower latency. Lower than what, it doesn't say. It does say most of the technical details are under NDA.
quote:
Reduced Latency – Enables significantly lower system latency as a result of HMC’s massive parallelism

Reduced Latency – With vastly more responders built into HMC, we expect lower queue delays and higher bank availability, which will provide a substantial system latency reduction.
Here is some publicly available information on HMC.

HMC Specification 1.0
http://hybridmemorycube.org/files/SiteDownloads/HM...


RE: How about the latency?
By retrospooty on 9/26/2013 5:12:09 PM , Rating: 2
"Reduced Latency – Enables significantly lower system latency as a result of HMC’s massive parallelism

Reduced Latency – With vastly more responders built into HMC, we expect lower queue delays and higher bank availability, which will provide a substantial system latency reduction."


Sweet...


RE: How about the latency?
By menting on 9/26/2013 5:26:55 PM , Rating: 2
like i mentioned in another message
notice "system latency" and "as a result of HMC's massive parallelism".
It's not talking about DRAM latency, but system latency. Which was why I asked at the very top whether you were talking about latency measured from the balls of the package, or measured from the CPU (which is pretty much considered the system latency)


RE: How about the latency?
By retrospooty on 9/26/2013 6:18:32 PM , Rating: 2
Stop being internet pedantic. All I am saying is if they dont address the latency issue, having 8x the bandwidth means very little.

It looks like they did, so that's good.


RE: How about the latency?
By Spuke on 9/26/2013 7:27:12 PM , Rating: 2
Retro, you should know that an inaccurate explanation is worthless. Sometimes sh!t just takes a while to say in order to get the real meaning.


RE: How about the latency?
By menting on 9/26/2013 8:46:08 PM , Rating: 2
I work on HMC.

System latency and DRAM latency are very different things, even though they all contribute to the final delay.

If bandwidth means very little, and latency is king, why are most, if not all the supercomputer manufacturers so excited about this? I think they know more than you about how much impact HMC will have on computing.


RE: How about the latency?
By inighthawki on 9/26/2013 9:19:38 PM , Rating: 2
I think it depends on the target platform. For small applications that do mostly random access, latency is big. For the kinds of massive data crunching that a supercomputer does, high throughput is far more important.


RE: How about the latency?
By menting on 9/26/2013 9:24:30 PM , Rating: 2
of course it does. It always does. That's why you have different architecture for different uses.

that's also why I was very clear on saying it depends on application, and saying supercomputer manufacturers (without mentioning others) are interested in this


RE: How about the latency?
By purerice on 9/29/2013 12:29:45 AM , Rating: 2
please look back at this post. It looks like you missed the depth of it.
By inighthawki on 9/26/2013 4:07:44 PM

By the way, do you know the meaning of pedantic? You keep focusing on latency as the deal breaker of modern RAM and slamming that over and over and over again... Almost... pedantic, no?

In the AnandTech article you mention, higher bandwidth DRAM was generally faster in IGP systems (naturally) where RAM was the main bottleneck.
In non-IGP systems, there was little correlation between higher bandwidth RAM and better performance as other bottlenecks (CPU, GPU, I/O, etc) existed. However there was ZERO negative correlation.

It is ironic that you use a reference to support your claim that latency is a deal breaker when
1) you do not understand what latency means
and
2) the very article you reference proves you wrong.


Cool
By Ammohunt on 9/26/2013 2:51:51 PM , Rating: 2
Will be interesting to see how they handle placement on the PCB in laptops, and at what densities. Good ideas coming from Micron again; refreshing.




RE: Cool
By marvdmartian on 9/30/2013 9:12:16 AM , Rating: 2
I'm guessing these are no thicker than, say, a hard drive or optical drive would be, so there shouldn't be any problem with the thickness.

Personally, I'm waiting for the Star Trek memory cubes to come about. Makes you wonder if the guys who thought up flash drives didn't get some of their ideas from watching that show?


typo?
By mik123 on 9/26/2013 8:25:03 PM , Rating: 2
quote:
DRAM that's embedded inside the devices' many reprogrammable logic cells.


Jason, I believe you meant DRAM arrays included in the FPGA chips, not the memory inside logic cells of FPGA chips (that would be SRAM).




:)
By wwwcd on 9/27/2013 7:56:19 AM , Rating: 2
HMC's problem is not speed. The problem is NAND... the number of life-death cycles of the cells.




... and portable devices?
By purerice on 9/29/2013 2:34:19 AM , Rating: 2
decreasing power by 70% is great, even without a speed increase. I would be curious to see how this stacks up against LPDDR4 in power draw and performance.




Copyright 2014 DailyTech LLC.