Ashwood memory architecture allows for much faster memory speeds

Chipmakers realized long ago that extracting more performance from computer processors could be accomplished in ways other than simply reducing the size of the manufacturing process to squeeze more transistors onto a die.

One of the ways chipmakers improved performance was by building multi-core CPUs, like Intel's Penryn processors, that can execute instructions in parallel. Memory chips haven’t been able to keep up with the performance increases we are seeing in processors, which creates a bottleneck in the performance of computer systems and other devices.

In order to tackle this problem, a cryptographer named Joseph Ashwood has developed a new memory architecture that allows for multi-core memory.

Ashwood dubbed his memory architecture the Ashwood Architecture. According to EETimes, the Ashwood architecture integrates smart controller circuitry next to the memory array on a single chip. This provides parallel access to the memory array for hundreds of concurrent processes, leading to increased throughput and lower average access times.

Ashwood says, “My design borrows extensively from today's modern multicore CPUs. As far as concurrency goes, my memory architecture shares some features with Fibre Channel.”

Ashwood says his architecture can hit 16Gbytes per second compared to the DDR2 limit of 12 Gbytes per second. The hallmark of the Ashwood architecture is that the larger the number of bit cells in the memory, the better the performance.

Ashwood does admit to a couple downsides to his design. The first is that his design is paper only, though it was independently verified by researchers from Carnegie Mellon University. No implementation of the architecture has been tested at the electrical signal level.

The second drawback is that the parallel access overhead of the architecture slows down access time to individual memory cells. However, Ashwood says that the parallel nature of his architecture more than makes up for any slowdowns by executing more commands at the same time.

Ashwood has filed a patent on his architecture that is still pending; until the patent is granted the intricate details of his architecture remain unknown.

Comments

Bandwidth?
By masher2 on 1/17/2008 10:16:00 AM , Rating: 3
> "...his architecture can hit 16Gbytes per second compared to the DDR2 limit of 12 Gbytes per second"

Dual-channel DDR3 is up to what, 25 GB/s now? The bandwidth advantage here, I believe, is that Ashwood claims his architecture can hit those high rates even with relatively slow flash memory, due to the parallel access being employed.

RE: Bandwidth?
By phattyboombatty on 1/17/2008 10:24:45 AM , Rating: 5
I would assume the memory architecture would scale uniformly with the bandwidth of the underlying memory technology. Thus, the architecture implemented with DDR3 memory would provide an increase from 25 GB/s to 33 GB/s (using the same percentage increase from the DDR2 memory).

The article also seems to suggest that the architecture provides more benefits as the size of the memory increases, which will continue to occur in the future.
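
The 33 GB/s estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the figures are the ones quoted in the article and in this thread, and uniform scaling is the commenter's assumption, not something Ashwood has confirmed. The function and variable names are made up for illustration.

```python
# Illustrative arithmetic only. Figures come from the article and the thread;
# uniform scaling across memory technologies is an assumption.
ddr2_limit_gbs = 12.0       # DDR2 limit quoted in the article
ashwood_on_ddr2_gbs = 16.0  # rate Ashwood claims for his architecture
ddr3_limit_gbs = 25.0       # dual-channel DDR3 figure mentioned in the thread


def scale_uniformly(base_gbs: float, gain: float) -> float:
    """Apply the same relative speedup to a different baseline."""
    return base_gbs * gain


gain = ashwood_on_ddr2_gbs / ddr2_limit_gbs  # 16 / 12
projected_ddr3_gbs = scale_uniformly(ddr3_limit_gbs, gain)

print(f"relative gain: {gain:.2f}x")
print(f"projected DDR3 rate: {projected_ddr3_gbs:.1f} GB/s")
```

The ratio 16/12 is a one-third increase, which is where the 33 GB/s figure comes from.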

RE: Bandwidth?
By blowfish on 1/17/08, Rating: 0
RE: Bandwidth?
By MAIA on 1/17/2008 12:39:24 PM , Rating: 5
... and don't we always assume before knowing?

RE: Bandwidth?
By xti on 1/17/2008 2:15:04 PM , Rating: 5
Don't you know that if you are wrong on the internet, you could die of syphilis?

RE: Bandwidth?
By Obujuwami on 1/17/2008 4:30:17 PM , Rating: 2
AND your children will be born NAKED!!!!

On a more serious note, one thing that a number of people either forget or ignore is that none of this matters if the hard drive bus does not increase with the rest of the technology. I would love to see some SAS type of speeds matched with this RAM. Would be rather nice.

RE: Bandwidth?
By Gentleman on 1/17/2008 6:54:38 PM , Rating: 2
Only if you are wrong about the person being a hot 18 year old and took that mistake home with you....

I wonder how he is able to avoid the clock skew problems inherent to parallel access lines.

RE: Bandwidth?
By jtemplin on 1/17/2008 1:34:18 PM , Rating: 3
It's called a hypothesis, or an educated guess if you will.

RE: Bandwidth?
By AntDX316 on 1/17/2008 4:30:28 PM , Rating: 2
So it's like SATA, where SATA transmits one piece of data at a time while parallel sends way more at once, but SATA ends up way faster than parallel overall.

RE: Bandwidth?
By Spivonious on 1/17/2008 10:40:59 AM , Rating: 2
I would think that with a parallel design, memory access times would be significantly decreased. The bandwidth would be limited only by how the memory was connected to the processor.

RE: Bandwidth?
By SilentSin on 1/17/2008 11:03:26 AM , Rating: 2
I'm thinking that the logic chip that is on-board these sticks is the culprit for adding to access times. As with FB-DIMMs and buffered memory, adding logic only increases latency. However, this would be case dependent: a single read/write instruction might be carried out faster on lower-latency normal RAM, but if you had multiple instructions to perform, then the parallel approach would of course be faster.

On a semi-related note, I've always wondered how SSDs are now being created. I had always assumed that they were using a similar type of parallel architecture to increase performance. Similar to RAID-0, only it's internal to the drive itself across multiple chips. Otherwise, I can't see how they increased throughput from what is usually quite pedestrian in flash memory to numbers that are now comparable to IDE and SATA real-world figures. Isn't this idea kind of old hat?

RE: Bandwidth?
By mahax on 1/17/2008 11:44:47 AM , Rating: 2
Yeah, this also resembles a crossbar type memory controller. So is it beneficial to segment the RAM on chip or by the memory controller? On chip does have the advantage that the RAM can be broken down to more smaller blocks. Crossbar has to do with the current hardware and is ultimately restricted by the width of the bus.

RE: Bandwidth?
By masher2 on 1/17/2008 11:05:42 AM , Rating: 2
Access times wouldn't be expected to decrease under a parallel scheme, but bandwidth would increase. It takes just as long for 10 people to dial 10 phone calls as it does one...but you get ten times the information flowing once the call is completed.

RE: Bandwidth?
By ninjit on 1/17/2008 11:16:02 AM , Rating: 2
Not quite right:

In terms of your analogy, those 10 people would have to make their calls sequentially over a single non-parallel line.

So the "apparent" average latency is higher, because each thread has to wait for the previous one to fetch its request before it can proceed.

So let's say regular memory latency is 2 cycles, then 10 requests would take 20 cycles to complete.

Now let's say the parallel scheme can handle 10 requests simultaneously, but has increased latency to 10 cycles.
It only takes 10 cycles to fulfill those 10 requests.

Hence overall mean latency has actually decreased.
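
The numbers in this example can be checked with a small simulation. It is a sketch under the comment's own assumptions (2-cycle serial memory, 10-cycle parallel memory that serves 10 requests at once); the function names are invented for illustration and do not model any real memory controller.

```python
def serial_completion_times(n_requests: int, latency: int) -> list[int]:
    """Requests are served one after another over a single line."""
    return [latency * (i + 1) for i in range(n_requests)]


def parallel_completion_times(n_requests: int, latency: int, width: int) -> list[int]:
    """Up to `width` requests are served at once; each batch takes `latency` cycles."""
    return [latency * (i // width + 1) for i in range(n_requests)]


serial = serial_completion_times(10, latency=2)
parallel = parallel_completion_times(10, latency=10, width=10)

print("serial:   last done at", max(serial), "cycles, mean", sum(serial) / len(serial))
print("parallel: last done at", max(parallel), "cycles, mean", sum(parallel) / len(parallel))
```

Under these assumptions the serial case finishes its last request at cycle 20 with a mean completion time of 11 cycles, while the parallel case finishes everything at cycle 10. A single request in isolation still completes sooner on the low-latency part (2 cycles versus 10).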

RE: Bandwidth?
By masher2 on 1/17/2008 11:27:45 AM , Rating: 2
Sorry, but that's not how it works. If each unit takes two cycles, then accessing those units in parallel won't magically decrease access time per unit to one cycle.

At best, adding units in parallel doesn't increase access times. If there are sync issues (and there usually are), it increases them (which explains why RAID access times are so much longer than a single disk's).

But it *never* decreases access times monotonically. Not unless the underlying units themselves are somehow modified, which has nothing to do with the "parallel" implication itself.

RE: Bandwidth?
By geddarkstorm on 1/17/2008 12:27:26 PM , Rating: 4
From what I see, you are both right and wrong--in that you are looking at it from different perspectives and ignoring each other.

For instance, Masher is right in that for a single unit to do a single operation, the latency hasn't decreased with this method (but may increase a little for synchronization with other units). However Ninjit is also right in that BULK latency, from our perspective, HAS decreased, and substantially.

Going back to your example Masher, if 10 people do 10 calls each, for any one person it takes the same time to individually do 10 calls as it would one person doing 10 calls. However, in bulk you've done 100 calls (10 people times 10 calls) in the space of time it takes only to do 10 calls. So then, if you have 10 parallel threads, you've increased your speed of data flow times 10. Each individual unit still has the same latency, but because there's so many going on at once, the perceived bulk latency to do 100 operations has now been vastly reduced. That's the beauty of parallel.

RE: Bandwidth?
By masher2 on 1/17/2008 1:30:38 PM , Rating: 3
You're confusing latency with bandwidth. They're two different things:

What you call "perceived bulk latency" is bandwidth. In general, parallel operation increases that as a linear function of the number of units.

Latency, however, is not decreased by parallel operation.
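
One way to write the distinction down, as a simplified model rather than anything from Ashwood's design: assume $N$ identical units, each taking time $t$ to return $s$ bytes, with no synchronization overhead. The symbols are introduced here only for illustration.

```latex
L(N) = t, \qquad B(N) = \frac{N \, s}{t}
```

Bandwidth $B$ grows linearly with $N$ while latency $L$ stays fixed at $t$. Any synchronization cost would add to $t$, raising latency and flattening the bandwidth curve.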

RE: Bandwidth?
By Ringold on 1/17/2008 8:27:32 PM , Rating: 3
"Latency (Engineering)"

Interesting.. field-specific definition entries.

I think I should add a "Latency (Economics/Applied Mathematics)" wiki: the delay between the time one looks at an equation with a blank stare and the time that person leaves to get coffee.

RE: Bandwidth?
By Sulphademus on 1/18/2008 8:58:31 AM , Rating: 2
To make another analogy:

Let's say you are delivering packages.

Latency is the speed limit you can drive.

Bandwidth is whether you're driving a pickup or a semi.

RE: Bandwidth?
By Spivonious on 1/25/2008 9:36:01 AM , Rating: 2
Ah, but what if you have 10 pickups all going to different places? Even if your semi can hold 10 pickups' worth of packages, you still can't deliver them faster than the 10 pickups could.

RE: Bandwidth?
By Gentleman on 1/17/2008 7:06:09 PM , Rating: 2
A parallel bus has lower setup time compared to a serial bus, but a serial bus has significantly higher bandwidth because it is not affected by clock skew.

It sounds like this guy developed a new controller method that allows multiple simultaneous accesses to memory. This has higher setup time (latency) and higher bandwidth. It could potentially reduce the performance of a single process but would increase the performance of multiple processes. I would imagine that this new memory architecture would require a new bus as well.

RE: Bandwidth?
By Fnoob on 1/17/2008 10:56:39 AM , Rating: 2
If DDR2 is limited to 12 GB/s and DDR3 is rated to 25 GB/s, why then aren't we seeing any performance advantages of DDR3 over DDR2... besides the bragging rights that come with spending 3x as much? If memory bandwidth and/or latency has been the bottleneck recently, I would think a doubling from DDR2 to DDR3 would have at least brought about more than merely marginal differences.

RE: Bandwidth?
By darkpaw on 1/17/2008 11:09:20 AM , Rating: 2
Ah, but the real bottleneck is the system bus not the memory itself. That is why there is currently no real improvement.

It doesn't matter how fast the memory is if the CPU can't communicate with it quick enough or if the CPU can't make use of the amount of data that can be transferred.
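
The bottleneck argument amounts to taking the minimum over the links in the path. A minimal sketch follows; the bus figure (1333 MT/s at 8 bytes per transfer) is an assumption picked for illustration and is not from the article, and the memory figures are the ones quoted in this thread.

```python
def effective_bandwidth_gbs(memory_gbs: float, bus_gbs: float) -> float:
    """Data can move no faster than the slowest link between CPU and memory."""
    return min(memory_gbs, bus_gbs)


# Assumed system bus: 1333 million transfers per second, 8 bytes per transfer.
bus_gbs = 1333e6 * 8 / 1e9

for name, memory_gbs in [("DDR2", 12.0), ("DDR3", 25.0)]:
    usable = effective_bandwidth_gbs(memory_gbs, bus_gbs)
    print(f"{name}: memory {memory_gbs:.1f} GB/s, usable {usable:.2f} GB/s")
```

With a bus slower than both memory types, the usable figure is the same for DDR2 and DDR3, which would be consistent with the marginal differences described above.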

RE: Bandwidth?
By scrapsma54 on 1/17/2008 1:53:07 PM , Rating: 2
Bandwidth is only theoretical. It can only be reached under certain circumstances, such as a 64-bit OS and a 64-bit processor and pipeline. All code has to be written in 64-bit blocks. It's just like the HD 2900: even though its theoretical bandwidth is impressive, the blocks are not being processed fast enough for the processor to request more data. Think of a boat (memory) carrying data to a foreign country. It travels very fast and reaches its destination, but the next boat has to wait for that boat to finish unloading, load up, and leave. How soon the next boat can set sail depends on how fast the workers (processors) load and unload and get the boat going. It seems that Intel has implemented that idea into Core. Because of latency, DDR3 is going to have a slow start.

RE: Bandwidth?
By mathew7 on 1/18/2008 7:49:49 AM , Rating: 2
You don't need 64-bit to reach the theoretical bandwidth. The bits are changed at the same speed. The difference is only in overhead, because in 64-bit mode you have not only larger registers, but also more of them. But you can hit a bus bottleneck even on 32-bit systems if you only do sums over a large set of data (which can be done using SSE). Of course, that's if your processor can process those "bits" fast enough.
And even DDR2 had a slow start after DDR, which also had a slow start after "SD-RAM".
Although "latency" is used in relation to memory as the time between a command and the start of data transfer, the real latency also includes the transfer time (I don't think the CPU cache lets you retrieve bytes before the whole line is fetched). And this is why DDR3 needs higher frequencies to shine: its higher "start latency" needs to be compensated for by a shorter burst time.
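
The trade-off between "start latency" and burst time can be sketched numerically. The model below is deliberately simplified: it counts only CAS latency plus the time to stream one 8-transfer burst, and ignores row activation, precharge, and controller overhead. The three speed grades and CAS values are illustrative assumptions, not figures from the article.

```python
def read_time_ns(io_clock_mhz: float, cas_cycles: int, burst_length: int = 8):
    """Return (start latency, burst time) in ns for one burst read.

    Simplified: start latency is CAS latency only; a DDR bus moves two
    transfers per I/O clock, so a burst takes burst_length / 2 cycles.
    """
    cycle_ns = 1000.0 / io_clock_mhz
    start_ns = cas_cycles * cycle_ns
    burst_ns = (burst_length / 2) * cycle_ns
    return start_ns, burst_ns


# (label, I/O clock in MHz, CAS latency in cycles) -- assumed example parts
parts = [
    ("DDR2-800 CL5", 400.0, 5),
    ("DDR3-1066 CL7", 1600.0 / 3.0, 7),
    ("DDR3-1600 CL9", 800.0, 9),
]

for name, clock_mhz, cas in parts:
    start, burst = read_time_ns(clock_mhz, cas)
    print(f"{name}: start {start:.2f} ns + burst {burst:.2f} ns = {start + burst:.2f} ns")
```

In this model the DDR3-1066 part has a slightly longer start latency than the DDR2-800 part, but a shorter burst, so the full line arrives a little sooner. The gap widens at higher DDR3 clocks.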

RE: Bandwidth?
By PandaBear on 1/17/2008 3:56:25 PM , Rating: 2
We already have that, it is called bank interleaves.

The drawback is that you now have a very large minimum transfer size, and that can kill latency.

And mixing logic (needs low capacitance) and memory (needs high capacitance) will make production, yield, power, and speed difficult to do well. I bet he never figured out why people don't just integrate DRAM into ASICs, or has never worked on the device or process side of the semiconductor business.
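
For readers unfamiliar with bank interleaving, here is a minimal sketch of low-order interleaving. The bank count, word size, and function names are assumptions for illustration. The minimum-transfer calculation assumes every bank supplies one word per access, which is one common way the large minimum transfer size arises.

```python
def interleaved_location(address: int, n_banks: int, word_bytes: int) -> tuple[int, int]:
    """Map a byte address to (bank, word offset within that bank).

    Consecutive words go to consecutive banks, so a sequential read
    keeps all banks busy at once.
    """
    word_index = address // word_bytes
    return word_index % n_banks, word_index // n_banks


def min_transfer_bytes(n_banks: int, word_bytes: int) -> int:
    """Bytes moved per access when every bank supplies one word in lockstep."""
    return n_banks * word_bytes


N_BANKS, WORD_BYTES = 4, 8  # assumed example geometry

for address in range(0, 64, WORD_BYTES):
    bank, offset = interleaved_location(address, N_BANKS, WORD_BYTES)
    print(f"address {address:2d} -> bank {bank}, offset {offset}")

print("minimum transfer:", min_transfer_bytes(N_BANKS, WORD_BYTES), "bytes")
```

Adding banks raises sequential throughput, but under the lockstep assumption it also raises the smallest amount of data that can be fetched, which hurts small random reads.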

A sign of the times...
By Goty on 1/17/2008 11:55:17 AM , Rating: 2
I guess it's a sign of the times that Intel is now mentioned first when talking about multi-core processors when it was AMD who really paved the way for them with K8 in the consumer and server spaces.

RE: A sign of the times...
By PB PM on 1/17/2008 12:34:08 PM , Rating: 2
True, but if you want to be picky, the IBM Power 5 did that first (almost a year ahead of Dual Core K8s, at least in the server space).

In any case the idea of dual core memory could be just as useful as dual core CPUs have proven to be. One thing I don't understand is that speed of communication between the CPU and memory has not kept up with the speed of the CPU. Honestly though, isn't the biggest bottleneck on modern PCs the HDD?

RE: A sign of the times...
By murphyslabrat on 1/17/2008 1:20:13 PM , Rating: 2
Yes, and that has the potential to change now, what with SSDs and all.

However, I think it's all a temporary fix, and that at best it will stick around for 10 years. With quantum computing now entering infancy in the market, as opposed to its former place in pure speculation and experimenting, it's only a matter of time before our punch-card analogous electrical circuitry is replaced by the future.

Maybe this kind of thing will assist with photon-signal processing, but I don't see it being useful beyond that.

RE: A sign of the times...
By masher2 on 1/17/2008 2:08:15 PM , Rating: 2
It's going to be quite a bit longer than 10 years before quantum computing enters the mainstream, if ever. We don't even have stochastic versions of most major algorithms, making it impossible at present to even write mainstream software for a quantum machine, even were the hardware available.

RE: A sign of the times...
By PandaBear on 1/17/2008 4:01:31 PM , Rating: 2
Memories are already multi core. The only reason they keep them separate is yield and cost.

RE: A sign of the times...
By Sulphademus on 1/18/2008 8:50:41 AM , Rating: 2
> "One thing I don't understand is that speed of communication between the CPU and memory has not kept up with the speed of the CPU. Honestly though, isn't the biggest bottleneck on modern PCs the HDD?"

While not perfect, weren't the on-chip memory controller and HyperTransport link of the Athlon 64/Phenom supposed to help here specifically?

As to the hard drive, yeah, they're still a lot slower than RAM, though I have noticed that Vista does like to preload an awful lot of stuff into memory. (Seems to target 50% utilization.) So if the OS starts making memory predictions, much like CPUs have done for so long, and pulling this data over to RAM during downtime, this could speed things up some. Or maybe just enough to cover how much heavier an OS Vista is?

Drawback?
By phattyboombatty on 1/17/2008 10:20:12 AM , Rating: 1
> "Ashwood does admit to a couple downsides to his design. The first is that his design is paper only, though it was independently verified by researchers from Carnegie Mellon University."

The fact that the design is only a concept at the moment is not a "drawback" or "downside" to the design.

RE: Drawback?
By TomZ on 1/17/2008 10:35:05 AM , Rating: 2
Well, going from theory to implementation in real products is not without challenges. For example, will this "new" approach be economically viable relative to alternative implementations?

RE: Drawback?
By phattyboombatty on 1/17/2008 10:42:41 AM , Rating: 3
If the design is determined to be difficult to implement or very expensive to implement, those would be drawbacks. But an idea is not a drawback simply because it is only an idea. When people use the term "drawback" in reference to a particular design, they are generally referring to the negatives encountered once the design is fully implemented or the negatives encountered in actually implementing the design.

RE: Drawback?
By Black69ta on 1/17/2008 5:24:50 PM , Rating: 2
Isn't this similar to the idea behind dual channel memory? Parallel Memory Banks to increase bandwidth? Hasn't Intel already hinted at Quad Channel DDR3 in the enthusiast class Nehalem? Doesn't that accomplish the same thing without increased manufacturing costs?

Multi-core SSD drives?
By wingless on 1/17/2008 8:46:19 PM , Rating: 2
This technology could make solid state disk drives a helluva lot faster. It would be amazing to have a 1TB SSD drive that can access at several hundred MB/s alone.


Copyright 2016 DailyTech LLC.