
Clearing up a little confusion

Several weeks ago, during AMD's analyst day, I wrote an article about the K8L architecture.  In the article, I stated the following:

Additionally, Hester revealed some more information about the cache specifics on K8L.  Each K8L core will have 64KB of dedicated L1 cache, followed by 512KB of dedicated L2 cache.  The base models of K8L will have 2MB of shared L3 cache, but Hester also went on to claim that adding more L3 cache was in the company’s roadmap.  One thing AMD representatives have not particularly touched on is the cache reduction from 64+64KB (data+instruction) to 32+32KB.  AMD employees have assured us this move is logical with the addition of L3 cache. 

This was not correct: I unfortunately misinterpreted slides from AMD, and some of the details released to the public were confusing as well.  AMD representatives have since contacted me, stating: "Haven't got a definitive answer yet, but I'd be very surprised if anyone has changed the L1 cache sizes which has been 64KB data + 64KB instruction since we first introduced K7 way back in the old days."  The L2 and L3 cache details remain as reported. I apologize for any confusion this may have caused.

Secondly, I would like to address the AMD Optimization Utility article we posted a few days ago.  In that article, we claimed:

The AMD Dual-Core Optimizer utility could possibly be the rumored "Reverse Hyper-Threading" patch that would improve single thread application and games performance by having dual-core processors show up as a single core processor in applications. AMD has declined to comment on "Reverse Hyper-Threading."

Just to emphasize: DailyTech does not believe the utility is a precursor to RHT -- in fact, we have not seen a shred of evidence to suggest that RHT even exists.  My best guess is that someone saw the utility before AMD officially announced it, and several bad translations later it morphed into RHT, hence the rumors.


Comments

RHT=Pointless
By Trisped on 7/10/2006 1:50:19 PM , Rating: 3
An RHT system would have little to no usefulness. The way desktop processors are set up now, they have what is called a pipeline. (The details are still shaky to me, as this might apply just to the integer arithmetic unit, or to all computations.)

This pipeline helps speed up the processor by dividing a process into many small steps. Each stage takes one full processor cycle before the work moves on. The advantage is that a 1-stage pipeline operating at 143MHz can be turned into a 21-stage pipeline operating at 3GHz.
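As a rough sanity check of those numbers (my own back-of-the-envelope model, assuming an ideal split into N equal stages and ignoring latch overhead):

$f_{\text{pipelined}} \approx N \times f_{\text{1-stage}} = 21 \times 143\,\text{MHz} \approx 3\,\text{GHz}$

Real designs fall short of this because every extra stage adds latch delay, which is part of why very deep pipelines eventually stop paying off.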

The problem is that if you need the results of one process before you can start the second, you have to wait all 21 cycles for it to leave the pipe. To prevent that, a preprocessor guesses, re-orders, and streamlines commands. In order to get the full benefit of the pipeline, you still need every command to be separated from any dependent commands (dependent commands use the results of a previous command as an argument) by 21 commands.
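To make the dependency idea concrete, here is a hypothetical C fragment of my own (nothing AMD- or Intel-specific, and a real compiler or out-of-order core would happily reorder and fold most of it):

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2;

    /* Dependent chain: each line needs the previous result,
       so the pipeline cannot overlap these operations. */
    int c = a + b;   /* needs a and b   */
    int d = c * c;   /* must wait for c */
    int e = d - a;   /* must wait for d */

    /* Independent work: none of these lines reads another's result,
       so a pipelined (or superscalar) core can overlap them freely. */
    int x = a * 4;
    int y = b + 5;
    int z = a - b;

    printf("%d %d %d %d %d %d\n", c, d, e, x, y, z);
    return 0;
}

The scheduler (whether the hardware or the compiler plays the role of the "preprocessor" above) tries to slot exactly this kind of independent work between the links of a dependent chain.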

From a programmer’s standpoint, this is good and bad. Your programs do run faster, but there are unused cycles that could have made them run faster still. A preprocessor helps, and you do get faster computations, but there are still going to be wasted cycles. From a PR perspective it is a gold mine. This is what Intel did with their last set of P4 processors: by bumping the pipeline up to 31 stages they could make what amounted to a 2.1GHz processor run at 3.1GHz, and since 3.1 is way bigger than 2.1, everyone buys the bigger number.

The actual benefit of the additional 10 stages doesn't exist. Intel knew this and introduced Hyper-Threading. This basically split the pipeline and CPU clock in half, giving half to one virtual processor and half to the other. It still takes the same amount of time to run a process, but you can be working on two different threads at a time. Since threads are basically independent, you are much more likely to fill the pipeline, thereby maximizing your processing power.
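As a sketch of the "two independent threads fill the pipeline" point, here is a minimal POSIX-threads example of my own (illustrative only, not tied to any particular Hyper-Threading implementation); the two threads share no data, so their instruction streams never wait on each other's results:

#include <pthread.h>
#include <stdio.h>

/* Each thread sums its own private counter, so the two instruction
   streams have no data dependencies between them. */
static void *busy_sum(void *arg)
{
    long limit = *(long *)arg;
    long sum = 0;
    long i;

    for (i = 0; i < limit; i++)
        sum += i;

    printf("thread finished, sum = %ld\n", sum);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    long limit = 100000000L;

    /* On an SMT-capable (or dual-core) CPU these two threads can make
       progress at the same time; on a plain single core they take turns. */
    pthread_create(&t1, NULL, busy_sum, &limit);
    pthread_create(&t2, NULL, busy_sum, &limit);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

(Compile with something like gcc -pthread.)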

Now let's turn to the RHT idea. I believe AMD is still using a 21-stage pipeline. That means a perfect RHT setup would combine the two processors into one that runs at twice the speed with twice the pipeline (21*2=42 stages). If a 31-stage pipeline didn't see efficient use, what makes you think a 42-stage pipeline will do any better? In addition, most programs that would run well are either easily or stupidly threadable (stupidly as in any idiot could thread it). With the recent to-do about multi-threaded applications, anything that can be threaded has been.

Since an RHT would not be perfect, you can expect some overhead. The result is that RHT would be less effective wherever a properly threaded application is present, and would provide little to no benefit (and would probably hurt performance) for any single-threaded application.




RE: RHT=Pointless
By peternelson on 7/12/2006 7:40:13 PM , Rating: 2

It's not pointless at all.

You appear to make some erroneous assumptions about CPU architecture, so I am sceptical of your view on RHT.

Research again how CPUs work. Then have a go at designing one in hardware (how about a multicore?). Then you may have some understanding of this issue.

What AMD are trying to do is advanced, otherwise it would have already been done.

AMD RHT is not the same as Intel's "Hyperthreading". They are different concepts.


RE: RHT=Pointless
By Trisped on 7/14/2006 3:11:09 PM , Rating: 2
quote:
It's not pointless at all.

You appear to make some erroneous assumptions about CPU architecture, so I am sceptical of your view on RHT.

Research again how CPUs work. Then have a go at designing one in hardware (how about a multicore?). Then you may have some understanding of this issue.
I do not claim to be an expert, but if I am wrong please point it out; what I have submitted is the previous 3 years of research, not the vain fantasy of a high-school fanboy. I have presented it from my point of view as a programmer, aided by the review sites' attacks on Intel’s P4 and its poor performance (where they complained about the extremely deep pipeline).
quote:
What AMD are trying to do is advanced, otherwise it would have already been done.
Just because it has never been done doesn't mean it is advanced, useful, or important. Have you ever seen a working toilet in a cubicle before (flushing water, sink, soap, TP, porcelain)? Or how about quick-decay paper that lasts for about a month before returning to dirt?
quote:
AMD RHT is not the same as Intel's "Hyperthreading". They are different concepts.
If you read my previous post, you will see that I did not say they were. I attempted to explain what RHT would be by explaining what HT is and how it would be reversed (thus RHT, or Reverse Hyper-Threading). HT has been proven to be useful, as I am told many PowerPC chips used in Macs did the same thing (though under a different name).

All that aside, if you have correct information about a CPU detail that I misstated, please let me know. Also, if you know a real use for RHT (preferably one applicable to the large majority of computer users), please reply.

If you think the major goal of RHT is to allow programs that don't work well with multiple cores to run on only one core, you get the same effect by using R.O.P.E*. The real solution to this problem is not RHT, but MS updating Windows to allow you to specify which core a program executes on. Of course, it would also be nice if Microsoft would include complete 64-bit support with proper 32- and 16-bit mode execution, but they don't seem too concerned with that yet.
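For what it's worth, Windows already lets a program (or a small launcher) pin itself to a single core through the Win32 affinity call; a minimal sketch of my own, assuming core 0 is bit 0 of the mask:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Restrict this process to CPU 0 only (bit 0 of the affinity mask).
       This is roughly the "just run it on one core" workaround. */
    DWORD_PTR mask = 1;

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                (unsigned long)GetLastError());
        return 1;
    }

    printf("Process pinned to core 0.\n");
    /* ... launch or run the single-threaded workload here ... */
    return 0;
}

Task Manager's "Set Affinity" option exposes the same mask by hand, which is probably what most people actually use.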


RE: RHT=Pointless
By BitByBit on 8/29/2006 12:18:22 PM , Rating: 2
quote:
An RHT system would have little to no usefulness. The way desktop processors are set up now, they have what is called a pipeline. (The details are still shaky to me, as this might apply just to the integer arithmetic unit, or to all computations.)

We'll let the processor architects decide that. But the general idea (if it did at all originate from AMD) seems sound enough to me; by somehow splitting a thread in two and sending each half to a different core, throughput could be increased dramatically.

quote:
The problem is that if you need the results of one process before you can start the second, you have to wait all 21 cycles for it to leave the pipe. To prevent that, a preprocessor guesses, re-orders, and streamlines commands. In order to get the full benefit of the pipeline, you still need every command to be separated from any dependent commands (dependent commands use the results of a previous command as an argument) by 21 commands.


Inter-dependent instructions do not need to be separated by the entire pipeline length. If we have the following instructions:

1: c = a + b
2: c = c * c

then (2) must wait for the result of (1) before it can execute. This does not, however, take the entire number of pipeline stages - just the number of cycles it takes to execute instruction (1).
Higher-clocked, deeper-pipelined processors are not necessarily any more adversely affected by this than lower-clocked, shorter-pipelined processors: the wait may take more cycles on the deeper pipeline, but each of those cycles is shorter, so the total execution time should come out roughly the same.
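A textbook approximation, assuming a simple pipeline with full result forwarding, puts a number on "the number of cycles it takes to execute":

$\text{stall cycles} \approx \max(0,\ \text{latency}(1) - 1) \ll \text{pipeline depth}$

so a single-cycle ALU add stalls its dependent instruction by zero cycles, not by the full 21 or 31 stages.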

quote:
The actual benefit of the additional 10 stages doesn't exist. Intel knew this and introduced Hyper-Threading. This basically split the pipeline and CPU clock in half, giving half to one virtual processor and half to the other. It still takes the same amount of time to run a process, but you can be working on two different threads at a time. Since threads are basically independent, you are much more likely to fill the pipeline, thereby maximizing your processing power.


The point of the additional 11 stages introduced with Prescott was to allow Netburst to scale to very high clock speeds. However, the much-documented transistor leakage and thermal issues prevented this, meaning Prescott didn't clock that much higher than Northwood. There seems to be a widespread misconception that pipeline length is increased at the expense of IPC. While it is true that increasing the pipeline length decreases clock-for-clock efficiency and therefore lowers IPC, there is no direct relationship between the two. In other words, doubling pipeline length does not halve IPC - not by a long shot.
Another misconception is that Hyperthreading was introduced to keep deep pipelines full and is only beneficial to such processors. This is entirely false, as the benefit of SMT is only apparent in the execution engine, where the schedulers must find independent instructions they can send to be executed in parallel. Instructions from two threads are naturally not going to share dependencies, which means they can be executed in parallel.
Indeed, the greatest gains from SMT are found in very 'wide' architectures.

quote:
Now let's turn to the RHT idea. I believe AMD is still using a 21-stage pipeline. That means a perfect RHT setup would combine the two processors into one that runs at twice the speed with twice the pipeline (21*2=42 stages). If a 31-stage pipeline didn't see efficient use, what makes you think a 42-stage pipeline will do any better? In addition, most programs that would run well are either easily or stupidly threadable (stupidly as in any idiot could thread it). With the recent to-do about multi-threaded applications, anything that can be threaded has been.

This simply wasn't the idea behind RHT. How two cores could be combined as one to run at twice the clock speed is probably something not even engineers at AMD or Intel could ever make happen.
The idea behind RHT, from what I've gathered, is to split one thread into two, and then execute both threads on different cores, as discussed above. If one thread is X instructions long, and one core can execute that thread in T seconds, RHT would theoretically allow each core to process X/2 instructions, taking T/2 seconds.
If this is even possible, however, I doubt we will see such technology for some years to come. AMD did mention the idea of cores sharing execution resources (it's somewhere on AT), but this seems more akin to conventional SMT than to RHT.
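Putting that ideal claim and Trisped's overhead caveat into one expression, purely as an illustration:

$T_{\text{RHT}} \approx \frac{T}{2} + T_{\text{overhead}}, \qquad \text{speedup} = \frac{T}{T_{\text{RHT}}} \leq 2$

so the real question is whether the splitting overhead stays small compared with the halved execution time.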


RHT is fact or fiction
By nerdboy on 7/6/2006 4:40:58 PM , Rating: 2
I have heard a lot of rumors about Reverse Hyper-Threading lately. If AMD could pull this off it would be amazing; the Conroe would actually have some competition. But if it isn't real, then AMD is going to have a hard time keeping the top-performing processor, so they have to do something. It sounds like they're working on something like it, because there are so many different rumors about it. It doesn’t help that every time someone mentions RHT, they say “No Comment.”




RE: RHT is fact or fiction
By AnnihilatorX on 7/7/2006 4:08:06 AM , Rating: 2
No comment is better than declining.

They don't want Intel to do anything rash now ;)


RE: RHT is fact or fiction
By stmok on 7/7/2006 10:45:26 PM , Rating: 3
RHT is gonna be a surprise. :-)

