


More "Penryn" details emerge

Despite the plethora of attention Penryn received over the last few weeks, Intel's newest roadmaps put the processor launch in Q1'08.  This indicates the launch has not necessarily been accelerated, even though the initial tape-out proved extremely successful.

On the other hand, Intel's 2008 roadmap shows every segment simultaneously deploying 45nm products.  Like AMD's recent 65nm Brisbane launch, Intel guidance notes the processors will start shipping Q4'07 but the actual launch will come as a coordinated 2008 event.

The first Intel 45nm parts will be the quad-core Yorkfield and dual-core Wolfdale desktop processors.  Wolfdale has two physical cores on a single die and up to 6MB of L2 cache.  Yorkfield is then two Wolfdale dice on a single package.  Also worth noting: Wolfdale ships with a 1333MHz front-side bus and Yorkfield ships with a 1066MHz front-side bus.  Chipset support will largely come from the Bearlake family that was previously disclosed on DailyTech.

Perhaps the most interesting thing about these two processors is the return of Hyper-Threading.  This, however, does not mean that Yorkfield will appear as eight logical cores, nor does it mean Wolfdale will appear as four logical cores.  Intel's internal guidance on the subject specifically claims the processor will ship with Hyper-Threading, but will only utilize 4 threads.  On every Intel roadmap in the past, Hyper-Threading doubled the number of listed threads in the guidance documentation.  Clearly, some mystery remains here.  (Update: Please read the retraction below.)

"The official company policy is that our engineers have left the door open for Hyper-Threading, but we cannot confirm or deny any future plans for the technology," adds Intel Public Relations Manager Dan Snyder.

All Penryn cores also include Intel TXT, previously known as Intel LaGrande Technology.  TXT stands for Trusted Execution Technology and refers to a collection of security components.  The Trusted Platform Module, or TPM, is one component.  DMA page protection is another.

Even if 2008 seems like a long time away for the 45nm platform, it's important to note that all Intel platforms will have 45nm SKUs in Q1'08.  Penryn, the family name for Intel's first generation 45nm consumer CPUs, also refers specifically to the 45nm dual-core mobile CPU.  Intel's current roadmap claims this processor will lead the Q1'08 mobile push, with several low voltage models coming one quarter later.

For servers, Wolfdale will make an appearance as a dual- and single-socket Xeon.  It's been long-standing Intel policy to separate desktop, mobile and server chips into different products; Conroe was the Core 2 desktop CPU and Woodcrest, though physically nearly identical, was the Xeon counterpart.  Wolfdale as both a server and a desktop CPU indicates the chips are electrically identical -- though each will likely receive different packaging for the different sockets.

Yorkfield will not receive the same codenaming treatment as Wolfdale on the server.  Instead, Harpertown will be the quad-core Xeon for two-socket servers.  Yorkfield will still be the company's single-socket quad-core Xeon offering.

Update 01/31/2007:  Channel sources have reached out to DailyTech to emphasize that the addition of Hyper-Threading to Penryn-family processors in 2008 is incorrect and the result of dated channel data.  My feelings and thoughts about the retraction can be read on my blog.


Comments



Until there is proper software, HT is over rated
By Beenthere on 1/30/2007 8:03:06 PM , Rating: -1
Unless you have a supporting O/S and properly written software HT is all marketing hype. As previously demonstrated, HT on Intel chips would slow most software performance until it was disabled.




RE: Until there is proper software, HT is over rated
By rqle on 1/30/2007 8:12:36 PM , Rating: 2
It also helps in many cases, and you have the option to disable it if you don't like it. Free is better than none, IMO.


By Viditor on 1/30/2007 8:35:00 PM , Rating: 2
On a multicore chip, I can't see HT doing very much for performance until CSI and Nehalem are released in 2008/9 (remember that the bottleneck here is the FSB).

On another note though, it's nice that Intel has finally put the wild speculation about an early Penryn release to bed...

Intel's newest roadmaps put the processor launch for Q1'08. This indicates the launch has not necessarily accelerated even though the initial tape-out proved extremely successful


By saratoga on 1/30/2007 8:38:53 PM , Rating: 2
One of the nice things about HT is that it can help hide slow memory, provided you've got enough cache. Look at what MS/IBM did with the Xenon CPU on the Xbox360. Tons of higher latency memory, combined with P4-like clock speed, but very well done cache and HT to help hide that.


By Viditor on 1/30/2007 9:00:42 PM , Rating: 3
quote:
One of the nice things about HT is that it can help hide slow memory, provided you've got enough cache


But C2D's more intelligent prefetch mechanism and memory disambiguation (along with the huge cache) have already accomplished this quite well. I don't see HT making any improvements here...


By saratoga on 1/30/2007 9:33:30 PM , Rating: 2
You're kidding, right? You really think Conroe has solved the memory latency problem? It's got good OOOE, but still nowhere near good enough to cover missing an L2 cache line. Not to mention the issue of ILP limitations on a wide core like that. There's still enormous room for improvement.


By Viditor on 1/30/2007 10:10:28 PM , Rating: 2
Agreed, but I don't see where HT will help with those issues...
Could you elaborate?


By saratoga on 1/31/2007 3:20:18 PM , Rating: 2
HT lets you fill in unused issue width with OPs from another thread. So if you stall waiting on a cache miss (or even a cache hit) you can continue to work on another thread rather than letting the CPU grind to a halt for 10-100 clock cycles.
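
The latency-hiding argument can be made concrete with a toy model. The sketch below is not a model of any real Intel core: the issue width of 4, the 100-cycle miss penalty, and one cache miss every 40 instructions are assumed values picked for illustration, and the `Thread` class and `run` function are hypothetical names. It only compares running two identical threads back to back on one core against letting both share the core's issue slots.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    ops_left: int           # instructions this thread still has to issue
    miss_every: int         # assumed: one cache miss after this many issued ops
    stalled_until: int = 0  # first cycle at which the thread may issue again
    issued: int = 0         # instructions issued so far


def run(threads, issue_width=4, miss_penalty=100):
    """Return the number of cycles needed to issue every op of every thread.

    Each cycle the core hands out up to issue_width slots. Threads are
    served in list order, so the first thread has priority and later
    threads only use slots the earlier ones leave empty.
    """
    cycle = 0
    while any(t.ops_left > 0 for t in threads):
        slots = issue_width
        for t in threads:
            while slots > 0 and t.ops_left > 0 and cycle >= t.stalled_until:
                t.ops_left -= 1
                t.issued += 1
                slots -= 1
                if t.issued % t.miss_every == 0:
                    # Simulated cache miss: this thread cannot issue again
                    # until the penalty has elapsed.
                    t.stalled_until = cycle + miss_penalty
        cycle += 1
    return cycle


def make_thread():
    return Thread(ops_left=400, miss_every=40)


# One thread alone: the core sits idle during every miss.
alone = run([make_thread()])

# Two threads without SMT: run one to completion, then the other.
serial_cycles = 2 * alone

# Two threads with SMT: the second thread issues while the first is stalled.
smt_cycles = run([make_thread(), make_thread()])

print(f"one thread alone:       {alone} cycles")
print(f"two threads, no SMT:    {serial_cycles} cycles")
print(f"two threads, with SMT:  {smt_cycles} cycles")
```

In this model the second thread costs almost nothing extra because its work fits into cycles the first thread would have wasted. Real workloads with fewer stalls, or two threads fighting over a small cache, would show a much smaller gain, or a loss.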


By Viditor on 1/31/2007 9:48:08 PM , Rating: 3
quote:
HT lets you fill in unused issue width with OPs from another thread. So if you stall waiting on a cache miss (or even a cache hit) you can continue to work on another thread rather than letting the CPU grind to a halt for 10-100 clock cycles


True...and in a single core environment this would be (and is) usually a very good thing. But the scheduler (and apps) sees the HT virtual core as just another core. So if you are running 2 simultaneous threads, there is no method by which you can direct them to the actual core instead of the virtual core. I'm sure that you'd agree that only using a single core efficiently instead of splitting the work between 2 cores kind of defeats the purpose of dual (or more significantly quad) core chips.
It's true that you can write affinity code for the OS, but the applications all need to know what that means as well...and that's just not the case at the moment.

In a nutshell, in this case the whole is not the sum of its parts...:)


By intangir on 2/1/2007 6:40:23 PM , Rating: 3
The applications don't need to know. Even the process scheduler in Windows XP SP2 will automatically prioritize thread assignments taking into account whether the core is already occupied. Where's the problem?


By saratoga on 2/2/2007 6:13:24 PM , Rating: 2
quote:
True...and in a single core environment this would be (and is) usually a very good thing. But the scheduler (and apps) sees the HT virtual core as just another core. So if you are running 2 simultaneous threads, there is no method by which you can direct them to the actual core instead of the virtual core.


Actually I think Vista can do this, however it's not really all that useful.

quote:
I'm sure that you'd agree that only using a single core efficiently instead of splitting the work between 2 cores kind of defeats the purpose of dual (or more significantly quad) core chips.


Absolutely not. If you have 8 threads available on a quad chip, you sure as hell want to issue all 8. See my point above. Having more threads means less pain when you have to hit main memory, or when you mispredict a branch. Essentially, if you want to use a wide/deep core efficiently, you basically need to have SMT (as Sun, MS and IBM have shown with their products).

Intel may have botched their SMT with HT thanks to the crappiness of the P4, but that's not generally the case, which is why so many new cores coming out have SMT.



By scrapsma54 on 1/31/2007 6:56:49 PM , Rating: 3
HT is useful. Anybody who is in computer science knows that processor parallelism has always been the best road to take when developing a computer. HT is a simulation of parallelism. HT takes 2 threads and splits them up. Then the processor processes the code, alternating between the two. This allows near-parallel operation and conserves CPU usage. With the Core 2 Duo series, the CPU usage is split and the second core processes the data. With Hyper-Threading, I would expect CPU usage when playing a game would be cut down 1/3 of what it would be without it.


By scrapsma54 on 1/31/2007 7:02:58 PM , Rating: 2
But anyway, the whole hyperthreading will probably never happen since it was incorrect information.


By Thorburn on 2/1/2007 2:41:27 PM , Rating: 2
Not on Core, but it should be back with Nehalem


By Viditor on 1/31/2007 9:52:13 PM , Rating: 2
quote:
HT is useful. Anybody who is in computer science knows that processor parallelism has always been the best road to take when developing a computer

I don't think anyone disputes that...
It's just not useful in ALL circumstances. In this case, it's a question of comparative efficiencies. Is it better to have more efficient individual cores, or is it better to have the multicore chip be more efficient as a whole?


By intangir on 2/1/2007 6:32:12 PM , Rating: 3
False dilemma. There is no reason you cannot have both. Having both is better than having either in isolation.


By SacredFist on 1/30/2007 10:09:35 PM , Rating: 4
Problem was, Windows didn't know core 1 and core 2 were technically on the same core. So instead of assigning threads to a different core like Core 1 Core 3 THEN core 2 for HT, it crammed in two threads on a loaded processor leaving one unused.


By Viditor on 1/30/2007 10:18:01 PM , Rating: 2
quote:
Problem was, Windows didn't know core 1 and core 2 were technically on the same core. So instead of assigning threads to a different core like Core 1 Core 3 THEN core 2 for HT, it crammed in two threads on a loaded processor leaving one unused

That's it exactly...
In TomZ's case, he's running so many threads that it really doesn't matter that much and can enhance performance...but for average use, HT can slow things down in a MC environment.


By intangir on 2/2/2007 11:34:37 AM , Rating: 3
If you're running a modern Windows, that should no longer be a problem. From a May 2003 Microsoft whitepaper:

quote:
To take advantage of this performance opportunity, the scheduler in the Windows Server 2003 family and Windows XP has been modified to identify HT processors and to favor dispatching threads onto inactive physical processors wherever possible.


http://www.microsoft.com/whdc/system/CEC/HT-Window...
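
The difference between the two placement policies discussed above can be sketched in a few lines. This is an illustration only, not Windows scheduler code: the functions `place_naive` and `place_ht_aware` are hypothetical, and the machine is assumed to be a dual-core part with HT whose logical CPUs are enumerated with the two siblings of each core next to each other.

```python
def place_naive(n_threads, logical_cpus):
    """Fill logical CPUs in enumeration order, ignoring shared cores."""
    return list(logical_cpus[:n_threads])


def place_ht_aware(n_threads, logical_cpus):
    """Prefer logical CPUs whose physical core is still idle."""
    free = list(logical_cpus)
    busy_cores = set()
    placement = []
    while free and len(placement) < n_threads:
        # First choice: a logical CPU on a core nobody is using yet.
        on_idle_core = [cpu for cpu in free if cpu[0] not in busy_cores]
        pick = on_idle_core[0] if on_idle_core else free[0]
        free.remove(pick)
        busy_cores.add(pick[0])
        placement.append(pick)
    return placement


# Assumed topology: (physical core, HT sibling) for a dual-core chip with HT.
cpus = [(0, 0), (0, 1), (1, 0), (1, 1)]

naive_two = place_naive(2, cpus)
aware_two = place_ht_aware(2, cpus)
aware_three = place_ht_aware(3, cpus)

print("naive, 2 threads:    ", naive_two)    # both land on physical core 0
print("HT-aware, 2 threads: ", aware_two)    # one thread per physical core
print("HT-aware, 3 threads: ", aware_three)  # sibling used only once cores are full
```

With two threads the naive policy stacks both on one physical core and leaves the other idle, which is the slowdown described earlier in the thread. The HT-aware policy only falls back to a sibling once every physical core already has work.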


RE: Until there is proper software, HT is over rated
By Phynaz on 1/31/2007 11:51:06 AM , Rating: 3
quote:
On a multicore chip, I can't see HT doing very much for performance until CSI and Nehalem are released in 2008/9 (remember that the bottleneck here is the FSB).


Sigh...

It's been shown again and again that there is no FSB bottleneck.

It would be really refreshing if you quit spreading your misinformation.


By Griswold on 1/31/2007 2:08:55 PM , Rating: 2
In multi-socket systems, thanks to cache coherency traffic that has to go over the FSB, there seems to be your nonexistent bottleneck. It materializes in mediocre scalability.

See also:
http://www.anandtech.com/IT/showdoc.aspx?i=2897&p=...

This may not be relevant to somebody like you, but to others it is.


RE: Until there is proper software, HT is over rated
By Phynaz on 1/31/2007 2:13:12 PM , Rating: 2
Back up what you are saying. State how much bandwidth cache coherency traffic takes up on the FSB.


By saratoga on 1/31/2007 3:23:30 PM , Rating: 2
quote:
Back up what you are saying.


He did . . .

Let me guess, you didn't actually read the link?

quote:
State how much bandwidth cache coherency traffic takes up on the FSB.


That depends on the load chosen. There's no one number, so your question doesn't even have a specific answer. You might as well ask how much memory bandwidth is enough, or how many cores benchmarks use, or how much cache is ideal. You need to define your load, since the answer depends on what you're doing.


By Viditor on 1/31/2007 8:55:06 PM , Rating: 2
Phynaz, I don't know if you realize it but you've asked this question before in another thread.
There, we were talking about scalability...
Here was my reply to your request for proof on that thread:

"Here is a first indication that quad core Xeon does not scale as well as the other systems. Two 2.4GHz Opteron 880 processors are as fast as one Xeon 5345, but four Opterons outperform the dual quad core Xeon by 16%. In other words, the quad Opteron system scales 31% better than the Xeon system"
http://tinyurl.com/2cgnj8

Please note that as you add cores and sockets that use the FSB, in other words scale, the relative performance of the Opteron gains dramatically.
If you think about it you'd realize that since the cores haven't changed, only the load on the FSB could account for this.


By coldpower27 on 2/1/2007 9:03:55 AM , Rating: 2
Interesting, this problem doesn't seem to manifest itself till you reach the 2P system with dual Clovertowns.

Now I can understand how AMD can claim the 40% improvement over Clovertown, with Barcelona "in a wide variety of workloads".

It will be interesting to see how much this increase diminishes, assuming AMD's numbers are currently correct for Barcelona vs Clovertown performance, when you make the comparison with the Agena Quad Core to the Kentsfield Quad in Single Socket Desktop systems.

The FSB issue isn't an issue on a Single Socket, however it seems Clovertown suffers from poorer scaling as you increase the number of Sockets.

So overall, is the FSB an issue? Not in the single-socket arena, but when you are talking 2 sockets or more, there is something that is weakening Clovertown's scaling ability. It could also be because Clovertown has to work with FB-DIMM technology, which has higher latency compared to unbuffered DDR2 on the desktop.


By Viditor on 2/1/2007 12:33:32 PM , Rating: 2
Good points CP.
I agree that AMD probably chose Clovertown as a comparison very carefully, and 40% really isn't unbelievable on this specific comparison.
One thing I've been saying all along is that if the K10 core is only equivalent to C2D, then AMD will have a much better spec because of the platform. Of course I was most incomplete in my comments...
I should have qualified that I was speaking of servers...mainly because of the scaling.

As to FBDs being an issue, there is something to that...but it seems to me that the latency doesn't come close to accounting for the large difference in scaling.
I am still wondering why Intel went with a high latency/high bandwidth model for memory instead of a low latency/low bandwidth one...


By Phynaz on 2/1/2007 1:14:59 PM , Rating: 2
K10?

Ummmm....This is K8L we're talking about, not some fanciful chip that's always five years away.


By Viditor on 2/1/2007 1:30:11 PM , Rating: 2
quote:
Ummmm....This is K8L we're talking about, not some fanciful chip that's always five years away


Ummm...there is no K8L. The next-gen chip coming out next quarter (Barcelona) is a K10 chip...

"Again, AMD has explicitly told me its native quad-core chips will be K10, not K8. That's from their Technical Director - Sales and Marketing EMEA, so isn't likely to be wrong"
http://forums.hexus.net/showthread.php?t=92137&pag...

But a "Rose by any other name"...let's just call it Barcelona.


By Phynaz on 2/1/2007 10:33:07 AM , Rating: 2
Yes, I have asked this question before.

I hardly consider one article from Anandtech to be proof of anything.

For example, where in the article is empirical evidence that the FSB has become saturated or is a bottleneck?

Answer: There isn't any, it's all conjecture.


By Viditor on 2/1/2007 12:17:51 PM , Rating: 2
quote:
where in the article is empirical evidence that the FSB has become saturated or is a bottleneck?
Answer: There isn't any, it's all conjecture


Well no actually, it's a hypothesis backed by scientific data that corroborates the conclusion.

It might be helpful if instead of us taking your word for it, you could offer some indication of substance for your assertion that:
"It's been shown again and again that there is no FSB bottleneck"


By Phynaz on 2/1/2007 1:13:30 PM , Rating: 2
Yeah, prove a negative, that will work.

Please point me to this scientific evidence you speak about.

Even though I'm not the one making the statements (you are), I'll give you my evidence.

Using Intel analysis tools, running business applications, I see FSB utilization in the 15% area on a dual core HP system. Running the same on a quad bumps the utilization to 18%-20%. Business applications run out of CPU long before they run out of FSB bandwidth.



By Viditor on 2/1/2007 1:38:14 PM , Rating: 2
quote:
Yeah, prove a negative, that will work

Sigh...it's not proving a negative, it's demonstrating that data throughput on an FSB-modeled system is equivalent to that of a HyperTransport-modeled system.

I remind you that you are the one who said:

"It's been shown again and again that there is no FSB bottleneck"

All I'm asking for is for you to show that...seems reasonable to me.


By saratoga on 2/2/2007 6:24:55 PM , Rating: 2
quote:
Please point me to this scientific evidence you speak about.


What kind of logic is this? You said there was evidence, so you provide it. Don't say it exists and then expect other people to find it for you.

quote:
Even though I'm not the one making the statements (you are)


You mean "making claims". This whole post of yours is a statement.

Anyway, remember when you posted this:

"It's been shown again and again that there is no FSB bottleneck."

So you did make a claim. Now back it up or retract it.

quote:
Using Intel analysis tools, running business applications, I see FSB utilization in the 15% area on a dual core HP system. Running the same on a quad bumps the utilization to 18%-20%. Business applications run out of CPU long before they run out of FSB bandwidth.


All that proves is that your specific business app isn't constrained. No one even doubted that. Rather, your claim that the Core 2 is not limited on any workload is in doubt. Unless you've some reason to think it's relevant to the rest of the world, there's no sense in even mentioning it.


By saratoga on 1/30/2007 8:37:10 PM , Rating: 2
quote:
Unless you have a supporting O/S and properly written software HT is all marketing hype.


HT just requires SMP support, and with dual core now common, support is quite good.

quote:
As previously demonstrated, HT on Intel chips would slow most software performance until it was disabled.


Only a few things, and that was a flaw of the P4, not HT in general. Conroe is much better suited for HT than the P4 ever was. It's actually got enough cache to keep two threads fed (lol @ 8KB split between two threads on Northwood).


RE: Until there is proper software, HT is over rated
By TomZ on 1/30/2007 9:04:44 PM , Rating: 2
quote:
Unless you have a supporting O/S and properly written software HT is all marketing hype. As previously demonstrated, HT on Intel chips would slow most software performance until it was disabled.

Wrong. HT is useful in keeping an application running at 100% CPU from hogging the entire CPU. In my experience, that is more valuable than the few points of performance that are lost from having it enabled.


By Viditor on 1/30/2007 9:10:00 PM , Rating: 2
quote:
HT is useful in keeping an application running at 100% CPU from hogging the entire CPU

But how is this going to help in a multicore environment, especially on C2D?
I look forward to the benches, but at a guess I'd say that HT will be much less useful on a C2D chip...and keeping it disabled makes a bit more sense


By Nehemoth on 1/30/2007 9:29:06 PM , Rating: 2
Because HT and multiple cores don't compete with each other; rather, they complement each other.

For example, in this scenario, HT will help keep those cores always in use. Of course, as someone said above, the problem here is the FSB, but anyway I believe the FSB will be fine until CSI appears, if it ever sees the light of day.


By Viditor on 1/30/2007 9:45:00 PM , Rating: 2
quote:
Because HT and multiple cores don't compete with each other; rather, they complement each other

HT allows you to start 2 threads simultaneously, though you only have the resources to complete one...
MC also allows you to start multiple threads, but you have the resources to complete them all.
Add to that the ability of C2D to retire 4 ops/clock (though this never really occurs), and the only time that HT would help is when you have more threads that need starting than you have cores to start them...I can't think of many instances where this occurs.

With a single core, HT makes a lot more sense and certainly improves efficiency...but nowhere near as efficiently as MC does.

The long and short of it is that HT and MC DO compete...
For example, how will the scheduler choose between an HT (virtual) core and an actual second core? The second core is faster because it can complete the thread on its own while the HT virtual core cannot.
In this case, using HT would actually slow things down...


By LittleMic on 1/31/2007 3:15:17 AM , Rating: 2
quote:

HT allows you to start 2 threads simultaneously, though you only have the resources to complete one...

That is where you are wrong: a single P4 core had enough resources for more than 1 thread. That's why we've seen situations where HT increased performance.

This is even more true for Conroe that has a wider core.


By Griswold on 1/31/2007 6:58:24 AM , Rating: 2
The long pipeline of netburst made HT viable, parts of it were idling while others were busy - HT was just a means to increase the load of the chip and thus its efficiency.

Now, if they would implement true SMT instead of this halfassed netburst approach, that would shed a different light on it. Until then, I'm with viditor and the fact that intel only considers this as an option that may never be used, supports it.


RE: Until there is proper software, HT is over rated
By Phynaz on 1/31/2007 1:10:55 PM , Rating: 2
What are you talking about? True SMT? What the hell is that?


By Griswold on 1/31/2007 2:18:45 PM , Rating: 2
So, you're telling me that HT is of the same quality as SMT in, let's say, a Power5 series processor or a Sun T1? No, don't answer me.

Compared to these two examples, Intel's HT is a whackjob. It's fallout from the netburst design - it was a logical consequence of the design, but not remotely close to a "true" SMT design such as the Power5 or T1.

And as we can see in the update, there is no such thing in Penryn - for good reasons.


By saratoga on 1/31/2007 3:47:21 PM , Rating: 2
quote:
The long pipeline of netburst made HT viable, parts of it were idling while others were busy - HT was just a means to increase the load of the chip and thus its efficiency.


Length and width are what matters. A long pipeline, or a wide pipeline are both good for HT. A long, wide pipeline is best.

quote:
So, you're telling me that HT is of the same quality as SMT in, let's say, a Power5 series processor or a Sun T1?


Define "quality". Both are SMT, though the T1's is very different than the P4's (which was more like the POWER5's).

quote:
No, dont answer me.


Huh?

quote:
Compared to these two examples, Intel's HT is a whackjob. It's fallout from the netburst design - it was a logical consequence of the design, but not remotely close to a "true" SMT design such as the Power5 or T1.


That's nonsense. Also, what is a "true" SMT design? I get the impression that you don't really understand how HT is implemented in the P4, based on your comparison to the T1.


By Viditor on 1/31/2007 7:06:16 AM , Rating: 2
quote:
a single P4 core had enough resources for more than 1 thread. That's why we've seen situations where HT increased performance


No, it increased performance because it increases the efficiency of the CPU. By starting 2 threads at the same time, HT allows the CPU to have a partially run thread in the pipe at all times so that it doesn't have to wait for work (so to speak). Please note that I said complete one thread...
The problem you run into with HT and a multicore chip is that the operating system can't distinguish between a virtual core and a real core...



By LittleMic on 1/31/2007 9:54:47 AM , Rating: 2
quote:

The problem you run into with HT and a multicore chip is that the operating system can't distinguish between a virtual core and a real core...

Windows 2003 and Linux have been doing this for a while now.

http://www.intel.com/cd/ids/developer/asmo-na/eng/...


By Viditor on 1/31/2007 9:54:13 PM , Rating: 2
It's true that affinity coding for the OS is becoming more widespread...but you (I think) are forgetting that the applications must be coded for this affinity as well...


By saratoga on 2/2/2007 6:27:55 PM , Rating: 2
Actually, no application support is required. The scheduler alone is all that is needed (since it assigns CPUs, not the apps themselves).
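
For contrast, here is what explicit application-level affinity looks like when a program does opt in. This is a sketch assuming a Linux system, where Python exposes `os.sched_getaffinity` and `os.sched_setaffinity`; the `run_pinned` helper is a hypothetical name, and on platforms without those calls it simply runs the work and leaves placement to the OS scheduler, which is the default case described above.

```python
import os


def run_pinned(work, cpu=None):
    """Run work() with this process restricted to one logical CPU.

    If the platform has no sched_setaffinity (e.g. Windows or macOS),
    the work runs unpinned and the OS scheduler picks the CPU.
    """
    if not hasattr(os, "sched_setaffinity"):
        return work()
    original = os.sched_getaffinity(0)  # CPUs this process may currently use
    target = min(original) if cpu is None else cpu
    os.sched_setaffinity(0, {target})   # pid 0 means the calling process
    try:
        return work()
    finally:
        # Hand placement back to the scheduler afterwards.
        os.sched_setaffinity(0, original)


result = run_pinned(lambda: sum(range(1000)))
print(result)
```

Nothing in the helper knows whether the chosen logical CPU shares a physical core with a busy sibling, which is one reason leaving placement to an HT-aware scheduler is usually the simpler choice.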


RE: Until there is proper software, HT is over rated
By TomZ on 1/30/2007 9:36:23 PM , Rating: 2
I agree that the benefit of HT is less clear for multi-core. But I also think the performance hit for having it enabled is overstated. I have a Presler EE, which is dual-core with HT, and I have HT enabled (four virtual processors). But I also tend to run quite a few apps at a time.


By Viditor on 1/30/2007 9:46:52 PM , Rating: 2
quote:
But I also tend to run quite a few apps at a time

That makes much more sense...but have you tested your speed while running single thread with HT on or off?
Or more telling, how about 2 threads?


RE: Until there is proper software, HT is over rated
By TomZ on 1/31/2007 9:33:39 AM , Rating: 2
No, I haven't taken the time nor the initiative to do that. I would be curious about the results, however. Can you suggest a good way to test that?


By saratoga on 1/31/2007 3:50:06 PM , Rating: 2
Your EE has more cache. More cache == fewer problems for the P4.

There's not much point in testing HT with one thread. On a truly single-threaded OS, HT and no HT are exactly the same. You need to actually have 2+ threads running before HT can be used. Otherwise the HT logic is just idle and you effectively have a non-HT processor. Sort of like how a dual core CPU run on Windows 98 is basically a single core CPU.


By TomZ on 1/31/2007 5:12:05 PM , Rating: 2
Regarding testing with one thread, etc., I'm not sure I get your point. I'm running Vista, and probably have dozens of threads actively running, plus maybe another 100 or more that are idle.


By Viditor on 1/31/2007 9:56:56 PM , Rating: 2
quote:
Can you suggest a good way to test that?


A fair question, and I'm sad to say that at this moment I do not (though if you post in the Forums, I would bet that someone like Duvie or one of the other gurus there could give you a good answer).


By Lord 666 on 1/31/2007 12:06:07 AM , Rating: 2
Wow, TomZ only has a Presler EE?

Very surprised you haven't upgraded to a C2DE or even quad yet.

You've previously said you use your rig about 70 hours a week and for your income, the new series of processors surely could help.

Honestly, why haven't you upgraded yet?


RE: Until there is proper software, HT is over rated
By TomZ on 1/31/2007 9:39:34 AM , Rating: 2
I don't think that spending the time or money now would make much performance difference. My processor has a WEI of 5.2, with all the other components of my system having WEI scores of 5.7-5.9. A CPU upgrade would surely increase the CPU score a bit, but probably not enough to be worth the time and money. I'll wait another year probably.


RE: Until there is proper software, HT is over rated
By mino on 1/31/2007 7:16:38 PM , Rating: 2
Do you REALLY think the WEI index is a meaningful way to measure performance by any single sub-score? Not to mention to consider their relation to the final score?

Go, read some AT before posting. May help a bit ;) .


By mino on 1/31/2007 7:23:01 PM , Rating: 2
A friend of mine once said:
"Well, IMO the WEI will replace 3DMark as an universal e-penis measurement tool."

Seems he was right.


By TomZ on 1/31/2007 9:03:08 PM , Rating: 2
Are you disputing my conclusion, my methodology, or just giving me grief for the heck of it? Are you saying that you feel that the WEI CPU score is a poor indicator of CPU performance? What are you saying exactly?


RE: Until there is proper software, HT is over rated
By Dactyl on 1/30/2007 9:59:56 PM , Rating: 3
It's a shame your comment has been dropped to -1.

Hyperthreading may provide a performance boost for some multithreaded software that benefits from lots of cores (like 3D rendering), but it won't help for games.

Think about it: it's like having two cores, each running at half speed. OR, if one thread dominates and the other gets to run only when the main thread is waiting, it's like gaining an additional slow thread... which doesn't really help games much. That thread will either slow down the main thread (because both can't run at the same time) or it will be unreliable (because you can't count on it to run, the main thread might not have a cache miss for a while).

If your sound is on the slow thread, and the main thread doesn't have a cache miss for a while, your sound is going to skip. OR, your sound won't skip, but it won't skip because the core is running the sound thread instead of running the game thread when it could be running the game thread. So your game will be slower.

Further, if your core is running more often, it will generate more heat (while it is waiting on a cache miss, if you have HT off, it's not generating as much heat). Which means you won't be able to clock it as high.

At best, HT will boost performance if you have a dual core processor running a 4-core game (like Alan Wake), but it will still be utter crap compared to using a real quad core processor.

Hyperthreading was useless in the Pentium 4 and even if it's a little better in C2D, it's still going to be useless for most applications--at least the applications we care about.

It's going to remain a specialty/niche feature that most of us won't benefit from. Anything else is ridiculously overoptimistic (at least, until we can see some benchmarks! I'm willing to change my mind). And the best way to guess at whether HT will improve any given application is to look at how the application is designed to run on multiple cores. Which is exactly what the parent comment here says... which is why downrating it to -1 is stupid.


By saratoga on 1/31/2007 3:59:10 PM , Rating: 2
quote:
Hyperthreading may provide a performance boost for some multithreaded software that benefits from lots of cores (like 3D rendering), but it won't help for games.


Old games maybe not. Newer games will benefit, since they'll have dual core support.

quote:
Think about it: it's like having two cores, each running at half speed.


Err, no it's not. If that were true, then you would never have a speed up or slow down. It's more like having one core that can run two threads concurrently with execution resources divided between them. Actually, that's exactly what it is.

quote:
OR, if one thread dominates and the other gets to run only when the main thread is waiting, it's like gaining an additional slow thread... which doesn't really help games much.


Doesn't work like that. The OS schedules the threads (and the software can too), so there is no "main" thread. Both are basically equal, unless your OS is really dumb.

quote:
That thread will either slow down the main thread (because both can't run at the same time) or it will be unreliable (because you can't count on it to run, the main thread might not have a cache miss for a while).


This is actually what happens WITHOUT HT. With it, both threads CAN run at the same time.

Also, remember, HT does not depend on cache misses. It's very easy to have a thread that never misses a cache read, but still benefits strongly from HT.

quote:
If your sound is on the slow thread, and the main thread doesn't have a cache miss for a while, your sound is going to skip. OR, your sound won't skip, but it won't skip because the core is running the sound thread instead of running the game thread when it could be running the game thread. So your game will be slower.


Sorry, but no. This has nothing to do with HT, and it's not how threading works. At least not in Windows, Linux or MacOS X. A few embedded systems and MacOS 9 worked a little bit like that though.

quote:
At best, HT will boost performance if you have a dual core processor running a 4-core game (like Alan Wake), but it will still be utter crap compared to using a real quad core processor.


That's just dumb. Dual core and HT are complementary. You should be comparing like versus like. That is, a dual core HT processor against a quad core HT processor. Saying "dual core is better than HT" is meaningless, since the real comparison would be "dual core with HT versus quad core with HT".


By DallasTexas on 1/31/2007 4:28:37 PM , Rating: 3
Good argument but I would not look at the rating system here as an indicator of contribution value. The rating system in here is an indicator if the comment was liked or not. Sad but true.


By Araemo on 1/31/2007 9:38:25 AM , Rating: 2
"Unless you have a supporting O/S and properly written software HT is all marketing hype. As previously demonstrated, HT on Intel chips would slow most software performance until it was disabled."

Even Windows XP (SP2?) supports non-symmetric multiprocessing (i.e., Hyperthreading).

Any OS with SMP support can get a partial boost from HT, but it can also hamper performance. With the proper low-level chip info (you might have to install some Intel driver files for this?), the OS can schedule tasks to efficiently use the real hardware execution resources that are 'left over' for HT, while a standard "SMP" OS treats each thread as having the full resources of one CPU available to it.


By bottle23 on 1/31/2007 5:39:41 PM , Rating: 2
quote:
by Beenthere on January 30, 2007 at 8:03 PM

Unless you have a supporting O/S and properly written software HT is all marketing hype. As previously demonstrated, HT on Intel chips would slow most software performance until it was disabled.


I wouldn't make a generalisation like that. Because HT does benefit under some scenarios. It depends on which scenarios you've demo'ed.



Related Articles
Recent Intel Tidings, Retractions
January 31, 2007, 9:38 AM
Life With "Penryn"
January 27, 2007, 12:01 AM
Intel 45nm "Penryn" Tape-Out Runs Windows
January 10, 2007, 2:13 AM
AMD Announces "Brisbane" 65nm Processors
December 5, 2006, 1:27 AM













Copyright 2014 DailyTech LLC. | Kristopher Kubicki