



A quick list of new SSE instructions - Image courtesy HotHardware
Fourth generation streaming SIMD extensions

With Intel’s Conroe Core 2 Duo launch in July 2006, Intel added several new SSE optimizations. These optimizations, included with the Core 2 architecture, doubled the speed of SSE, SSE2 and SSE3 operations. This was achieved by optimizing the Core 2 architecture to execute a 128-bit SSE, SSE2 or SSE3 instruction in a single clock cycle. Intel’s previous NetBurst and Core Duo architectures required two clock cycles to execute the same instruction. These optimizations were not actually new instructions but rather an improvement in efficiency.
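To make "a 128-bit SSE2 instruction" concrete, the C sketch below adds four 32-bit integers with one 128-bit operation. It is only an illustration: the function name add4_i32 is made up for this example, the SSE2 path assumes a compiler that defines __SSE2__ and ships <emmintrin.h>, and a scalar loop stands in on other platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Adds four 32-bit lanes: out[i] = a[i] + b[i] (wrapping on overflow). */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    /* One 128-bit register holds all four lanes; the add itself is one
       SSE2 instruction (PADDD). */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback with the same wrapping behavior. */
    for (int i = 0; i < 4; i++) {
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
    }
#endif
}
```

On the older designs described above, that single add would occupy the execution unit for two cycles; on Core 2 it takes one. The source code is the same either way.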

Intel’s Pat Gelsinger announced today that Intel has published the white paper on the SSE4 instructions that will appear in its next-generation 45nm products. SSE4 adds 50 new performance-enhancing instructions. These instructions target compiler vectorization, media, string and text processing, and application-targeted accelerators.

The Core architecture implemented on the Core 2 Duo processors added 32 supplemental streaming instructions to SSE3. These instructions, dubbed Supplemental Streaming SIMD Extensions 3 (SSSE3), are not SSE4 and should not be confused as such.

SSE4 instructions are expected to arrive incrementally, beginning with Intel’s first 45nm products. This includes Intel’s upcoming Nehalem, which will be Intel’s second-generation Core architecture, and Penryn, a 45nm shrink of Core 2 Duo. Penryn and other 45nm processors are expected to begin sampling in the second half of 2007 and begin shipping in the first half of 2008. Full implementation of SSE4 is only planned for Nehalem at this time.

More details are available in Intel's whitepaper on the subject.



Comments



Core?
By Alphafox78 on 9/27/2006 4:50:15 PM , Rating: 2
I thought that SSE4 was listed as being in current Core 2 CPUs?




RE: Core?
By Alphafox78 on 9/27/2006 4:50:38 PM , Rating: 2
oops


RE: Core?
By FITCamaro on 9/27/2006 5:19:50 PM , Rating: 2
Actually I did too. I guess I got confused and thought they had done more than just optimize the previous instructions.


RE: Core?
By Grated on 9/27/2006 5:22:51 PM , Rating: 2
Maybe it is...

If the report is correct and SSE4 will come with the die shrink of Conroe, then they could enable them (like they enabled HT on P4s)...


RE: Core?
By hstewarth on 9/27/2006 6:09:03 PM , Rating: 2
Or maybe it's simpler than that: a simple software switch. Once software is available, it can just be turned on. I thought Core 2 had this instruction set.


RE: Core?
By Kougar on 9/27/2006 7:03:56 PM , Rating: 2
Core 2 Duo has the SSE4 instruction set. Which is probably why Penryn, the die shrink of Conroe, is reported by Cnet to have them.


RE: Core?
By KristopherKubicki (blog) on 9/27/2006 7:18:17 PM , Rating: 3
Core 2 Duo has 32 new instructions on top of SSE3, dubbed Supplemental Streaming SIMD. These are *not* the instruction set dubbed SSE4. SSE4 is 50 new instructions and will show up in 2008 at the earliest.

CNET reported this as Penryn because that is when the first 45nm node chips arrive. Nehalem is the next generation architecture after Penryn, and will also use the SSE4 instructions.
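
The distinction can also be checked in software. CPUID leaf 1 reports SSE3, SSSE3 and the two SSE4 subsets (which later shipped as SSE4.1 and SSE4.2) as separate bits in ECX: bits 0, 9, 19 and 20. Below is a minimal C sketch, assuming GCC or Clang on x86 for the <cpuid.h> helper. The has_* and read_leaf1_ecx names are hypothetical, and on other platforms read_leaf1_ecx simply returns 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h> /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_CPUID_H 1
#endif

/* Tests one feature bit of the ECX value returned by CPUID leaf 1. */
static bool ecx_bit(uint32_t ecx, int bit) {
    assert(bit >= 0 && bit < 32);
    return ((ecx >> bit) & 1u) != 0;
}

bool has_sse3(uint32_t ecx)  { return ecx_bit(ecx, 0);  }
bool has_ssse3(uint32_t ecx) { return ecx_bit(ecx, 9);  }
bool has_sse41(uint32_t ecx) { return ecx_bit(ecx, 19); }
bool has_sse42(uint32_t ecx) { return ecx_bit(ecx, 20); }

/* Reads ECX from CPUID leaf 1; returns 0 where CPUID is unavailable. */
uint32_t read_leaf1_ecx(void) {
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return ecx;
    }
#endif
    return 0;
}
```

A Core 2 Duo would be expected to set bit 9 (SSSE3) but neither bit 19 nor bit 20, which is the difference a tool that labels it "SSE4" glosses over.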


RE: Core?
By Kougar on 9/28/2006 3:58:39 AM , Rating: 2
Okay, so basically an SSE3.1, or something to that effect? That makes better sense to me, thanks.

SSE4 was being touted around during the C2D launch; in fact, some stores list all C2Ds as having SSE4 on them. Which is sort of ironic, because I can't find any documentation now that I am looking for it.


RE: Core?
By FITCamaro on 9/28/2006 12:53:12 PM , Rating: 2
Not to mention CPU-Z reports my E6600 as having SSE4.


The Wisdom of new instruction set
By knitecrow on 9/27/06, Rating: 0
RE: The Wisdom of new instruction set
By hstewarth on 9/27/06, Rating: 0
By saratoga on 11/14/2006 8:32:27 PM , Rating: 2
quote:
Because Intel has the right to enhance their own instruction set, which they designed. AMD did not design the x86 processor that we have today, and Intel does not need to confirm its designs with AMD.


Actually, AMD (as well as MS to a lesser extent) is largely behind the recent move to SSE2 because they force you to use it in their x86-64 extension. Although Intel came up with most of SSE2, the final implementation was actually from AMD (which tweaked it with new registers), with Intel quietly adopting it when MS forced them to support x86-64.


RE: The Wisdom of new instruction set
By DallasTexas on 9/27/06, Rating: -1
RE: The Wisdom of new instruction set
By ergle on 9/27/2006 7:43:31 PM , Rating: 2
I was with you 'til the Intel fanboi-ism kicked in.

AMD generally adopts new instruction sets quickly and 3DNow had widespread support at the time, especially in games and drivers.

3DNow includes a subset of the original SSE.


RE: The Wisdom of new instruction set
By tygrus on 9/27/2006 11:30:31 PM , Rating: 2
quote:
3DNow includes a subset of the original SSE.


Actually, I think it was the other way around. AMD introduced 3DNow before Intel had SSE. Some of the 3DNow instructions were the same as Intel's SSE, but the context/mode switch was different. AMD further added to 3DNow before continuing with SSE# after the license fight and for simplicity.
The AMD K6-2 & K6-3 chips were good for most but had a limited future due to limited scaling, a slow x87 FPU, and the newer PIII, P4 and K7. 3DNow overcame some of the slow x87 FPU problems and made them competitive against the PII and early PIII.


RE: The Wisdom of new instruction set
By ergle on 9/28/2006 2:04:03 AM , Rating: 2
Yeah, 3DNow was first, but it still only contains a subset of SSE.

My comments weren't meant to imply that the K6 was wonderful. As someone who's worked with both, I'm well aware of the K6's shortcomings -- especially the lack of a pipelined FPU.

I do find it rather odd I was troll-rated for calling someone on their trolling tho'. Not your doing, I know :)


By kobymu on 9/28/2006 6:53:30 AM , Rating: 3
quote:
yeah, 3DNow was first, but it still only contains a subset of SSE.


MMX was first, and was extremely successful, which is the only reason why AMD created 3DNow in the first place.

quote:
Why not work with others in the industry (like AMD) and release instructions that software developers are asking for and will pledge support for.


"pledge support" ??? ...anyway, MMX was a huge success because Intel did just that: they listened to programmers and clients who wanted multimedia enhancements.

And not only was MMX a huge success, it was a fast success; programmers adopted MMX at almost unprecedented speed, unlike the 486's new instructions or the Pentium's new instructions or even the Pentium Pro's newer instructions, which most programmers took literally years to adopt.

IIRC Photoshop was the first 'big' adopter, only a few months after MMX's initial release (don't remember the exact version though).


By smitty3268 on 9/27/2006 11:18:50 PM , Rating: 2
quote:
As far as AMD is concerned, yes, they adopt Intel extensions about 10 years later. They also come out with some of their own like 3DNow which found its way into one Disney application - I think it was Goofy.


More like 1 year later. Anyway, 3DNow (and its successors) were actually quite popular with games and other apps early on. They were pretty much eclipsed by SSE/SSE2 though.


Isn't nehalem just a shrink?
By Phynaz on 9/27/2006 4:50:45 PM , Rating: 2
quote:
This includes Intel’s upcoming Nehalem


CNET is reporting the instructions will be included with Penryn, not Nehalem.





RE: Isn't nehalem just a shrink?
By KristopherKubicki (blog) on 9/27/2006 5:02:18 PM , Rating: 2
Penryn is the 45nm die shrink of Core 2 Duo. Nehalem is the actual "next generation" architecture.


RE: Isn't nehalem just a shrink?
By Phynaz on 9/27/2006 5:03:21 PM , Rating: 2
Thanks, looks like Cnet got it wrong.


RE: Isn't nehalem just a shrink?
By Phynaz on 9/27/2006 5:04:31 PM , Rating: 2
Unless, Intel is going to add instructions with the die shrink?

Nah, they aren't that stupid.


By KristopherKubicki (blog) on 9/27/2006 7:59:29 PM , Rating: 2
Intel's guidance documentation states:

"Beginning with the 45nm Intel microarchitecture based processors slated for production (codenamed Penryn) in 2007, Intel will start to implement SSE4 instructions"

Followed by a footnote saying:

"Most of these instructions will be available in Penryn and some of the instructions will be in microprocessors slated for release after Penryn"


lol, CRC32 instruction
By Tyler 86 on 9/28/2006 4:42:40 AM , Rating: 2
About time...

This particular instruction might be reverse-implementable on current applications that use common CRC32 implementations..

Example...
{
mov rax, esp
mov rcx, rsp+100h
call crc32
...
ret
crc32:
... more than enough instructions, I'm sure...
ret
}

all to
{
crc32 esp, rsp+100h
}

Why, a nice little shell script could do it...
Might help substantially in video-game network, where quick verification of packet integrity is of great importance...
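
For reference, the CRC32 instruction as Intel eventually specified it does not take a buffer range like the pseudo-assembly above. It folds one 8- to 64-bit operand into a running 32-bit value using the CRC-32C (Castagnoli) polynomial, so software still loops over the data. The C sketch below is a plain bitwise software model of that checksum, not the instruction itself. The function name crc32c_update and the invert-on-entry, invert-on-exit convention are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Folds len bytes into a running CRC-32C value.
   Pass 0 as crc for the first chunk, then the previous return value. */
uint32_t crc32c_update(uint32_t crc, const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    assert(p != NULL || len == 0);

    crc = ~crc; /* undo the final inversion of the previous call */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            /* Shift out one bit; XOR in the polynomial if that bit was set. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return ~crc;
}
```

Because the running value is passed in and returned, a buffer can be fed in pieces: crc32c_update(crc32c_update(0, "1234", 4), "56789", 5) gives the same result as a single call over "123456789".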




RE: lol, CRC32 instruction
By Tyler 86 on 9/28/2006 4:46:08 AM , Rating: 2
...erm, esp being an assumption that the stack pointer is low (>=(1<<31)) due to the application being loaded, and crc32 accepting a 'near' address, on an amd64 system... shrug.

actually, it's still quite ridiculous, but hey, it's an example...


RE: lol, CRC32 instruction
By Visual on 9/29/2006 5:40:00 AM , Rating: 2
a dedicated instruction will save you almost nothing in my opinion - the cpu still has to do pretty much the same operations, even if you code it with just one asm instruction, since that gets broken into dozens of uops.
also, i kinda doubt the instruction will be suitable for all possible scenarios... i.e. it may work if you need to verify certain data at once, but what if you want to do it incrementally, your data isn't linear, or something like that?
and specifically for videogame networking, error checking (and correction) is better done with more traditional algorithms - parity bits, etc. since the packets are quite small.

it's good to see more and more new instructions, but i think this is getting somewhat ridiculous. soon we'll have a separate asm instruction for every imaginable function from the c++ standard libraries :p

btw, i wonder... is it possible to compile programs to uops, or to program in uops directly?
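
On the parity-versus-CRC point above: a single parity bit is cheaper than a CRC, but it only catches an odd number of flipped bits. A minimal C sketch of that trade-off follows; the function name parity_bit is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffer contains an odd number of set bits, else 0. */
unsigned parity_bit(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);

    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc ^= data[i]; /* XOR folds all bytes into one */
    }
    /* Fold the remaining 8 bits down to a single bit. */
    acc ^= (uint8_t)(acc >> 4);
    acc ^= (uint8_t)(acc >> 2);
    acc ^= (uint8_t)(acc >> 1);
    return acc & 1u;
}
```

Flipping one bit of a packet changes the result, so the error is detected. Flipping two bits changes it back, so the corrupted packet passes the check. A 32-bit CRC detects both cases.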


"Fourth" or Forth
By Kiwi on 9/27/2006 5:29:29 PM , Rating: 2
One is a number. The other is a programming language. This spelling error is one of those that fools your eye into seeing a "U" when it was left out, because you EXPECT to see it.




Detail info on SSE4
By hstewarth on 9/28/2006 9:26:41 PM , Rating: 2
I searched Intel's site and found the following link (PDF)

http://cache-www.intel.com/cd/00/00/32/26/322663_3...


Judging by the name, this link appears to point into a cache, but it provides a lot of information. It does appear that most of the instructions are coming in 2007. But it also states that 30 operations are coming in 2006 and 50 operations are coming in 2007.

It does appear that Core 2 has some but not all of the instructions. I am also curious whether the Xeon 51xx and 53xx series could have more than the desktop chips. Intel has done such things in the past. But nothing shows this, unless I missed something.




Core?
By Alphafox78 on 9/27/06, Rating: -1














Copyright 2014 DailyTech LLC.