A quick list of new SSE instructions - Image courtesy HotHardware
Fourth generation streaming SIMD extensions

With the launch of the Conroe-based Core 2 Duo in June 2006, Intel introduced several SSE optimizations. They doubled the speed of SSE, SSE2 and SSE3 operations by letting the Core 2 architecture execute a 128-bit SSE, SSE2 or SSE3 instruction in a single clock cycle; Intel’s previous NetBurst and Core architectures required two clock cycles for the same instruction. These optimizations did not add new instructions; they improved the efficiency of existing ones.
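
To make "a 128-bit SSE instruction" concrete, here is a minimal C sketch of a single packed add, which operates on four 32-bit floats at once. The helper name add4 is made up for illustration and does not come from Intel; the intrinsics are the ones declared in the standard <xmmintrin.h> header, and a plain loop stands in when the compiler is not targeting SSE. The sketch only shows what one such instruction computes. It makes no claim about how many clock cycles the operation takes on any particular processor.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* Each __m128 holds four 32-bit floats, i.e. 128 bits. */
    __m128 va = _mm_loadu_ps(a);   /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    /* A single packed add covers all four lanes. */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    /* Scalar fallback for targets without SSE: four separate adds. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Calling add4 on {1, 2, 3, 4} and {10, 20, 30, 40} fills the output array with the four sums in one packed operation on an SSE target, where the fallback path needs four scalar additions.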

Intel’s Pat Gelsinger announced today that Intel has published a white paper on the SSE4 instructions that will appear in its next-generation 45nm products. SSE4 adds 50 new performance-enhancing instructions aimed at compiler vectorization, media, string and text processing, and application-targeted accelerators.

The Core architecture implemented in Core 2 Duo processors added 32 supplemental streaming instructions to SSE3. These instructions, dubbed Supplemental Streaming SIMD Extensions 3 (SSSE3), are not SSE4 and should not be confused with it.

SSE4 instructions are expected to arrive incrementally, starting with Intel’s first 45nm products. These include Penryn, a 45nm shrink of Core 2 Duo, and the upcoming Nehalem, Intel’s second-generation Core architecture. Penryn and other 45nm processors are expected to begin sampling in the second half of 2007 and to ship in the first half of 2008. Full implementation of SSE4 is planned only for Nehalem at this time.

More details are available in Intel's white paper on the subject.



Comments

RE: lol, CRC32 instruction
By Tyler 86 on 9/28/2006 4:46:08 AM , Rating: 2
...erm, esp being an assumption that the stack pointer is low (>=(1<<31)) due to the application being loaded, and crc32 accepting a 'near' address, on an amd64 system... shrug.

actually, it's still quite ridiculous, but hey, it's an example...


RE: lol, CRC32 instruction
By Visual on 9/29/2006 5:40:00 AM , Rating: 2
a dedicated instruction will save you almost nothing in my opinion - the cpu still has to do pretty much the same operations, even if you code it with just one asm instruction, since that gets broken into dozens of uops.
also, i kinda doubt the instruction will be suitable for all possible scenarios... i.e. it may work if you need to verify certain data at once, but what if you want to do it incrementally, your data isn't linear, or something like that?
and specifically for videogame networking, error checking (and correction) is better done with more traditional algorithms - parity bits, etc. since the packets are quite small.

it's good to see more and more new instructions, but i think this is getting somewhat ridiculous. soon we'll have a separate asm instruction for every imaginable function from the c++ standard libraries :p

btw, i wonder... is it possible to compile programs to uops, or to program in uops directly?
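
On the incremental question: a CRC is computed over a running value, so data can be fed in pieces and does not have to sit in one linear buffer. The C sketch below shows this in plain software for CRC-32C (the Castagnoli polynomial), which is the variant Intel documents for its CRC32 instruction. The function names crc32c_begin, crc32c_update and crc32c_finish are made up for illustration, and the bit-by-bit loop is only a reference model of the arithmetic. It is not how the hardware instruction is implemented, and it says nothing about how fast that instruction runs.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli) polynomial in bit-reflected form. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: initial running value (all ones). */
uint32_t crc32c_begin(void)
{
    return 0xFFFFFFFFu;
}

/* Hypothetical helper: fold len bytes into the running value.
 * Can be called repeatedly on separate chunks of data. */
uint32_t crc32c_update(uint32_t state, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    while (len--) {
        state ^= *p++;
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right one bit; XOR in the polynomial
             * when the bit shifted out was set. */
            uint32_t mask = 0u - (state & 1u);
            state = (state >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return state;
}

/* Hypothetical helper: final inversion gives the checksum. */
uint32_t crc32c_finish(uint32_t state)
{
    return state ^ 0xFFFFFFFFu;
}
```

Feeding "1234" and then "56789" through crc32c_update gives the same checksum as feeding "123456789" in one call, namely the standard CRC-32C check value 0xE3069283. The running value is all that has to be carried between chunks.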


Copyright 2014 DailyTech LLC.