The SSE4 programming reference is out: an opportunity to study the improvements.
SSE4 brings 47 new instructions (plus 7 more for the Nehalem microarchitecture), mainly focused on multimedia acceleration. Among them are MPSADBW (sum of absolute differences), PHMINPOSUW (minimum search: finds the smallest uint16_t among eight elements and its index; if you invert the source you get a fast max() ;-), the ROUND family (ROUNDPS, ROUNDPD, ROUNDSS, ROUNDSD, which round floating-point values) and other instructions too.
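To make the min/max trick concrete, here is a minimal sketch using the `_mm_minpos_epu16` intrinsic, which maps to PHMINPOSUW. The function names `min_u16x8` and `max_u16x8` are made up for this example, and the scalar branch is only a fallback so the code still builds when SSE4.1 is not enabled (with GCC or Clang, compile with `-msse4.1` to get the intrinsic path).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#endif

/* Hypothetical helper: smallest of eight uint16_t values.
   *index receives the position of the (first) minimum. */
static uint16_t min_u16x8(const uint16_t v[8], unsigned *index)
{
    assert(v && index);
#if defined(__SSE4_1__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    __m128i r = _mm_minpos_epu16(x);                 /* PHMINPOSUW */
    /* Word 0 holds the minimum, the low 3 bits of word 1 hold its index. */
    *index = (unsigned)_mm_extract_epi16(r, 1) & 7u;
    return (uint16_t)_mm_extract_epi16(r, 0);
#else
    /* Scalar fallback with the same result: lowest index wins on ties. */
    uint16_t best = v[0];
    unsigned pos = 0;
    for (unsigned i = 1; i < 8; ++i) {
        if (v[i] < best) { best = v[i]; pos = i; }
    }
    *index = pos;
    return best;
#endif
}

/* Hypothetical helper: maximum via inversion, since for unsigned
   values max(v) == ~min(~v). */
static uint16_t max_u16x8(const uint16_t v[8], unsigned *index)
{
    uint16_t inv[8];
    assert(v && index);
    for (unsigned i = 0; i < 8; ++i)
        inv[i] = (uint16_t)~v[i];
    return (uint16_t)~min_u16x8(inv, index);
}
```

In real SIMD code the inversion would be done on the whole register at once (for example by XOR with all ones) rather than in a scalar loop; the loop is kept here only to keep the sketch short.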
There are also dot product instructions (DPPS, DPPD) for matrix calculations, a streaming load hint instruction (MOVNTDQA) that fetches aligned data into a small set of temporary buffers, packed integer format conversions (widening to larger data types), IEEE 754 compliance operations, and so on.
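As an illustration of the dot product instruction, here is a small sketch built on the `_mm_dp_ps` intrinsic (DPPS). The helper name `dot4` is invented for this example, and the scalar branch is again just a fallback for builds without SSE4.1 enabled.

```c
#include <assert.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics */
#endif

/* Hypothetical helper: dot product of two 4-element float vectors. */
static float dot4(const float a[4], const float b[4])
{
    assert(a && b);
#if defined(__SSE4_1__)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* Mask 0xF1: the high nibble (F) selects all four lanes for the
       multiply, the low nibble (1) writes the sum to lane 0 only. */
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
#else
    /* Scalar fallback for builds without SSE4.1. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

A 4x4 matrix times vector product could then be written as four such calls, one per matrix row, though whether that beats a multiply-and-shuffle approach depends on the CPU.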