On Wed, May 05, 1999 at 05:54:50PM -0700, Mersenne Digest wrote:
>>From the IA-64 register set figure in the advert, one weakness appears to me
>to be the sheer amount of silicon: Intel is going from just 8 FP registers
>in all the Pentium incarnations to a whopping 128, each still having the
>x86's full 80 bits, in the IA-64. There are also 128 65-bit general-purpose
>(64-bit integer plus carry bit) registers.

I've heard the main problem is getting efficient code out of it. Merced (IA-64)
is rumoured (or is it official?) to have 7 different execution streams. Writing
code to use all of these efficiently will be very difficult, if possible at all.

However, I think it's great that Intel is finally adding more registers than
the 8 (actually 7, since you need at least a stack pointer) they've been
using. 128 might be a bit too much, though.

>2) What use are so many registers without a major increase in the number
>of functional units? (I believe IA-64 still has 1 FP adder, 1FP multiplier,
>perhaps 2 64-bit integer units, the latter having been a longstanding feature
>of e.g. Alpha and MIPS, which both have excellent integer functionality
>and neither has 128 integer registers.)

The point of registers is not only to be able to operate on them
simultaneously, but also to be able to hold things in them. If you've done
x86 assembly, you will find yourself pushing data in and out of memory all
the time, because 8 registers simply aren't enough for most algorithms.
Hopefully, having 128 will reduce the amount of data going to and from
memory (and cache).
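To illustrate the register-pressure point, here's a hypothetical C fragment
(not from the original post) with twelve partial sums live across a loop
body. On x86, with only ~7 usable integer registers, the compiler has to
spill several of them to the stack every iteration; with 128 registers they
could all stay resident.

```c
/* Hypothetical example: twelve accumulators live at once.  With only
   ~7 usable x86 integer registers, several must be spilled to the
   stack each iteration; with 128 registers, none need to be. */
static long dot12(const long *a, const long *b, int n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0;
    long s6 = 0, s7 = 0, s8 = 0, s9 = 0, s10 = 0, s11 = 0;
    for (int i = 0; i + 12 <= n; i += 12) {
        s0  += a[i]      * b[i];       s1  += a[i + 1]  * b[i + 1];
        s2  += a[i + 2]  * b[i + 2];   s3  += a[i + 3]  * b[i + 3];
        s4  += a[i + 4]  * b[i + 4];   s5  += a[i + 5]  * b[i + 5];
        s6  += a[i + 6]  * b[i + 6];   s7  += a[i + 7]  * b[i + 7];
        s8  += a[i + 8]  * b[i + 8];   s9  += a[i + 9]  * b[i + 9];
        s10 += a[i + 10] * b[i + 10];  s11 += a[i + 11] * b[i + 11];
    }
    return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8 + s9 + s10 + s11;
}
```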

I would love SIMD functions on this, however. Imagine splitting the registers
into 16 parts and doing `16-way' SIMD operations...
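You can already fake something like this in software - "SIMD within a
register" (SWAR). A sketch, assuming sixteen 4-bit lanes packed into one
64-bit word (the lane width is my choice for illustration; IA-64 defines no
such instruction):

```c
#include <stdint.h>

/* SWAR sketch: treat a 64-bit word as sixteen 4-bit lanes and add
   lane-wise (each lane mod 16).  Masking off each lane's top bit
   before the add keeps carries from spilling into the neighbouring
   lane; the top bits are then fixed up with an XOR. */
static uint64_t swar_add16x4(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8888888888888888ULL;      /* top bit of each lane */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);  /* per-lane add mod 16 */
}
```

For example, adding the lanes {0xF, 0x1} to {0x1, 0x1} gives {0x0, 0x2}:
each lane wraps independently instead of carrying into its neighbour.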

>Sure, lots of registers is nice for out-of-order execution (OOE), but even
>an OOE monster like the Alpha 21264 has "just" 80 FP registers, each of just
>64 bits - IA-64 will have double the silicon here!

Agreed. But nobody will accept going _down_ in FP precision.

---snip---

>> b1) Using "properly sized" FFT: Yes, I believe using properly sized FFT's
>> saves some time. How much depends on the programmer, the data, the
>> language/compiler, and the hardware. Non-power-of-2 FFT's can even be
>> useful, but it seems that coding up all (or most) of the possible sizes may
>> be more trouble than it's worth.
>Well, George went to the trouble of adding 3*2^n, 5*2^n and 7*2^n 
>run length code into Prime95 - even though it's less efficient than 
>plain powers-of-two, it's still well worth while, for the exponents that 
>can benefit from it.

I think we're mixing two things up here. One is non-power-of-two FFT
run lengths. What I believe he originally asked about was changing the
FFT run length dynamically, as the number increases in size.
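On the first point, a toy sketch of picking the smallest run length of the
form k*2^n with k in {1, 3, 5, 7}, the shapes mentioned above. The
selection rule here is simplified - real code like Prime95's would also
weigh the relative speed of each size, not just pick the smallest:

```c
/* Toy version: smallest FFT run length of the form k * 2^n, with
   k in {1, 3, 5, 7}, that is at least `need`.  (Simplified: real
   code also accounts for the per-size throughput.) */
static unsigned long best_runlength(unsigned long need) {
    static const unsigned long ks[] = {1, 3, 5, 7};
    unsigned long best = 0;
    for (int i = 0; i < 4; i++) {
        unsigned long len = ks[i];
        while (len < need)      /* double until the length is big enough */
            len *= 2;
        if (best == 0 || len < best)
            best = len;
    }
    return best;
}
```

For instance, a number needing 1200 FFT points would get 5*256 = 1280
rather than jumping all the way to 2048.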

>If it _does_ start to repeat "early" then 2^p-1 
>_must_ be composite - because it can't get to zero without sticking 
>at 2, see above - but is it really worth adding the extra code to 
>check?

Probably not, as you would have to keep every residue in memory (or on
disk). At (say) a 256K FFT, you would need 1 MB just for four iterations,
and an awful lot of comparisons.
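For tiny exponents the whole thing fits in a machine word, so here's a toy
Lucas-Lehmer sketch (my own illustration, assuming p <= 31 so s*s fits in
64 bits). Checking for an early repeat on top of this would mean keeping
every previous residue around and comparing against all of them - trivial
here, but megabytes per iteration at real run lengths:

```c
#include <stdint.h>

/* Toy Lucas-Lehmer test for tiny odd exponents p (3 <= p <= 31):
   M_p = 2^p - 1 is prime iff s_(p-2) == 0, where s_0 = 4 and
   s_(k+1) = s_k^2 - 2 (mod M_p). */
static int ll_is_prime(unsigned p) {
    uint64_t m = ((uint64_t)1 << p) - 1;
    uint64_t s = 4 % m;
    for (unsigned k = 0; k < p - 2; k++)
        s = (s * s + m - 2) % m;   /* + m avoids wrapping when s < 2 */
    return s == 0;
}
```

E.g. p = 7 gives the sequence 4, 14, 67, 42, 111, 0 mod 127, so M7 = 127
is prime; for p = 11 the sequence never hits zero, and indeed
M11 = 2047 = 23 * 89.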

/* Steinar */
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
