I found one other issue: from userland we can't really determine whether the FP unit is being used, nor can we generate an exception-on-first-use that is visible to userland. That means we can't optimize out the FP state save/restore. I don't think this is a showstopper, though. For now we can just save/restore the FP state unconditionally; a save+restore sequence only takes about 69ns on my test box. Later on, the kernel can easily supply the required information via shared memory or something similar (along with the signal mask and pending bits).
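A rough sketch of the two policies being compared might help. It is hypothetical and only illustrative: `struct fpstate`, `struct kshared`, `fp_used`, `cpu_fpu`, `switch_fp_unconditional` and `switch_fp_lazy` are invented names, not anything that exists in the kernel. The FPU register file is simulated with a plain buffer, and `memcpy` stands in for the real FXSAVE/FXRSTOR instructions, so the sketch runs anywhere.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the saved FPU/SSE register image.
 * On x86-64, FXSAVE writes a 512-byte area; here it is just bytes. */
struct fpstate {
    unsigned char regs[512];
};

/* Hypothetical page the kernel might share with userland: an FP-usage
 * hint plus the signal mask and pending bits. */
struct kshared {
    volatile int fp_used;            /* nonzero if the FPU was touched */
    volatile unsigned long sigmask;
    volatile unsigned long sigpending;
};

/* Per-thread context for a userland thread switcher (FP part only). */
struct uthread {
    struct fpstate fp;
};

/* Simulated FPU register file; real code would use FXSAVE/FXRSTOR. */
struct fpstate cpu_fpu;

/* Number of FP saves performed, to compare the two policies. */
int fp_saves;

/* Policy 1 (what the post proposes for now): always save the outgoing
 * thread's FP state and restore the incoming thread's. */
void switch_fp_unconditional(struct uthread *from, struct uthread *to)
{
    assert(from != to);
    memcpy(&from->fp, &cpu_fpu, sizeof cpu_fpu);   /* "FXSAVE" */
    fp_saves++;
    memcpy(&cpu_fpu, &to->fp, sizeof cpu_fpu);     /* "FXRSTOR" */
}

/* Policy 2 (possible later): skip the save when the kernel reports
 * that the outgoing thread has not used the FPU, since the live
 * registers then still match what was last restored for it. */
void switch_fp_lazy(const struct kshared *ks,
                    struct uthread *from, struct uthread *to)
{
    assert(from != to);
    if (ks->fp_used) {
        memcpy(&from->fp, &cpu_fpu, sizeof cpu_fpu);
        fp_saves++;
    }
    memcpy(&cpu_fpu, &to->fp, sizeof cpu_fpu);
}
```

With `fp_used` clear, `switch_fp_lazy` skips the save entirely. Until the kernel actually exports such a hint, only the unconditional policy is available.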
Either way, just doing it unconditionally is still far faster than making a system call.

-Matt
Matthew Dillon
<[EMAIL PROTECTED]>