To follow up: while testing this I discovered a number of tedious, complicated deadlocks that could occur due to softints, kernel_lock and other factors. Trying to mitigate them killed the performance gain, and the result still wasn't right. I'm abandoning this idea because in practice it seems too complicated. On a more positive note, a couple of LOCKDEBUG improvements came out of it.
Andrew