Following up on this, I have a partial resolution. The primary culprit appears to be stale files in a ramdisk that are non-uniformly distributed across the sockets and thus interact poorly with NUMA. The slow runs invariably have high numa_miss and numa_foreign counts. I still have trouble making this explain a degradation of up to a factor of 10, but it certainly explains a factor of 3.
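
For anyone who wants to check the same counters on their own machine, here is a minimal sketch. It assumes a Linux system that exposes per-node counters in /sys/devices/system/node/node*/numastat; the function names are hypothetical, and this is not necessarily how the numbers above were gathered. The idea is to snapshot the counters before and after a run and compare the numa_miss and numa_foreign deltas between fast and slow runs.

```python
from pathlib import Path

# Counters of interest; the names follow the kernel's per-node numastat file.
KEYS = ("numa_hit", "numa_miss", "numa_foreign")


def parse_numastat(text):
    """Parse 'name value' lines from a per-node numastat file into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_all_nodes(root="/sys/devices/system/node"):
    """Return {node_name: counters} for every node directory under root.

    The default root is an assumption about the sysfs layout; if it does
    not exist, the result is simply empty.
    """
    stats = {}
    for path in sorted(Path(root).glob("node[0-9]*/numastat")):
        stats[path.parent.name] = parse_numastat(path.read_text())
    return stats


def delta(before, after, keys=KEYS):
    """Per-node difference (after - before) for the selected counters."""
    return {
        node: {k: after[node].get(k, 0) - before[node].get(k, 0) for k in keys}
        for node in after
        if node in before
    }


if __name__ == "__main__":
    # Snapshot only; to measure a run, take one snapshot before and one
    # after, then pass both to delta().
    for node, counters in read_all_nodes().items():
        print(node, {k: counters.get(k) for k in KEYS})
```

A run whose numa_miss and numa_foreign deltas are large relative to numa_hit is getting much of its memory from a node other than the preferred one, which is consistent with the ramdisk files crowding some sockets more than others.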
Jed