We fixed the openib segv, but I forgot to follow up about the timeouts
that I mentioned in my original mail.
The timeouts were from poorly configured spawn tests. That is, I had
8 cores in the job and ran the spawn test on all 8 cores (all
aggressively polling). The spawn test then spawned
Pasha and I *think* we have a fix. However, we're not quite clear on
this part of the code, so we need some more testing and eyes on the
code.
I'll start the tests now -- given that this is a low-frequency bug,
I'm going to run a slightly larger MTT run (several thousand tests)
that'll t
Unfortunately, I have to throw the flag on the v1.3 release. :-(
I ran ~16k tests via MTT yesterday on the rc5 and rc6 tarballs. I
found the following:
Found test runs: 15962
Passed: 15785 (98.89%)
Failed: 83 (0.52%)
--> Openib failures: 80 (0.50%)
Skipped: 46 (0.29%)
Timedout: 48 (0.30%)
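As a quick sanity check on the numbers above, here is a minimal Python sketch (not part of MTT; the variable names are just for illustration). It recomputes each percentage from the raw counts and confirms that Passed + Failed + Skipped + Timedout accounts for every test run. The openib failures are treated as a subset of "Failed", not as a separate bucket.

```python
# Raw counts copied from the MTT summary above (rc5 + rc6 runs combined).
total = 15962
outcomes = {"Passed": 15785, "Failed": 83, "Skipped": 46, "Timedout": 48}
openib_failures = 80  # subset of "Failed", not a separate outcome bucket


def pct(n, d):
    """Return n as a percentage of d, rounded to two decimals like the summary."""
    return round(100.0 * n / d, 2)


# Every run lands in exactly one outcome bucket, so the buckets must sum to total.
assert sum(outcomes.values()) == total

for name, count in outcomes.items():
    print(f"{name}: {count} ({pct(count, total):.2f}%)")
print(f"  --> Openib failures: {openib_failures} ({pct(openib_failures, total):.2f}%)")

# Fraction of all failures that are openib-related.
print(f"openib share of failures: {pct(openib_failures, outcomes['Failed']):.2f}%")
```

The last line shows that openib accounts for 80 of the 83 failures, so nearly all of the failures are openib-related.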