In addition to the ATLAS build, I have found that the machine crashes
occasionally (though not nearly as predictably) while compiling a kernel
or running the ACML-GPU demos.

The latest Jaunty kernel and the CentOS 5.3 kernel both have the
vulnerability, while Fedora Core 11 does not.  I've done further testing,
and the problem appears to be a kernel bug related to the number of cores
online.  I have tried building ATLAS after disabling cores via:

   echo 0 > /sys/devices/system/cpu/cpu${N}/online

Results:

One core enabled [0]:                  ATLAS builds (no crash) [3/3]
Two cores on same CPU [0/1]:           ATLAS builds (no crash) [3/3]
Two cores on diff CPU [0/4]:           ATLAS builds (no crash) [3/3]
Four cores on same CPU [0/1/2/3]:      Crash occurred in 1 of 3 builds
                                       after ~5 minutes
Four cores on diff CPU [0/1/4/5]:      ATLAS builds (no crash) [3/3]
Six cores on diff CPU [0/1/2/4/5/6]:   ATLAS builds (no crash) [3/3], but
                                       froze for 3-4 min. during one build
All 8 cores [0-7]:                     Crash on every build within 1-2 min.
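
For anyone repeating these runs, the core configurations listed above could
be set with a small helper along the following lines.  This is only a sketch:
the function name set_online_cores and the CPU_ROOT override are made up for
illustration, and it assumes the same sysfs "online" files used in the echo
command earlier.  It has to be run as root on a real system.

```shell
#!/bin/sh
# Sketch: keep only the listed cores online and take all others offline.
# CPU_ROOT is a hypothetical override so the function can be tried on a
# scratch directory instead of the real sysfs tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Usage: set_online_cores "0 1 4 5"
set_online_cores() {
    keep=" $1 "
    for dir in "$CPU_ROOT"/cpu[0-9]*; do
        ctl="$dir/online"
        # Some cores (often cpu0) have no 'online' file because they
        # cannot be unplugged; leave those alone.
        [ -e "$ctl" ] || continue
        n="${dir##*/cpu}"
        case "$keep" in
            *" $n "*) echo 1 > "$ctl" ;;   # listed: bring online
            *)        echo 0 > "$ctl" ;;   # not listed: take offline
        esac
    done
}
```

For example, set_online_cores "0 1 4 5" would correspond to the "four cores
on diff CPU" row, and set_online_cores "0 1 2 3 4 5 6 7" restores all eight.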

So it appears to be some corner case that doesn't manifest itself often
or consistently until the number of online cores reaches 8.  ATLAS
aggressively tests the capabilities of all cores, so it makes sense that
it would make a good canary in the coal mine for this sort of bug.

After installing FC11, I could leave ATLAS building in a loop on the box
all night without a crash.

-- 
Building ATLAS on Intel Xeon E5520 crashes machine
https://bugs.launchpad.net/bugs/400750
You received this bug notification because you are a member of Ubuntu
Bugs, which is a direct subscriber.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
