Hi,

A Fortran application compiled with ifort 10.1 and Open MPI 1.3.1 on
CentOS 5.2 fails after running for 4 days with the following error
message:

[compute-0-7:25430] *** Process received signal ***
[compute-0-7:25433] *** Process received signal ***
[compute-0-7:25433] Signal: Bus error (7)
[compute-0-7:25433] Signal code:  (2)
[compute-0-7:25433] Failing at address: 0x4217b8
[compute-0-7:25431] *** Process received signal ***
[compute-0-7:25431] Signal: Bus error (7)
[compute-0-7:25431] Signal code:  (2)
[compute-0-7:25431] Failing at address: 0x4217b8
[compute-0-7:25432] *** Process received signal ***
[compute-0-7:25432] Signal: Bus error (7)
[compute-0-7:25432] Signal code:  (2)
[compute-0-7:25432] Failing at address: 0x4217b8
[compute-0-7:25430] Signal: Bus error (7)
[compute-0-7:25430] Signal code:  (2)
[compute-0-7:25430] Failing at address: 0x4217b8
[compute-0-7:25431] *** Process received signal ***
[compute-0-7:25431] Signal: Segmentation fault (11)
[compute-0-7:25431] Signal code:  (128)
[compute-0-7:25431] Failing at address: (nil)
[compute-0-7:25430] *** Process received signal ***
[compute-0-7:25433] *** Process received signal ***
[compute-0-7:25433] Signal: Segmentation fault (11)
[compute-0-7:25433] Signal code:  (128)
[compute-0-7:25433] Failing at address: (nil)
[compute-0-7:25432] *** Process received signal ***
[compute-0-7:25432] Signal: Segmentation fault (11)
[compute-0-7:25432] Signal code:  (128)
[compute-0-7:25432] Failing at address: (nil)
[compute-0-7:25430] Signal: Segmentation fault (11)
[compute-0-7:25430] Signal code:  (128)
[compute-0-7:25430] Failing at address: (nil)
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 25433 on node
compute-0-7.local exited on signal 11 (Segmentation fault).


--------------------------------------------------------------------------

This job runs with 4 Open MPI processes on nodes that are interconnected
with Gigabit Ethernet.
The same job runs well on nodes with InfiniBand connectivity.

What could be the reason for this? Could it be due to a loose physical
connection, since it is giving a bus error?
