The error message seems to imply that mpirun itself didn't segfault, but that 
something else did. Is the PID in that segfault message actually mpirun's?
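
One way to tell (assuming a bash-like shell; ./your_test is just a stand-in 
for whatever program you are launching) is to background the launch and print 
mpirun's own PID so you can compare it against the number in the brackets:

  mpirun -np 24 ./your_test &
  echo "mpirun pid is $!"

If those numbers differ, then it was one of the processes mpirun launched that 
crashed, not mpirun itself.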

This kind of problem is usually caused by mismatched builds - i.e., you compile 
against your new build, but you pick up the Myricom-supplied build when you try 
to run because of PATH and LD_LIBRARY_PATH issues. You might check to ensure 
you are running against the same installation you built with.
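
A quick sanity check (assuming a bash-like shell; ./your_app below is just a 
stand-in for a binary you compiled against the new build) is to see which 
install the shell and the runtime loader actually resolve:

  which mpirun
  mpirun --version
  ompi_info | grep -i prefix
  ldd ./your_app | grep -i mpi
  echo $PATH
  echo $LD_LIBRARY_PATH

If any of those point back at the Myricom-supplied tree rather than your 
OpenMPI 1.4.2 / PGI 10.4 install, that mismatch would explain the segfault.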


On Oct 20, 2010, at 6:41 PM, Raymond Muno wrote:

> We are doing a test build of a new cluster. We are re-using our Myrinet 10G 
> gear from a previous cluster.
> 
> I have built OpenMPI 1.4.2 with PGI 10.4. We use this regularly on our 
> InfiniBand-based cluster, and all the install elements were readily available.
> 
> After a few go-arounds with the Myrinet MX stack, we are now running MX 
> 1.2.12 with allowances for more than the maximum of 16 endpoints. Each node 
> has 24 cores.
> 
> The cluster is running Rocks 5.3.
> 
> As part of the initial build, I installed the Myrinet_MX Rocks Roll from 
> Myricom. With the default limitation of 16 endpoints, we could not run on all 
> nodes. As mentioned above, the MX stack was replaced.
> 
> Myricom provided a build of OpenMPI 1.4.1. That build works, but it is 
> compiled only with gcc and gfortran, and we want it built with the compilers 
> we normally use, e.g. PGI, Pathscale and Intel.
> 
> We can compile with the OpenMPI 1.4.2 / PGI 10.4 build. However, we cannot 
> launch jobs with its mpirun; it segfaults.
> 
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> [enet1-head2-eth1:29532] *** Process received signal ***
> [enet1-head2-eth1:29532] Signal: Segmentation fault (11)
> [enet1-head2-eth1:29532] Signal code: Address not mapped (1)
> [enet1-head2-eth1:29532] Failing at address: 0x6c
> [enet1-head2-eth1:29532] *** End of error message ***
> Segmentation fault
> 
> However, if we launch the job with the Myricom-supplied mpirun in its 
> OpenMPI tree, the job runs successfully. This works even with a test program 
> compiled against the OpenMPI 1.4.2 / PGI 10.4 build.
> 
> 
