Also, the error message suggested that TCP is not the issue here -- the TCP 
hangups are likely because some other process exited unexpectedly.

Indeed:

-----
mpirun noticed that process rank 0 with PID 4989 on node compute-0-1 exited on 
signal 4 (Illegal instruction).
-----

This might be the real issue.  Getting a corefile, as was already suggested, 
might be the best way to go forward.
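
As a rough sketch of the two checks discussed in this thread (not a definitive recipe), the Python below runs a command once with the core-size limit raised and reports which signal killed it, and it lists the CPU feature flags from /proc/cpuinfo-style text. The function names are made up for illustration, and it assumes x86 Linux, where /proc/cpuinfo has a "flags" line.

```python
import resource
import signal
import subprocess


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the line looks like "flags\t\t: fpu vme ... sse2 ..."
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def _allow_core_dump():
    # Runs in the child just before exec: raise the soft core-size limit
    # to the hard limit, similar in spirit to `ulimit -c unlimited`.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_and_report(cmd):
    """Run cmd once; return the name of the signal that killed it, or None."""
    proc = subprocess.run(cmd, preexec_fn=_allow_core_dump)
    if proc.returncode < 0:
        # subprocess reports death-by-signal as a negative return code.
        return signal.Signals(-proc.returncode).name
    return None
```

For example, run_and_report(["./my_app"]) (with ./my_app standing in for the real executable, run as a single task) should return "SIGILL" if the process dies the way rank 0 did above, and parse_cpu_flags(open("/proc/cpuinfo").read()) shows whether the compute node advertises flags such as "avx". Where the kernel writes the core file depends on /proc/sys/kernel/core_pattern, so check that setting if no core file shows up in the working directory.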



> On Sep 2, 2016, at 5:50 AM, John Hearns via users <users@lists.open-mpi.org> 
> wrote:
> 
> Mahmood, as Gilles says, start by looking at how that application is compiled
> and linked.
> Run 'ldd' on the executable and look closely at the libraries.  Do this on a 
> compute node if you can.
> 
> There was a discussion on another mailing list recently about how to
> fingerprint executables and see which architecture they were compiled for.
> My mind is a blank at the moment as to what that discussion concluded. Sorry. 
>  And if this was on OpenMPI I am doubly sorry!
> 
> 
> On 2 September 2016 at 10:37, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> Did you run
> ulimit -c unlimited
> before invoking mpirun ?
> 
> if your application can be run with only one task, you can try to run it
> under gdb.
> you will hopefully be able to see where the illegal instruction occurs.
> 
> since you are running on AMD processors, you have to make sure you are not
> using any third-party library that was optimized for Intel processors (e.g.
> one that uses AVX (SSE ?) instructions)
> 
> Cheers,
> 
> Gilles
> 
> On Friday, September 2, 2016, Mahmood Naderan <mahmood...@gmail.com> wrote:
> >Are you running under a batch manager ?
> >On which architecture ?
> Currently I am not using the job manager (which is actually PBS). I am
> running from the terminal.
> 
> The machines are AMD Opteron 64 bit
> 
> 
> >Hopefully you will get a core file that points you to the illegal instruction
> Where is that core file? I cannot find it.
> 
> BTW, the Open MPI version is 1.6.5
> 
> 
> --
> Regards,
> Mahmood
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
