It looks like your executable is explicitly calling MPI_ABORT in the CmiAbort function -- perhaps in response to something happening in the namd or CmiHandleMessage functions. The next logical step would likely be to look in those routines and see why MPI_ABORT/CmiAbort would be invoked.
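For reference, here is a minimal sketch (hypothetical code, not NAMD's or Charm++'s actual source) of how an application-side abort wrapper like CmiAbort ends up producing the "MPI_ABORT was invoked ... with errorcode 1" banner quoted below; the banner comes from Open MPI reacting to the application calling MPI_Abort(), not from a failure inside Open MPI itself:

  /* Hypothetical reproducer, not NAMD source: one rank calls an abort
     wrapper that ends in MPI_Abort(), and Open MPI then prints the same
     "MPI_ABORT was invoked on rank N ... with errorcode 1" help text. */
  #include <mpi.h>
  #include <stdio.h>

  static void my_abort(const char *reason)      /* stand-in for CmiAbort */
  {
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      fprintf(stderr, "Processor %d Exiting: Called abort\nReason: %s\n",
              rank, reason);
      MPI_Abort(MPI_COMM_WORLD, 1);             /* errorcode 1, as in the log */
  }

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 2)                            /* mimic rank 2 aborting */
          my_abort("Internal Error: Unknown-msg-type.");
      MPI_Barrier(MPI_COMM_WORLD);              /* other ranks wait here until mpirun kills them */
      MPI_Finalize();
      return 0;
  }

Run with four or more ranks, that produces the same kind of output, which is why the useful question is what makes CmiAbort fire in the first place, not what Open MPI does afterwards.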

On Nov 11, 2009, at 4:49 AM, Yogesh Aher wrote:

Yes, the executables start running and then give the error mentioned in my first message, i.e.:

./mpirun -hostfile machines executable
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 15617 on
node sibar.pch.univie.ac.at exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[2] Stack Traceback:
  [0] CmiAbort+0x25  [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22  [0x8367c20]
  [3] CsdScheduleForever+0x67  [0x8367dd2]
  [4] CsdScheduler+0x12  [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21  [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53  [0x80fa0f5]
  [7] main+0x2e  [0x80f65b6]
  [8] __libc_start_main+0xd3  [0x31cde3]
  [9] __gxx_personality_v0+0x101  [0x80f3405]
[3] Stack Traceback:
  [0] CmiAbort+0x25  [0x8366f3e]
  [1] namd [0x830d4cd]
  [2] CmiHandleMessage+0x22  [0x8367c20]
  [3] CsdScheduleForever+0x67  [0x8367dd2]
  [4] CsdScheduler+0x12  [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21  [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53  [0x80fa0f5]
  [7] main+0x2e  [0x80f65b6]
  [8] __libc_start_main+0xd3  [0x137de3]
  [9] __gxx_personality_v0+0x101  [0x80f3405]
Running on MPI version: 2.1 multi-thread support: MPI_THREAD_SINGLE (max supported: MPI_THREAD_SINGLE)
cpu topology info is being gathered.
2 unique compute nodes detected.

------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.

------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.

[studpc01.xxx.xxx.xx:15615] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[studpc01.xxx.xxx.xx:15615] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)

Yes, I put a 64-bit executable on one machine (studpc21) and a 32-bit executable on the other machine (studpc01), both with the same name. But I don't know whether each machine is actually using its own executable. How can I check that? Can the "./mpirun -hetero" option be used for specifying the machines? The jobs run fine individually on each machine, but not when both are used together.
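One possible check (a minimal sketch, nothing NAMD-specific; it assumes the test is compiled with each machine's own toolchain) is a tiny MPI program in which every rank reports its host name and word size:

  /* Hypothetical test program: each rank prints the node it runs on and
     whether its binary was built 32-bit or 64-bit. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, len;
      char host[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);

      /* sizeof(void *) is 4 in a 32-bit build, 8 in a 64-bit build */
      printf("rank %d on %s: %zu-bit binary\n", rank, host,
             8 * sizeof(void *));

      MPI_Finalize();
      return 0;
  }

Launched with the same hostfile and mpirun line as the real job, its output would show whether the 32-bit and the 64-bit binary are really each being picked up on their own node.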

Hope this gives some hint toward a solution.


Message: 2
Date: Tue, 10 Nov 2009 07:56:47 -0500
From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] Openmpi on Heterogeneous environment
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <8f008aab-358b-4e6a-83a0-9ece60fd5...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Do you see any output from your executables?  I.e., are you sure that
it's running the "correct" executables?  If so, do you know how far
it's getting in its run before aborting?


On Nov 10, 2009, at 7:36 AM, Yogesh Aher wrote:

> Thanks for the reply, Pallab. A firewall is not the issue, as I can
> SSH without a password to/from both machines.
> My problem is dealing with 32-bit and 64-bit architectures
> simultaneously (not with different operating systems). Is that
> possible with Open MPI?
>
> Look forward to the solution!
>
> Thanks,
> Yogesh
>
>
> From: Pallab Datta (datta_at_[hidden])
>
> I have had issues running across platforms, i.e. Mac OS X and Linux
> (Ubuntu), and haven't got them resolved. Check whether a firewall is
> blocking any communication.
>
> On Thu, Nov 5, 2009 at 7:47 PM, Yogesh Aher <aher.yog...@gmail.com>
> wrote:
> Dear Open-mpi users,
>
> I have installed Open MPI separately on 2 machines with different
> architectures (INTEL and x86_64), configuring each with
> ./configure --enable-heterogeneous. I compiled executables of the
> same code for the 2 architectures, kept each executable on its own
> machine, and prepared a hostfile containing the names of the 2
> machines.
> Now, when I try to execute the code (command: ./mpirun -hostfile
> machines executable), it does not work and gives this error message:
>
> MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 1712 on
> node studpc1.xxx.xxxx.xx exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here)
>
> When I keep only one machine name in the hostfile, the execution
> works perfectly.
>
> Will anybody please guide me on running the program in a
> heterogeneous environment using mpirun?
>
> Thanking you,
>
> Sincerely,
> Yogesh
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquy...@cisco.com
