Yes. The executables start running initially and then give the error I mentioned in my first message, i.e.:
./mpirun -hostfile machines executable

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 15617 on
node sibar.pch.univie.ac.at exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[2] Stack Traceback:
  [0] CmiAbort+0x25  [0x8366f3e]
  [1] namd  [0x830d4cd]
  [2] CmiHandleMessage+0x22  [0x8367c20]
  [3] CsdScheduleForever+0x67  [0x8367dd2]
  [4] CsdScheduler+0x12  [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21  [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53  [0x80fa0f5]
  [7] main+0x2e  [0x80f65b6]
  [8] __libc_start_main+0xd3  [0x31cde3]
  [9] __gxx_personality_v0+0x101  [0x80f3405]
[3] Stack Traceback:
  [0] CmiAbort+0x25  [0x8366f3e]
  [1] namd  [0x830d4cd]
  [2] CmiHandleMessage+0x22  [0x8367c20]
  [3] CsdScheduleForever+0x67  [0x8367dd2]
  [4] CsdScheduler+0x12  [0x8367d4c]
  [5] _Z10slave_initiPPc+0x21  [0x80fa09d]
  [6] _ZN7BackEnd4initEiPPc+0x53  [0x80fa0f5]
  [7] main+0x2e  [0x80f65b6]
  [8] __libc_start_main+0xd3  [0x137de3]
  [9] __gxx_personality_v0+0x101  [0x80f3405]
Running on MPI version: 2.1
multi-thread support: MPI_THREAD_SINGLE (max supported: MPI_THREAD_SINGLE)
cpu topology info is being gathered.
2 unique compute nodes detected.
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.
------------- Processor 3 Exiting: Called CmiAbort ------------
Reason: Internal Error: Unknown-msg-type. Contact Developers.
[studpc01.xxx.xxx.xx:15615] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[studpc01.xxx.xxx.xx:15615] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)
[studpc21.xx.xx.xx][[6986,1],0][btl_tcp_frag.c:124:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev failed: Connection reset by peer (104)

Yes, I put the 64-bit executable on one machine (studpc21) and the 32-bit executable on the other machine (studpc01), under the same name. But I don't know whether each node is actually using its own binary. How can I check that? Can the "./mpirun -hetero" option be used for specifying the machines? The jobs run fine individually on each machine, but fail when the machines are used together. I hope this gives some hint toward the solution.

> Message: 2
> Date: Tue, 10 Nov 2009 07:56:47 -0500
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] Openmpi on Heterogeneous environment
> To: "Open MPI Users" <us...@open-mpi.org>
> Message-ID: <8f008aab-358b-4e6a-83a0-9ece60fd5...@cisco.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Do you see any output from your executables? I.e., are you sure that
> it's running the "correct" executables? If so, do you know how far
> it's getting in its run before aborting?
>
> On Nov 10, 2009, at 7:36 AM, Yogesh Aher wrote:
>
> > Thanks for the reply, Pallab. Firewall is not an issue, as I can
> > passwordless-SSH to/from both machines.
> > My problem is dealing with 32-bit and 64-bit architectures
> > simultaneously (not with different operating systems). Is that
> > possible with Open MPI?
> >
> > Look forward to the solution!
> >
> > Thanks,
> > Yogesh
> >
> > From: Pallab Datta (datta_at_[hidden])
> >
> > I have had issues running across platforms, i.e. Mac OS X and Linux
> > (Ubuntu), and haven't got them resolved. Check whether a firewall is
> > blocking any communication.
> >
> > On Thu, Nov 5, 2009 at 7:47 PM, Yogesh Aher <aher.yog...@gmail.com> wrote:
> > Dear Open MPI users,
> >
> > I have installed Open MPI on 2 different machines with different
> > architectures (INTEL and x86_64) separately (command: ./configure
> > --enable-heterogeneous), compiled executables of the same code for
> > these 2 architectures, and kept the executables on the individual
> > machines. I prepared a hostfile containing the names of those 2
> > machines.
> > Now, when I want to execute the code (command: ./mpirun -hostfile
> > machines executable), it doesn't work, giving this error message:
> >
> > MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 2 with PID 1712 on
> > node studpc1.xxx.xxxx.xx exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> >
> > When I keep only one machine name in the hostfile, the execution
> > works perfectly.
> >
> > Will anybody please guide me on running the program in a
> > heterogeneous environment using mpirun!
> >
> > Thanking you,
> >
> > Sincerely,
> > Yogesh