Yikes - that's not a good error.  :-(

We don't regularly build / test on AIX, so I don't have much immediate guidance for you. My best suggestion at this point would be to try the latest 1.2 beta or nightly snapshot. We did an update of the event engine (the portion of the code that your error is coming from) that *may* alleviate the problem...? (I have no idea, actually -- I'm just kinda hoping that the new version of the event engine will fix your problem :-\ )
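
In the meantime, it may help to check what errno 25 actually means on the AIX machine itself, since errno numbering is platform-specific. On Linux, 25 is ENOTTY, but I have not verified the AIX value. Here is a minimal sketch that decodes it, assuming Python is available on the node (describe_errno is just a helper name made up for this example; it is not part of Open MPI):

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an errno value on the current platform."""
    # errno.errorcode maps numbers to symbolic names for this platform only,
    # so run this on the machine that reported the error.
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))


if __name__ == "__main__":
    # 25 is the value from the "poll failed with errno=25" message.
    print(describe_errno(25))
```

Run it on r1blade041 rather than on a workstation, so that the name and message come from the same system headers that produced the error.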


On Dec 27, 2006, at 10:29 AM, Michael Marti wrote:

Dear All

I am trying to get openmpi-1.1.2 to work on AIX 5.3 / power5.

:: Compilation seems to have worked with the following sequence:
====================================================================
setenv OBJECT_MODE 64

setenv CC xlc
setenv CXX xlC
setenv F77 xlf
setenv FC xlf90

setenv CFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
setenv CXXFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
setenv FFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
setenv FCFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
setenv LDFLAGS "-Wl,-brtl"

./configure --prefix=/ist/openmpi-1.1.2 \
  --disable-mpi-cxx \
  --disable-mpi-cxx-seek \
  --enable-mpi-threads \
  --enable-progress-threads \
  --enable-static \
  --disable-shared \
  --disable-io-romio
====================================================================

:: After the compilation I ran make check and all 11 tests passed successfully.

:: Now I'm trying to run the following command just for test:
# mpirun -hostfile /gpfs/MICHAEL/MPI_hostfiles/mpinodes_b41-b44_1.asc -np 2 /usr/bin/hostname
- The file /gpfs/MICHAEL/MPI_hostfiles/mpinodes_b41-b44_1.asc contains 4 hosts:
    r1blade041 slots=1
    r1blade042 slots=1
    r1blade043 slots=1
    r1blade044 slots=1
- The mpirun command eventually hangs with the following message:
    [r1blade041:418014] poll failed with errno=25
    [r1blade041:418014] opal_event_loop: ompi_evesel->dispatch() failed.
- In this state mpirun cannot be killed by hitting <ctrl-c>; only a kill -9 will do the trick.
- While the mpirun still hangs I can see that the "orted" has been launched on both requested hosts.

:: I turned on all debug options in openmpi-mca-params.conf. The output for the same call of mpirun is in the file mpirun-debug.txt.gz.
<mpirun-debug.txt.gz>

:: As suggested in the mailing list rules, I include config.log (config.log.gz) and the output of ompi_info (ompi_info.txt.gz).
<config.log.gz>

<ompi_info.txt.gz>


:: As I am completely new to openmpi (I have some experience with lam) I am lost at this stage. I would really appreciate if someone could give me some hints as to what is going wrong and where I could get more info.

Best regards,

Michael Marti.


--
----------------------------------------------------------------------------
Michael Marti
Centro de Fisica dos Plasmas
Instituto Superior Tecnico
Av. Rovisco Pais
1049-001 Lisboa
Portugal

Tel:       +351 218 419 379
Fax:      +351 218 464 455
Mobile:  +351 968 434 327
----------------------------------------------------------------------------


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
