Hi,

I am a newbie in the HPC world.

I am trying to run the hpcc benchmark on our cluster, but every time I
start the job I get this error, and then the job exits:

compute017.22840Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)
compute024.22840Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)
compute019.22847Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[19601,1],272]
  Exit code:    255
--------------------------------------------------------------------------

Platform: IBM PHPC
OS: RHEL 6.5
1 management node
32 compute nodes: 16 cores, 32 GB RAM, one single-port Intel (QLogic) QLE7340
QDR InfiniBand adapter, 40 Gb/s

I compiled hpcc against IBM MPI, Open MPI 2.0.1 (compiled with gcc 4.4.7),
and Open MPI 1.8.1 (compiled with gcc 4.4.7).

I get the error in every case, but each time on different compute nodes.

This is the command I used to start the job:

    mpirun -np 512 --mca mtl psm --hostfile hosts32 /shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt
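
For reference, the error message names the PSM_MQ_RECVREQS_MAX limit. The following is only a dry-run sketch of how that variable might be passed to every rank, assuming Open MPI's -x option for exporting environment variables. The value 2097152 is an arbitrary illustration (double the 1048576 reported in the error), not a tested or recommended setting, and whether raising it resolves the problem is not established here. The script only prints the command; it does not launch anything.

```shell
#!/bin/sh
# Dry run: build the mpirun command line and print it instead of executing it.

# 32 compute nodes x 16 cores each = 512 ranks, matching -np 512 in the post.
NP=$((32 * 16))

# Hypothetical value for illustration only: twice the limit shown in the error.
RECVREQS=2097152

# Same command as in the post, plus -x to export the variable to all ranks
# (assumes Open MPI's mpirun; other MPI launchers use different options).
CMD="mpirun -np $NP --mca mtl psm -x PSM_MQ_RECVREQS_MAX=$RECVREQS --hostfile hosts32 /shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt"

echo "$CMD"
```

Printing the command first makes it easy to check the rank count and the exported variable before submitting a real job.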

Any help will be appreciated, and if you need more details, let me know.
Thanks in advance.


Regards.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
