Hi, I am a newbie in the HPC world.
I am trying to run the hpcc benchmark on our cluster, but every time I start the job I get the following error and the job exits:

    compute017.22840Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)
    compute024.22840Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)
    compute019.22847Exhausted 1048576 MQ irecv request descriptors, which usually indicates a user program error or insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)
    -------------------------------------------------------
    Primary job terminated normally, but 1 process returned
    a non-zero exit code.. Per user-direction, the job has been aborted.
    -------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:
      Process name: [[19601,1],272]
      Exit code: 255
    --------------------------------------------------------------------------

Platform: IBM PHPC
OS: RHEL 6.5
One management node
32 compute nodes, each with 16 cores, 32 GB RAM, and a one-port Intel/QLogic QLE7340 QDR InfiniBand adapter (40 Gb/s)

I compiled hpcc against IBM MPI, Open MPI 2.0.1 (compiled with gcc 4.4.7), and Open MPI 1.8.1 (compiled with gcc 4.4.7). I get the error every time, but on different compute nodes each time.

This is the command I used to start the job:

    mpirun -np 512 --mca mtl psm --hostfile hosts32 /shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt

Any help will be appreciated. If you need more details, let me know.

Thanks in advance.

Regards.
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users