Hi Wodel,

The RandomAccess part of HPCC is probably causing this.

Perhaps set the PSM environment variable, e.g. export PSM_MQ_RECVREQS_MAX=10000000 or something like that (it is the same variable named in your error output). Alternatively, launch the job using mpirun --mca pml ob1 --host .... to avoid the use of PSM. Performance will probably suffer with this option, however.
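For example, adapting your own command line (an untested sketch; 10000000 is only a starting value to tune, not a recommendation):

  # Option 1: keep PSM but raise the irecv request descriptor limit.
  # -x exports the variable to every rank at launch.
  mpirun -np 512 --mca mtl psm -x PSM_MQ_RECVREQS_MAX=10000000 \
      --hostfile hosts32 /shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt

  # Option 2: bypass PSM entirely by forcing the ob1 PML (expect lower performance).
  mpirun -np 512 --mca pml ob1 --hostfile hosts32 \
      /shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt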
Howard

wodel youchi <wodel.you...@gmail.com> wrote on Tue. 31 Jan. 2017 at 08:27:

> Hi,
>
> I am a newbie in the HPC world.
>
> I am trying to execute the HPCC benchmark on our cluster, but every time I
> start the job, I get this error, then the job exits:
>
> compute017.22840 Exhausted 1048576 MQ irecv request descriptors, which
> usually indicates a user program error or insufficient request descriptors
> (PSM_MQ_RECVREQS_MAX=1048576)
> compute024.22840 Exhausted 1048576 MQ irecv request descriptors, which
> usually indicates a user program error or insufficient request descriptors
> (PSM_MQ_RECVREQS_MAX=1048576)
> compute019.22847 Exhausted 1048576 MQ irecv request descriptors, which
> usually indicates a user program error or insufficient request descriptors
> (PSM_MQ_RECVREQS_MAX=1048576)
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[19601,1],272]
>   Exit code: 255
> --------------------------------------------------------------------------
>
> Platform: IBM PHPC
> OS: RHEL 6.5
> One management node
> 32 compute nodes: 16 cores, 32 GB RAM, Intel/QLogic QLE7340 single-port
> QDR InfiniBand, 40 Gb/s
>
> I compiled hpcc against IBM MPI, Open MPI 2.0.1 (compiled with gcc 4.4.7),
> and Open MPI 1.8.1 (compiled with gcc 4.4.7).
>
> I get the errors every time, but on different compute nodes each time.
>
> This is the command I used to start the job:
>
> mpirun -np 512 --mca mtl psm --hostfile hosts32
> /shared/build/hpcc-1.5.0b-blas-ompi-181/hpcc hpccinf.txt
>
> Any help will be appreciated; if you need more details, let me know.
> Thanks in advance.
>
> Regards.