On 14.11.2014 at 03:32, Doan Trung Tung wrote:

> Hi Reuti,
>
> mpi-ring is the test program from Intel which sends messages in a ring. If I
> run the program manually using mpirun, it works just fine. The problem occurs
> only when I use OSG to submit jobs with more than 16 slots (each node consists
> of 16 processors).

There is no limit in SGE on the number of slots an application can request.
Without knowing more details it's hard to make any assumption, as I have no
access to the application in question. As it's running outside of SGE:

- do you request any resources during the submission?
- can you print the $PE_HOSTFILE? The application will be given more than one
  node by SGE, as only 16 slots are available per machine, as you stated
- you use the PE "orte" but compiled with Intel MPI? Which MPI are you going
  to use?
- do you also request two nodes when running interactively?
- the file xxx.core is produced by the kernel so that the problem in the
  application can be debugged when it segfaults

To change this behavior for interactive access:

$ ulimit -Ha
core file size          (blocks, -c) unlimited
...
$ ulimit -Sa
core file size          (blocks, -c) 0
$ ulimit -Sc unlimited

and you get the file too.

Inside SGE, these are the settings to switch it off:

$ qconf -sq all.q
...
s_core                INFINITY
h_core                0

-- Reuti


> Tung
>
> On Fri, Nov 14, 2014 at 12:44 AM, Reuti <[email protected]> wrote:
> Hi,
>
> On 13.11.2014 at 18:09, Doan Trung Tung wrote:
>
> > I have OSG installed as a roll on a Rocks cluster installation for a cluster
> > of 16 nodes; each node has 16 processors. I'm new to OSG, so I left
> > everything at the defaults.
> > When I submit the mpi-ring example using qsub, if the number of slots is less
> > than or equal to 16, all threads run on a single random node. So I
> > increased the number of slots to a number larger than 16, hoping that
> > they would run on different nodes, but instead I get errors.
> >
> > Here is the script I used to submit mpi-ring:
> > #!/bin/bash
> >
> > #$ -cwd
> > #$ -S /bin/bash
> > #$ -j y
> > #$ -pe orte 8
> > mpirun $HOME/testmpi/mpi-ring
>
> What is mpi-ring in detail: where is the source, and from which MPI library
> does it come?
>
> -- Reuti
>
> >
> > (orte is one of 4 default parallel environments the system has)
> >
> > If I change the number of slots to 17 instead of 8, I get this error:
> > APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> > Also, a strange file was produced: core.xxxx
> >
> > Why can't I submit more than 16 slots?
> >
> > Thanks.
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>
> --
> Doan Trung Tung, PhD.
> Researcher, HPC - Hanoi University of Technologies
> Mobile: 0914720240
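
As an illustration of the $PE_HOSTFILE check suggested in the reply, here is a minimal sketch of the job script from the thread with the slot count set to 17 and a small helper added. The helper name summarize_pe_hostfile is hypothetical, and the sketch assumes the usual hostfile layout of one line per host with hostname, slot count, queue and processor range. It only prints what SGE granted and does not fix the segfault itself.

```shell
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -pe orte 17

# Hypothetical helper: print "host slots" for every line of a PE hostfile,
# followed by the total number of granted slots.
# Assumed line layout: hostname, slot count, queue, processor range.
summarize_pe_hostfile() {
    awk '{ printf "%s %d\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE; outside a job it is unset,
# so this block is simply skipped.
if [ -n "$PE_HOSTFILE" ] && [ -r "$PE_HOSTFILE" ]; then
    echo "PE_HOSTFILE: $PE_HOSTFILE"
    summarize_pe_hostfile "$PE_HOSTFILE"
fi

# The original command from the thread would follow here:
# mpirun $HOME/testmpi/mpi-ring
```

If the output lists two hosts whose slot counts add up to 17, SGE did grant slots on more than one node, and the next thing to check is whether the mpirun in use actually picks up that allocation.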
