Dear William and Bill,
Thanks a lot for your answers.
I had already configured limits.conf on all nodes a few days ago; 'ulimit
-n' (open files) reports 94000, which should be more than enough.
I did some more tests in the meantime.
The program I am running is very simple; I attached it.
I compiled it with 'mpicc teste.c', which produces a.out as the executable.
The breaking point seems to be 252 processes. When I run on the master node:
qsub -pe orte 252 -V -j yes -cwd -S /bin/bash <<< "export
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} && mpiexec -n 252 a.out >>
/home/ulrich/abc.out"
it runs and prints many lines like
[...]
Hello world from processor karun07, rank 58 out of 200 processors
Hello world from processor karun07, rank 59 out of 200 processors
[...]
Running
qsub -pe orte 253 -V -j yes -cwd -S /bin/bash <<< "export
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} && mpiexec -n 253 a.outn >>
/home/ulrich/abc.out"
gives:
Errno: 24 (Too many open files)
But when I log in to one of the nodes (it does not matter which) and run
mpiexec -n 400 a.outn >> /home/ulrich/abc.out
directly there, it works fine, as it should.
I do not understand where the 252/253 breaking point comes from, or why it
works when I call mpiexec directly on a node. Did I overlook a configuration issue?
I am not entirely convinced that this is not a Grid Engine issue.
With kind regards, Ulrich
On 06/13/2016 12:47 PM, William Hay wrote:
> On Fri, Jun 10, 2016 at 07:24:47PM +0200, Ulrich Hiller wrote:
>> Hello,
>>
>> I have a problem submitting parallel jobs, e.g.:
>>
>
>> Your Open MPI job will likely hang until the failure resason is fixed
>> (e.g., more file descriptors and/or memory becomes available), and may
>> eventually timeout / abort.
>>
>> Local host: karun02
>> Errno: 24 (Too many open files)
>> Probable cause: Out of file descriptors
>> --------------------------------------------------------------------------
> This doesn't look like it has much to do with grid engine per se.
> I'd look at ulimit to see what is going on and tweak things
> to raise the number of open files allowed appropriately.
>
> On Linux, limits.conf would be the first place to look, although
> shell startup scripts might lower the limits as well.
>
> William
>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
    return 0;
}
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users