Re: [OMPI devel] Too many open files (24)

2011-03-30 Thread Samuel K. Gutierrez
Hi Tim, Great news! Happy calculating :-). -- Samuel K. Gutierrez Los Alamos National Laboratory > Dear Samuel, > > Just as you replied I was trying that on the compute nodes. Surprise, > surprise...the value returned as the hard and soft limits is 1024. > > Thanks for confirming my suspicions.

Re: [OMPI devel] Add child to another parent.

2011-03-30 Thread Ralph Castain
Sorry - should have included the devel list when I sent this. On Mar 30, 2011, at 6:11 PM, Ralph Castain wrote: > I'm not the expert on this area - Josh is, so I'll defer to him. I did take a > quick glance at the sstore framework, though, and it looks like there are > some params you could se

Re: [OMPI devel] Too many open files (24)

2011-03-30 Thread Timothy Stitt
Dear Samuel, Just as you replied I was trying that on the compute nodes. Surprise, surprise...the value returned as the hard and soft limits is 1024. Thanks for confirming my suspicions... Regards, Tim. On Mar 30, 2011, at 7:41 PM, Samuel K. Gutierrez wrote: Hi, It sounds like Open MPI is h

Re: [OMPI devel] Too many open files (24)

2011-03-30 Thread Samuel K. Gutierrez
Hi, It sounds like Open MPI is hitting your system's open file descriptor limit. If that's the case, one potential workaround is to have your system administrator raise file descriptor limits. On a compute node, what does "ulimit -a" show (using bash)? Hope that helps, -- Samuel K. Gutie
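For readers following this thread, the per-process limit that "ulimit -n" reports can also be read from inside a program. The snippet below is a minimal sketch using Python's standard resource module on a POSIX system; the helper names fd_limits and raise_soft_limit_to_hard are made up for illustration and are not part of Open MPI. Note that an unprivileged process can only raise its soft limit up to the hard limit, so when both are already equal (as with the 1024/1024 reported in this thread) the hard limit has to be raised by an administrator.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft limit to the hard limit; return the resulting limits.

    Hypothetical helper for illustration. If the platform refuses the
    request, the limits are left unchanged.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the current limits if the change is rejected
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"open files: soft={soft} hard={hard}")
```

Running this on a compute node (rather than on a login node) matters, because batch environments often apply different limits to each.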

[OMPI devel] Too many open files (24)

2011-03-30 Thread Timothy Stitt
Dear OpenMPI developers, One of our users was running a benchmark on a 1032-core simulation. He had a successful run at 900 cores, but when he stepped up to 1032 cores the job just stalled and his logs contained many occurrences of the following line: [d6copt368.crc.nd.edu][[25621,1],0][btl_tcp_

Re: [OMPI devel] Add child to another parent.

2011-03-30 Thread Hugo Meyer
Hello again. I'm working on the launch code to handle my checkpoints, but I'm a little stuck on how to set the path to my checkpoint and the executable (ompi_blcr_context.PID). I took a look at the code in odls_base_default_fns.c and this piece of code caught my attention: #if OPAL_ENABLE_FT_CR ==

Re: [OMPI devel] Add child to another parent.

2011-03-30 Thread Hugo Meyer
Thanks Ralph. I have finished point (a), and it's working now. Next I have to work on relaunching from my checkpoint, as you said. Best regards. Hugo Meyer 2011/3/29 Ralph Castain > The resilient mapper -only- works on procs being restarted - it cannot map > a job for its initial launch. You sho

[OMPI devel] Fwd: [devel-core] Open MPI Developers Meeting

2011-03-30 Thread Joshua Hursey
Rich wanted to make this available to a broader audience. Re-posting to the devel list. Begin forwarded message: > From: Joshua Hursey > Date: March 30, 2011 9:14:03 AM CDT > Subject: [devel-core] Open MPI Developers Meeting > > It has been requested that we have a face-to-face Open MPI develo