On 09/03/2010 10:05 PM, Jeff Squyres wrote:
On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote:

Backing off the polling rate requires more application-specific logic like that 
offered below, so it is a little difficult for us to implement at the MPI 
library level. Not saying we eventually won't - just not sure anyone quite 
knows how to do so in a generalized form.

FWIW, we've *talked* about this kind of stuff among the developers -- it's at least 
somewhat similar to the "backoff to blocking communications instead of polling 
communications" issues.  That work in particular has been discussed for a long time 
but never implemented.

Are your jobs hanging because of deadlock (i.e., application error), or 
infrastructure error?  If they're hanging because of deadlock, there are some 
PMPI-based tools that might be able to help.


These are application deadlocks (like the well-known VASP calling MPI_Finalize 
when it should be calling MPI_Abort!).  But I'm asking as a system manager with 
dozens of apps run by dozens of users hanging and not being noticed for a day or 
two because users are not attentive and, from outside the job, everything looks 
OK. So the problem is detection.  Are you suggesting there are PMPI approaches 
we could apply to every production job on the system?

I now have a hack to opal_progress that seems to do what we want without any 
impact on performance in the "good" case.  It basically involves keeping count 
of the number of consecutive calls to opal_progress with no events completed.  
When that count hits a large number (e.g., 10^9), sleeping (maybe for up to a 
second) on every, say, 10^3-10^4 passes through opal_progress seems to do "the 
right thing". (Obviously, any event completion resets everything to spinning.)  
There are a few magic numbers there that need to be overridable by users.  
Please let me know if this idea is blatantly flawed.
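
To make the counting logic concrete, here is a minimal standalone sketch in C. 
It is not the actual patch and does not touch Open MPI: the names backoff_cfg, 
backoff_state and backoff_should_sleep are hypothetical, and the sketch only 
decides *when* to sleep. It assumes the caller knows how many events the last 
progress pass completed and does the sleeping itself (e.g. with nanosleep).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables (the "magic numbers"); none of these names exist
 * in Open MPI. Values in comments are the rough figures from the text. */
typedef struct {
    uint64_t idle_threshold;  /* consecutive idle calls before backing off, e.g. 10^9 */
    uint64_t sleep_interval;  /* once backing off, sleep on every Nth idle call, e.g. 10^3-10^4 */
    long     sleep_ns;        /* sleep length for the caller to use, up to about 1 s */
} backoff_cfg;

typedef struct {
    uint64_t idle_calls;      /* consecutive progress calls with no events completed */
} backoff_state;

/* Call once per progress pass with the number of events that pass completed.
 * Returns true when the caller should sleep for cfg->sleep_ns. */
static bool backoff_should_sleep(backoff_state *st, const backoff_cfg *cfg,
                                 int events_completed)
{
    if (events_completed > 0) {
        st->idle_calls = 0;   /* any completion resets everything to spinning */
        return false;
    }

    st->idle_calls++;
    if (st->idle_calls < cfg->idle_threshold) {
        return false;         /* still in the "good" case: keep spinning */
    }

    /* Past the threshold: sleep on the first such call and then on every
     * sleep_interval-th idle call after it. Guard against a zero interval. */
    uint64_t interval = cfg->sleep_interval ? cfg->sleep_interval : 1;
    return (st->idle_calls - cfg->idle_threshold) % interval == 0;
}
```

In the "good" case this adds one increment and one comparison per call. The 
three fields of backoff_cfg are the values that would need to be settable by 
users.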

Thanks,
David
