Re: Any reason a bunch of nearly-identical jobs would suddenly stop working?

2011-03-08 Thread Dmitriy Ryaboy
Check the task logs. I am guessing you ran out of either HDFS or local disk space on the nodes. Also, never let your sysadmin go on vacation; that's what makes things break! :)

D

On Tue, Mar 8, 2011 at 2:53 PM, Kris Coward k...@melon.org wrote:
So I queued up a batch of jobs last night to run

2011-03-08 Thread Kris Coward
None of the nodes has more than 20% utilization on any of its disks, so it must be the cluster figuring it can get away with this sort of thing when the sysadmin's not around to set it straight. Clearly a cluster of redundant, load-sharing sysadmins is also needed. :)

-K

On Tue, Mar 08,