is there a way to block the mini cluster shutdown waiting for the AM to go down? or just (find then) push a shutdown to the AM?
ckw

On Sep 4, 2014, at 11:09 AM, Bikas Saha <[email protected]> wrote:

> This at the end of the day is a race between the AM shutting down and the
> minicluster shutting down. If the RM of the minicluster shuts down before the
> AM (because the test code called minicluster.shutdown) then the YARN client
> lib (used by the AM) to talk to YARN can end up waiting for the RM to come
> back up.
>
> Bikas
>
> From: Siddharth Seth [mailto:[email protected]]
> Sent: Thursday, September 04, 2014 1:47 AM
> To: [email protected]
> Subject: Re: orphaned DAGApp and TezChild
>
> This is a problem reported a while ago, I believe by Oleg.
>
> The lock issue is inside YARN's AMRMClientAsync.
>
> When a TezSession is shut down (tezClient.stop()), it sets up handlers within
> the AM for future shutdown, and returns.
> After this, if the MiniCluster is shut down, there's a possibility that the AM
> is still talking to the RM to schedule resources. Once the RM goes down, this
> invocation goes into a retry loop while holding a lock, which is also
> required to unregister from the RM (once this lock is obtained, this would
> be another retry loop since the RM is no longer around).
>
> Created TEZ-1541 to track this, and see what can be done by Tez to avoid such
> situations.
>
> On Wed, Sep 3, 2014 at 8:44 PM, Chris K Wensel <[email protected]> wrote:
>
> this is confirmed on 0.5.0 (from apache release mvn repo)
>
> just caused a hang by running a single test, the TezChild did linger, but
> exited
>
> https://www.dropbox.com/s/86ryr1ka93xaiph/dagapp.threads.txt?dl=0
>
> ckw
>
> On Sep 3, 2014, at 8:26 PM, Siddharth Seth <[email protected]> wrote:
>
> Chris,
> Are you on the latest version of Tez (ideally the 0.5 release, which just
> went out today)? There was an issue with hanging DAGAppMasters, which was
> resolved recently.
> Otherwise, could you please include stack traces for the hung processes?
>
> Thanks
> - Sid
>
> On Wed, Sep 3, 2014 at 8:05 PM, Chris K Wensel <[email protected]> wrote:
>
> After running MiniTezCluster I find a few DAGApp and possibly a
> TezChild process hanging around when calling jps.
>
> This is problematic with our CI servers (they start to add up), let alone my
> dinky laptop.
>
> Is there a TezConfiguration setting I'm likely missing to prevent these?
>
> ckw
>
> --
> Chris K Wensel
> [email protected]
> http://concurrentinc.com

--
Chris K Wensel
[email protected]
http://concurrentinc.com
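
For the question at the top of the thread (blocking the mini cluster shutdown until the AM is gone, or pushing a shutdown to it), here is a rough test-side sketch in Java. It is not something Tez provides and it does not address TEZ-1541 itself. The AppTracker interface and the GracefulShutdown class are hypothetical names made up for illustration. In a real test they would presumably be backed by a YARN client, for example YarnClient.getApplications(...) to list non-terminal applications and YarnClient.killApplication(...) to kill one, but that wiring is an assumption and is left out so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical abstraction over the cluster client; not a Tez or YARN API. */
interface AppTracker {
    /** IDs of applications that have not yet reached a terminal state. */
    List<String> liveApplications();

    /** Ask the cluster to kill one application. */
    void kill(String appId);
}

/** Hypothetical helper: call it after tezClient.stop() and before stopping the mini cluster. */
final class GracefulShutdown {
    private GracefulShutdown() {}

    /**
     * Polls until no applications are live or the timeout expires,
     * then kills whatever is still live. Returns the IDs that had to be killed.
     */
    static List<String> drainThenKill(AppTracker tracker, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);

        // Phase 1: give the AM a chance to unregister on its own while the RM is still up.
        while (!tracker.liveApplications().isEmpty() && System.nanoTime() - deadline < 0) {
            Thread.sleep(pollMillis);
        }

        // Phase 2: anything still live after the timeout gets an explicit kill.
        List<String> killed = new ArrayList<>();
        for (String appId : new ArrayList<>(tracker.liveApplications())) {
            tracker.kill(appId);
            killed.add(appId);
        }
        return killed;
    }
}
```

The ordering is the point: the wait and the kill both have to happen while the RM is still running, so the mini cluster shutdown would only be called after drainThenKill returns. The timeout keeps a stuck AM from hanging the test run indefinitely.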
