At the end of the day, this is a race between the AM shutting down and the minicluster shutting down. If the minicluster's RM shuts down before the AM (because the test code called minicluster.shutdown), then the YARN client library that the AM uses to talk to YARN can end up waiting for the RM to come back up.
Bikas

*From:* Siddharth Seth [mailto:[email protected]]
*Sent:* Thursday, September 04, 2014 1:47 AM
*To:* [email protected]
*Subject:* Re: orphaned DAGApp and TezChild

This is a problem reported a while ago, I believe by Oleg. The lock issue is inside YARN's AMRMClientAsync. When a TezSession is shut down (tezClient.stop()), it sets up handlers within the AM for future shutdown, and returns. After this, if the MiniCluster is shut down, there's a possibility that the AM is still talking to the RM to schedule resources. Once the RM goes down, this invocation goes into a retry loop while holding a lock, which is also required to unregister from the RM (once this lock is obtained, this would be another retry loop, since the RM is no longer around).

Created TEZ-1541 to track this, and see what can be done by Tez to avoid such situations.

On Wed, Sep 3, 2014 at 8:44 PM, Chris K Wensel <[email protected]> wrote:

this is confirmed on 0.5.0 (from apache release mvn repo)

just caused a hang by running a single test. the TezChild did linger, but exited

https://www.dropbox.com/s/86ryr1ka93xaiph/dagapp.threads.txt?dl=0

ckw

On Sep 3, 2014, at 8:26 PM, Siddharth Seth <[email protected]> wrote:

Chris,

Are you on the latest version of Tez (ideally the 0.5 release, which just went out today)? There was an issue with hanging DAGAppMasters, which was resolved recently. Otherwise, could you please include stack traces for the hung processes.

Thanks
- Sid

On Wed, Sep 3, 2014 at 8:05 PM, Chris K Wensel <[email protected]> wrote:

After running MiniTezCluster, I find a few DAGApp and possibly a TezChild process hanging around when calling jps.

This is problematic with our CI servers (they start to add up), let alone my dinky laptop.

Is there a TezConfiguration setting I'm likely missing to prevent these?
ckw

--
Chris K Wensel
[email protected]
http://concurrentinc.com
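
For readers following the thread, the lock contention Siddharth describes (a retry loop that holds a lock the unregister path also needs) can be illustrated with a minimal, self-contained Java sketch. This is not the actual AMRMClientAsync code: the class `LockHeldDuringRetry`, its methods, and the `rmAvailable` flag are hypothetical stand-ins, and the unregister path uses a timeout only so the demonstration terminates. In the situation described above there would be no such timeout, so the process would simply hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockHeldDuringRetry {
    // Hypothetical stand-in for the lock shared by the allocate and unregister paths.
    private final ReentrantLock clientLock = new ReentrantLock();
    // Simulates an RM that has already been shut down by the test code.
    private volatile boolean rmAvailable = false;

    // Simulated heartbeat/allocate call: retries while holding the lock.
    void allocateWithRetry(CountDownLatch lockTaken) throws InterruptedException {
        clientLock.lock();
        try {
            lockTaken.countDown();          // signal that the lock is now held
            while (!rmAvailable) {
                Thread.sleep(50);           // "retry" until the RM comes back (it never does)
            }
        } finally {
            clientLock.unlock();
        }
    }

    // Simulated unregister: needs the same lock. The timeout exists only for the demo.
    boolean tryUnregister(long timeoutMs) throws InterruptedException {
        if (!clientLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;                   // lock still held by the retry loop
        }
        try {
            return rmAvailable;             // even with the lock, the RM is gone
        } finally {
            clientLock.unlock();
        }
    }

    // Runs the scenario once and reports whether unregister succeeded.
    public static boolean demo() throws InterruptedException {
        LockHeldDuringRetry client = new LockHeldDuringRetry();
        CountDownLatch lockTaken = new CountDownLatch(1);
        Thread heartbeat = new Thread(() -> {
            try {
                client.allocateWithRetry(lockTaken);
            } catch (InterruptedException e) {
                // interrupted by the demo so the thread can exit
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
        lockTaken.await();                  // make sure the retry loop owns the lock first
        boolean unregistered = client.tryUnregister(200);
        heartbeat.interrupt();              // clean up the simulated heartbeat thread
        heartbeat.join();
        return unregistered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unregistered = " + demo());
    }
}
```

Running `main` prints `unregistered = false`: the shutdown path cannot make progress while the retry loop owns the lock, which matches the orphaned-process symptom reported at the start of the thread.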
