This is a problem reported a while ago, I believe by Oleg. The lock issue is inside YARN's AMRMClientAsync.
When a TezSession is shut down (tezClient.stop()), it sets up handlers within the AM for a future shutdown, and returns. After this, if the MiniCluster is shut down, there's a possibility that the AM is still talking to the RM to schedule resources. Once the RM goes down, this invocation goes into a retry loop while holding a lock, and that same lock is required to unregister from the RM. (Once the lock is obtained, the unregister call would enter another retry loop, since the RM is no longer around.)

Created TEZ-1541 to track this, and see what can be done by Tez to avoid such situations.

On Wed, Sep 3, 2014 at 8:44 PM, Chris K Wensel <[email protected]> wrote:

>
> this is confirmed on 0.5.0 (from apache release mvn repo)
>
> just caused a hang by running a single test, the TezChild did linger, but
> exited
>
> https://www.dropbox.com/s/86ryr1ka93xaiph/dagapp.threads.txt?dl=0
>
> ckw
>
> On Sep 3, 2014, at 8:26 PM, Siddharth Seth <[email protected]> wrote:
>
> Chris,
> Are you on the latest version of Tez (ideally the 0.5 release, which just
> went out today)? There was an issue with hanging DAGAppMasters, which was
> resolved recently.
> Otherwise, could you please include stack traces for the hung processes.
>
> Thanks
> - Sid
>
>
> On Wed, Sep 3, 2014 at 8:05 PM, Chris K Wensel <[email protected]> wrote:
>
>>
>> I'm finding after running MiniTezCluster I find a few DAGApp and possibly
>> a TezChild process hanging around after calling jps.
>>
>> This is problematic with our CI servers (they start to add up) let
>> alone my dinky laptop.
>>
>> Is there a TezConfiguration setting I'm likely missing to prevent these?
>>
>> ckw
>>
>> --
>> Chris K Wensel
>> [email protected]
>> http://concurrentinc.com
>>
>>
>
> --
> Chris K Wensel
> [email protected]
> http://concurrentinc.com
>
>
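
To make the locking pattern described at the top of this message concrete, here is a rough sketch in plain Java. It is not the actual AMRMClientAsync code: the class, the lock, and the method names (LockedRetrySketch, clientLock, allocateWithRetry, tryUnregister) are all hypothetical stand-ins, and the "RM" is just a boolean flag that stays false. It only shows why a retry loop that holds a lock keeps a second caller from ever unregistering.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class LockedRetrySketch {
    // Hypothetical stand-in for the lock shared by the allocate and unregister paths.
    private final ReentrantLock clientLock = new ReentrantLock();
    // Simulated RM state; false means the RM has gone away.
    private final AtomicBoolean rmAvailable = new AtomicBoolean(false);
    // Signals that the allocate path has taken the lock.
    private final CountDownLatch allocateHoldsLock = new CountDownLatch(1);

    // Simulated allocate call: retries until the RM is reachable, holding the lock throughout.
    void allocateWithRetry() throws InterruptedException {
        clientLock.lock();
        try {
            allocateHoldsLock.countDown();
            while (!rmAvailable.get()) {
                Thread.sleep(10); // retry interval, chosen arbitrarily for the sketch
            }
        } finally {
            clientLock.unlock();
        }
    }

    // Simulated unregister: needs the same lock. A timeout is used here only so the
    // sketch terminates; an untimed lock() would wait for as long as the retry loop runs.
    boolean tryUnregister(long timeoutMillis) throws InterruptedException {
        if (!clientLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // lock still held by the retry loop
        }
        try {
            return rmAvailable.get();
        } finally {
            clientLock.unlock();
        }
    }

    // Returns true when unregister could not proceed while the RM was down.
    static boolean unregisterBlockedWhileRmDown() {
        LockedRetrySketch sketch = new LockedRetrySketch();
        Thread heartbeat = new Thread(() -> {
            try {
                sketch.allocateWithRetry();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
        try {
            sketch.allocateHoldsLock.await();
            boolean unregistered = sketch.tryUnregister(200);
            heartbeat.interrupt(); // break the retry loop so the sketch can exit
            heartbeat.join();
            return !unregistered;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unregister blocked: " + unregisterBlockedWhileRmDown());
    }
}
```

In the sketch the retry thread is interrupted so the program can exit. In the situation described above nothing interrupts the loop, so the process simply lingers.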
