At the end of the day, this is a race between the AM shutting down and the
minicluster shutting down. If the RM of the minicluster shuts down before
the AM (because the test code called minicluster.shutdown), then the YARN
client lib that the AM uses to talk to YARN can end up waiting for the RM
to come back up.
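
A minimal sketch of one way test code might sidestep this race: wait, with a bound, for the application to reach a terminal state before shutting the cluster down. Everything here is hypothetical and for illustration only. OrderedTeardown, awaitAppThenShutdown, and the two callbacks are not Tez or YARN APIs; in a real test the appFinished check would have to be backed by whatever application-state query is available, and shutdownCluster by the minicluster shutdown call.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test-teardown helper; nothing here is a Tez or YARN API.
class OrderedTeardown {

    // Polls appFinished until it returns true or the timeout elapses,
    // then always runs shutdownCluster. Returns true only if the
    // application was seen to finish before the cluster was shut down.
    static boolean awaitAppThenShutdown(BooleanSupplier appFinished,
                                        Runnable shutdownCluster,
                                        long timeoutMillis,
                                        long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        boolean finished = appFinished.getAsBoolean();
        try {
            while (!finished && deadline - System.nanoTime() > 0) {
                Thread.sleep(pollMillis);
                finished = appFinished.getAsBoolean();
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt and fall through to the shutdown.
            Thread.currentThread().interrupt();
        } finally {
            shutdownCluster.run();
        }
        return finished;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        boolean clean = awaitAppThenShutdown(
            () -> System.currentTimeMillis() - start >= 50, // fake AM exit
            () -> System.out.println("cluster shut down"),
            2_000, 10);
        System.out.println("app finished before cluster shutdown: " + clean);
    }
}
```

A false return value means the timeout expired first, so the AM may still be left retrying against a dead RM. The bound only keeps the test itself from hanging.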



Bikas



*From:* Siddharth Seth [mailto:[email protected]]
*Sent:* Thursday, September 04, 2014 1:47 AM
*To:* [email protected]
*Subject:* Re: orphaned DAGApp and TezChild



This is a problem reported a while ago, I believe by Oleg.



The lock issue is inside YARN's AMRMClientAsync.



When a TezSession is shut down (tezClient.stop()), it sets up handlers
within the AM for future shutdown, and returns.

After this, if the MiniCluster is shut down, there's a possibility that the
AM is still talking to the RM to schedule resources. Once the RM goes down,
this invocation goes into a retry loop while holding a lock that is also
required to unregister from the RM. Even once that lock is obtained, the
unregister call would go into another retry loop, since the RM is no longer
around.
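
The lock interaction described above can be reproduced in a toy model. This is only a sketch under stated assumptions, not the actual AMRMClientAsync code: the class LockedRetryDemo and all of its fields and methods are made up for illustration. One thread retries a "scheduling" call while holding a lock, and a second thread that needs the same lock to "unregister" cannot make progress while the RM stand-in is down.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: none of these names come from YARN or Tez.
class LockedRetryDemo {
    private final ReentrantLock clientLock = new ReentrantLock();
    // Stand-in for "the RM is reachable"; stays false, as if the
    // minicluster had already been shut down.
    private final AtomicBoolean rmUp = new AtomicBoolean(false);
    // Lets the demo end the retry loop so the JVM can exit.
    private final AtomicBoolean giveUp = new AtomicBoolean(false);

    // Models the scheduling call: retries while holding the lock.
    void allocateWithRetry() throws InterruptedException {
        clientLock.lock();
        try {
            while (!rmUp.get() && !giveUp.get()) {
                Thread.sleep(10); // back off and retry; the lock stays held
            }
        } finally {
            clientLock.unlock();
        }
    }

    // Models unregistration: needs the same lock. A bounded tryLock is
    // used here so the demo terminates instead of hanging.
    boolean tryUnregister(long timeoutMillis) throws InterruptedException {
        if (!clientLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false; // lock still held by the retry loop
        }
        try {
            return rmUp.get(); // even with the lock, the RM must be up
        } finally {
            clientLock.unlock();
        }
    }

    // Runs the scenario and reports whether unregistration succeeded.
    static boolean unregisterWhileRmDown() {
        LockedRetryDemo demo = new LockedRetryDemo();
        Thread scheduler = new Thread(() -> {
            try {
                demo.allocateWithRetry();
            } catch (InterruptedException ignored) {
                // demo thread; nothing to clean up
            }
        });
        scheduler.start();
        try {
            while (!demo.clientLock.isLocked()) {
                Thread.sleep(1); // wait until the retry loop owns the lock
            }
            boolean unregistered = demo.tryUnregister(200);
            demo.giveUp.set(true); // release the retry loop
            scheduler.join();
            return unregistered;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unregistered: " + unregisterWhileRmDown());
        // prints "unregistered: false"
    }
}
```

With an unbounded lock() in place of the bounded tryLock, the second thread in this model would simply block for as long as the retry loop runs, which matches the hang seen in the thread dumps only by analogy.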



Created TEZ-1541 to track this, and see what can be done by Tez to avoid
such situations.



On Wed, Sep 3, 2014 at 8:44 PM, Chris K Wensel <[email protected]> wrote:



this is confirmed on 0.5.0 (from apache release mvn repo)



just caused a hang by running a single test. the TezChild did linger, but
exited



https://www.dropbox.com/s/86ryr1ka93xaiph/dagapp.threads.txt?dl=0



ckw



On Sep 3, 2014, at 8:26 PM, Siddharth Seth <[email protected]> wrote:



Chris,

Are you on the latest version of Tez (ideally the 0.5 release, which just
went out today)? There was an issue with hanging DAGAppMasters, which was
resolved recently.

Otherwise, could you please include stack traces for the hung processes?



Thanks

- Sid



On Wed, Sep 3, 2014 at 8:05 PM, Chris K Wensel <[email protected]> wrote:



After running MiniTezCluster, I find a few DAGApp and possibly a
TezChild process hanging around when I call jps.



This is problematic with our CI servers (they start to add up), let alone
my dinky laptop.



Is there a TezConfiguration setting I'm likely missing to prevent these?



ckw



--

Chris K Wensel

[email protected]

http://concurrentinc.com







