is there a way to block the mini cluster shutdown until the AM has gone 
down? or just find the AM and push a shutdown to it?
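
A rough sketch of what the first option might look like in test code: poll
some "has the AM exited?" check, with a timeout, before calling shutdown on
the mini cluster. The class name ShutdownGate, the method awaitCondition, and
the check itself are hypothetical; nothing here is an existing Tez or YARN
API, and how the check would be wired to the cluster is left open.

```java
import java.util.function.BooleanSupplier;

final class ShutdownGate {
    /**
     * Polls `done` until it returns true or `timeoutMillis` elapses.
     * Returns true if the condition was met, false on timeout or interrupt.
     * `done` is a caller-supplied check (hypothetical: "the AM has exited").
     */
    static boolean awaitCondition(BooleanSupplier done, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up so the test cannot hang forever
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

The test would call awaitCondition first and only shut the mini cluster down
once it returns, treating a false result as a reason to kill the leftover
processes explicitly.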

ckw

On Sep 4, 2014, at 11:09 AM, Bikas Saha <[email protected]> wrote:

> At the end of the day, this is a race between the AM shutting down and the 
> minicluster shutting down. If the minicluster's RM shuts down before the AM 
> (because the test code called minicluster.shutdown), then the YARN client 
> lib that the AM uses to talk to YARN can end up waiting for the RM to come 
> back up.
>  
> Bikas
>  
> From: Siddharth Seth [mailto:[email protected]] 
> Sent: Thursday, September 04, 2014 1:47 AM
> To: [email protected]
> Subject: Re: orphaned DAGApp and TezChild
>  
> This is a problem reported a while ago, I believe by Oleg.
>  
> The lock issue is inside YARN's AMRMClientAsync.
>  
> When a TezSession is shut down (tezClient.stop()), it sets up handlers 
> within the AM for future shutdown, and returns.
> After this, if the MiniCluster is shut down, there's a possibility that the 
> AM is still talking to the RM to schedule resources. Once the RM goes down, 
> this invocation goes into a retry loop while holding a lock that is also 
> required to unregister from the RM (once this lock is obtained, the 
> unregister would enter another retry loop, since the RM is no longer around).
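
The locking pattern described above can be modelled in a few lines of plain
Java. This is only an illustration under assumptions: it does not use or
reproduce AMRMClientAsync, and the names RetryUnderLockDemo, clientLock and
stopRetrying are hypothetical. One thread retries against an "RM" that never
answers while holding a lock; a second caller, standing in for unregister,
cannot take that lock until the retry loop ends.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

final class RetryUnderLockDemo {
    /**
     * Returns a two-element array:
     * [0] whether "unregister" got the lock while the retry loop was running,
     * [1] whether it got the lock after the retry loop was stopped.
     */
    static boolean[] run() {
        ReentrantLock clientLock = new ReentrantLock();      // stand-in for the client-side lock
        AtomicBoolean stopRetrying = new AtomicBoolean(false);
        CountDownLatch lockHeld = new CountDownLatch(1);

        Thread allocate = new Thread(() -> {
            clientLock.lock();
            try {
                lockHeld.countDown();
                // The "RM" is gone, so keep retrying while still holding the lock.
                while (!stopRetrying.get()) {
                    try {
                        Thread.sleep(5);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            } finally {
                clientLock.unlock();
            }
        });
        allocate.setDaemon(true);
        allocate.start();

        try {
            lockHeld.await();
            // "unregister" needs the same lock; bound the wait so the demo terminates.
            boolean during = clientLock.tryLock(100, TimeUnit.MILLISECONDS);
            if (during) {
                clientLock.unlock();
            }
            stopRetrying.set(true);
            allocate.join();
            boolean after = clientLock.tryLock(1, TimeUnit.SECONDS);
            if (after) {
                clientLock.unlock();
            }
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this model the first attempt fails and the second succeeds; with an
unbounded lock() in place of tryLock, and a retry loop that never stops, the
"unregister" caller would simply hang, which matches the lingering process
described in the thread.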
>  
> Created TEZ-1541 to track this, and see what can be done by Tez to avoid such 
> situations.
>  
> On Wed, Sep 3, 2014 at 8:44 PM, Chris K Wensel <[email protected]> wrote:
>  
> this is confirmed on 0.5.0 (from apache release mvn repo)
>  
> just caused a hang by running a single test; the TezChild did linger, but 
> exited
>  
> https://www.dropbox.com/s/86ryr1ka93xaiph/dagapp.threads.txt?dl=0
>  
> ckw
>  
> On Sep 3, 2014, at 8:26 PM, Siddharth Seth <[email protected]> wrote:
> 
> 
> Chris,
> Are you on the latest version of Tez (ideally the 0.5 release, which just 
> went out today)? There was an issue with hanging DAGAppMasters, which was 
> resolved recently.
> Otherwise, could you please include stack traces for the hung processes?
>  
> Thanks
> - Sid
>  
> On Wed, Sep 3, 2014 at 8:05 PM, Chris K Wensel <[email protected]> wrote:
>  
> After running MiniTezCluster, I find a few DAGApp and possibly a TezChild 
> process still hanging around when I call jps.
>  
> This is problematic on our CI servers (they start to add up), let alone my 
> dinky laptop.
>  
> Is there a TezConfiguration setting I'm likely missing to prevent these?
>  
> ckw
>  
> --
> Chris K Wensel
> [email protected]
> http://concurrentinc.com
>  

--
Chris K Wensel
[email protected]
http://concurrentinc.com
