> The failures are always intermittent. Any idea why this happens?

First up, you should try Tez 0.7.1, because of TEZ-2663.

> Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has
>already shutdown. Application application_1444019975627_0001 failed 2
>times due to AM Container for appattempt_1444019975627_0001_000002 exited
>with exitCode: 255

Can you say what is printed in the AppMaster logs?

I've occasionally seen this happen due to a bad setup of the cluster's
per-user limits (ulimits).

Ambari sets that up in a file named

/etc/security/limits.d/yarn.conf (yarn.conf.j2)


Check for that file. Without it, as the shuffle handler spawns threads, the
container launchers will start to fail intermittently (the default is 1024
threads per user; yarn.conf raises this to 65,000 threads).
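
If it helps, here is a rough Python sketch for checking both sides of that:
what a limits.d-style file says about nproc for the yarn user, and what limit
a running process actually sees. The helper names and the SAMPLE contents are
made up for illustration and are not copied from the real Ambari template.
On Linux, nproc counts threads, which is why the shuffle handler's threads
count against it.

```python
import resource


def parse_nproc_limits(text, user="yarn"):
    """Return {limit_type: value} for nproc entries that name `user`.

    Expects limits.conf-style lines: <domain> <type> <item> <value>.
    """
    limits = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that is not a 4-column entry
        domain, limit_type, item, value = fields
        if domain == user and item == "nproc":
            limits[limit_type] = value
    return limits


def current_nproc_limit():
    """Soft and hard RLIMIT_NPROC of the calling process (Unix only)."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


# Hypothetical file contents, for illustration only.
SAMPLE = """
# <domain> <type> <item> <value>
yarn   -    nofile   32768
yarn   -    nproc    65000
"""

if __name__ == "__main__":
    print("configured:", parse_nproc_limits(SAMPLE))
    soft, hard = current_nproc_limit()
    print("effective soft/hard nproc:", soft, hard)
```

To check the real file, pass its contents instead of SAMPLE, and run the
script as the yarn user so the effective limit is the one the NodeManager
would get. Whether limits.d is applied to that session depends on how PAM is
set up on the node, so treat the printed value as a hint rather than proof.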

Cheers,
Gopal

