Here is the AM log. My setup doesn't use Ambari. Can you attach the
/etc/security/limits.d/yarn.conf file?

2015-10-05 04:43:36,769 INFO [main] Configuration.deprecation:
fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-05 04:43:36,972 WARN [main] util.NativeCodeLoader: Unable to
load native-hadoop library for your platform... using builtin-java
classes where applicable
2015-10-05 04:43:37,077 INFO [main] app.DAGAppMaster: Using Standard
Socket Factory for communication
2015-10-05 04:43:37,269 INFO [main] app.DAGAppMaster: Created
DAGAppMaster for application appattempt_1444019975627_0001_000001,
versionInfo=[ component=tez-dag, version=0.7.0,
revision=e1c833e8be209145149cc6d3e1a01f8fa6e9b85f,
SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git,
buildTime=2015-09-29T13:26:49Z ]
2015-10-05 04:43:37,292 INFO [main] app.DAGAppMaster: Comparing client
version with AM version, clientVersion=0.7.0, AMVersion=0.7.0
2015-10-05 04:43:37,947 WARN [main] shortcircuit.DomainSocketFactory:
The short-circuit local reads feature cannot be used because libhadoop
cannot be loaded.
2015-10-05 04:43:38,086 INFO [main] app.DAGAppMaster: Adding session
token to jobTokenSecretManager for application
2015-10-05 04:43:38,090 INFO [main] common.AsyncDispatcher:
Registering class
org.apache.tez.dag.app.rm.container.AMContainerEventType for class
org.apache.tez.dag.app.rm.container.AMContainerMap
2015-10-05 04:43:38,092 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.rm.node.AMNodeEventType for
class org.apache.tez.dag.app.rm.node.AMNodeTracker
2015-10-05 04:43:38,093 INFO [main] common.AsyncDispatcher:
Registering class
org.apache.tez.dag.app.dag.event.DAGAppMasterEventType for class
org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler
2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.dag.event.DAGEventType for
class org.apache.tez.dag.app.DAGAppMaster$DagEventDispatcher
2015-10-05 04:43:38,094 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.dag.event.VertexEventType for
class org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher
2015-10-05 04:43:38,095 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.dag.event.TaskEventType for
class org.apache.tez.dag.app.DAGAppMaster$TaskEventDispatcher
2015-10-05 04:43:38,096 INFO [main] common.AsyncDispatcher:
Registering class
org.apache.tez.dag.app.dag.event.TaskAttemptEventType for class
org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher
2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.dag.event.SpeculatorEventType
for independent dispatch using: class
org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler
2015-10-05 04:43:38,097 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.dag.event.SpeculatorEventType
for class org.apache.tez.dag.app.DAGAppMaster$SpeculatorEventHandler
2015-10-05 04:43:38,111 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.rm.AMSchedulerEventType for
class org.apache.tez.dag.app.rm.TaskSchedulerEventHandler
2015-10-05 04:43:38,113 INFO [main] common.AsyncDispatcher:
Registering class org.apache.tez.dag.app.rm.NMCommunicatorEventType
for class org.apache.tez.dag.app.launcher.ContainerLauncherImpl
2015-10-05 04:43:38,215 INFO [main] node.AMNodeTracker:
blacklistDisablePercent is 33, blacklistingEnabled: true,
maxTaskFailuresPerNode: 10
2015-10-05 04:43:38,217 INFO [main] web.WebUIService: Tez UI History
URL: 
http://ec2-54-158-112-249.compute-1.amazonaws.com:10500/#/tez-app/application_1444019975627_0001
2015-10-05 04:43:38,219 INFO [main] launcher.ContainerLauncherImpl:
Upper limit on the thread pool size is 500
2015-10-05 04:43:38,219 INFO [main] history.HistoryEventHandler:
Initializing HistoryEventHandler
2015-10-05 04:43:38,224 INFO [main] ats.ATSHistoryLoggingService:
Initializing ATSService
2015-10-05 04:43:38,806 INFO [main] impl.TimelineClientImpl: Timeline
service address:
http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/
2015-10-05 04:43:38,806 INFO [main] ats.ATSHistoryLoggingService:
Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager
to manage Timeline ACLs
2015-10-05 04:43:38,971 INFO [main] impl.TimelineClientImpl: Timeline
service address:
http://ec2-54-158-112-249.compute-1.amazonaws.com:8188/ws/v1/timeline/
2015-10-05 04:43:38,971 INFO [main] recovery.RecoveryService:
Initializing RecoveryService
2015-10-05 04:43:38,974 INFO [main] history.HistoryEventHandler:
[HISTORY][DAG:N/A][Event:APP_LAUNCHED]:
applicationId=application_1444019975627_0001,
appSubmitTime=1444020210133, launchTime=1444020217266
2015-10-05 04:43:38,975 INFO [main] history.HistoryEventHandler:
[HISTORY][DAG:N/A][Event:AM_LAUNCHED]:
appAttemptId=appattempt_1444019975627_0001_000001,
appSubmitTime=1444020210133, launchTime=1444020217266
2015-10-05 04:43:38,982 INFO
[ServiceThread:org.apache.tez.dag.history.HistoryEventHandler]
history.HistoryEventHandler: Starting HistoryEventHandler
2015-10-05 04:43:38,982 INFO
[ServiceThread:org.apache.tez.dag.history.HistoryEventHandler]
ats.ATSHistoryLoggingService: Starting ATSService
2015-10-05 04:43:38,985 INFO
[ServiceThread:org.apache.tez.dag.app.launcher.ContainerLauncherImpl]
impl.ContainerManagementProtocolProxy:
yarn.client.max-cached-nodemanagers-proxies : 0
2015-10-05 04:43:38,990 INFO [ServiceThread:DAGClientRPCServer]
ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
2015-10-05 04:43:38,990 INFO
[ServiceThread:org.apache.tez.dag.app.TaskAttemptListenerImpTezDag]
ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
2015-10-05 04:43:38,997 INFO
[ServiceThread:org.apache.tez.dag.history.HistoryEventHandler]
recovery.RecoveryService: Starting RecoveryService
2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47120]
ipc.Server: Starting Socket Reader #1 for port 47120
2015-10-05 04:43:39,001 INFO [Socket Reader #1 for port 47029]
ipc.Server: Starting Socket Reader #1 for port 47029
2015-10-05 04:43:39,027 INFO [IPC Server Responder] ipc.Server: IPC
Server Responder: starting
2015-10-05 04:43:39,033 INFO [IPC Server listener on 47029]
ipc.Server: IPC Server listener on 47029: starting
2015-10-05 04:43:39,035 INFO [IPC Server listener on 47120]
ipc.Server: IPC Server listener on 47120: starting
2015-10-05 04:43:39,035 INFO [IPC Server Responder] ipc.Server: IPC
Server Responder: starting
2015-10-05 04:43:39,035 INFO [ServiceThread:DAGClientRPCServer]
client.DAGClientServer: Instantiated DAGClientRPCServer at
ip-10-149-127-154.ec2.internal/10.149.127.154:47029
2015-10-05 04:43:39,070 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log:
Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2015-10-05 04:43:39,074 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
http.HttpRequestLog: Http request log for http.requests. is not
defined
2015-10-05 04:43:39,083 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
http.HttpServer2: Added global filter 'safety'
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-10-05 04:43:39,087 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
http.HttpServer2: Added filter AM_PROXY_FILTER
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to
context
2015-10-05 04:43:39,087 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
http.HttpServer2: Added filter AM_PROXY_FILTER
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to
context static
2015-10-05 04:43:39,090 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
http.HttpServer2: adding path spec: /*
2015-10-05 04:43:39,097 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
http.HttpServer2: Jetty bound to port 34881
2015-10-05 04:43:39,098 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log:
jetty-6.1.26
2015-10-05 04:43:39,127 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log:
Extract 
jar:file:/media/ephemeral1/yarn/local/filecache/20/hadoop-yarn-common-2.6.0.jar!/webapps/
to /tmp/Jetty_0_0_0_0_34881_webapps____.97xzaz/webapp
2015-10-05 04:43:39,288 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log:
NO JSP Support for /, did not find
org.apache.jasper.servlet.JspServlet
2015-10-05 04:43:39,298 FATAL [HistoryEventHandlingThread]
yarn.YarnUncaughtExceptionHandler: Thread
Thread[HistoryEventHandlingThread,5,main] threw an Error.  Shutting
down now...
java.lang.NoSuchMethodError:
org.codehaus.jackson.map.ObjectMapper.setSerializationInclusion(Lorg/codehaus/jackson/map/annotate/JsonSerialize$Inclusion;)Lorg/codehaus/jackson/map/ObjectMapper;
        at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.configObjectMapper(YarnJacksonJaxbJsonProvider.java:59)
        at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.locateMapper(YarnJacksonJaxbJsonProvider.java:52)
        at 
org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:501)
        at 
com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
        at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:204)
        at 
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:226)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:162)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:237)
        at com.sun.jersey.api.client.Client.handle(Client.java:648)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at 
com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:472)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:321)
        at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
        at 
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:342)
        at 
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:52)
        at 
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:186)
        at java.lang.Thread.run(Thread.java:745)
2015-10-05 04:43:39,303 INFO [HistoryEventHandlingThread]
util.ExitUtil: Exiting with status -1
2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster:
DAGAppMasterShutdownHook invoked
2015-10-05 04:43:39,307 INFO [Thread-2] app.DAGAppMaster: DAGAppMaster
received a signal. Signaling TaskScheduler
2015-10-05 04:43:39,307 INFO [Thread-2] rm.TaskSchedulerEventHandler:
TaskScheduler notified that iSignalled was : true
2015-10-05 04:43:39,397 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] mortbay.log:
Started [email protected]:34881
2015-10-05 04:43:39,397 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
webapp.WebApps: Web app / started at 34881
2015-10-05 04:43:39,608 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
webapp.WebApps: Registered webapp guice modules
2015-10-05 04:43:39,610 INFO
[ServiceThread:org.apache.tez.dag.app.web.WebUIService]
web.WebUIService: Instantiated WebUIService at
http://ip-10-149-127-154.ec2.internal:34881/ui/
2015-10-05 04:43:39,645 INFO
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler]
rm.YarnTaskSchedulerService: TaskScheduler initialized with
configuration: maxRMHeartbeatInterval: 1000, containerReuseEnabled:
true, reuseRackLocal: true, reuseNonLocal: false,
localitySchedulingDelay: 250, preemptionPercentage: 10,
numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 5000,
idleContainerMaxTimeout: 10000, sessionMinHeldContainers: 0
2015-10-05 04:43:39,719 INFO
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler]
client.RMProxy: Connecting to ResourceManager at
ec2-54-158-112-249.compute-1.amazonaws.com/10.157.14.248:8030
2015-10-05 04:43:39,812 INFO [main] history.HistoryEventHandler:
[HISTORY][DAG:N/A][Event:AM_STARTED]:
appAttemptId=appattempt_1444019975627_0001_000001,
startTime=1444020219811
2015-10-05 04:43:39,812 INFO [main] app.DAGAppMaster: In Session mode.
Waiting for DAG over RPC
2015-10-05 04:43:39,813 INFO [Thread-2] history.HistoryEventHandler:
Stopping HistoryEventHandler
2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService:
Stopping RecoveryService
2015-10-05 04:43:39,814 INFO [Thread-2] recovery.RecoveryService:
Handle the remaining events in queue, queue size=0
2015-10-05 04:43:39,814 INFO [RecoveryEventHandlingThread]
recovery.RecoveryService: EventQueue take interrupted. Returning
2015-10-05 04:43:39,814 INFO [Thread-2] ats.ATSHistoryLoggingService:
Stopping ATSService, eventQueueBacklog=1
2015-10-05 04:43:39,814 WARN [Thread-2] ats.ATSHistoryLoggingService:
ATSService being stopped, eventQueueBacklog=1, maxTimeLeftToFlush=-1,
waitForever=true


On 5 October 2015 at 10:48, Gopal Vijayaraghavan <[email protected]> wrote:

>
> > The failures are always intermittent. Any idea why this happens?
>
> First up you should try 0.7.1, because of TEZ-2663.
>
> > Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has
> >already shutdown. Application application_1444019975627_0001 failed 2
> >times due to AM Container for appattempt_1444019975627_0001_000002 exited
> >with exitCode: 255
>
> Can you say what is printed in the AppMaster logs?
>
> I've seen this occasionally happen due to bad setup of cluster uid-limits.
>
> The ambari sets that up in a file named
>
> /etc/security/limits.d/yarn.conf (yarn.conf.j2)
>
>
> Check for that file. Otherwise as shuffle handler spawns threads, the
> container launchers will start to intermittently fail (the default is 1024
> threads per-user, yarn.conf ups this to 65,000 threads).
>
> Cheers,
> Gopal
>
>
>

Reply via email to