Dear Zeppelin community,
I would like to ask for advice regarding an error I am getting with Thrift.
I see quite a lot of these errors while running my notebooks:
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:437)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
And these are the Spark driver application logs:
...
===============================================================================
YARN executor launch context:
env:
CLASSPATH ->
{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/3.1.0.0-78/hadoop/*<CPS>/usr/hdp/3.1.0.0-78/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/3.1.0.0-78/hadoop/lib/hadoop-lzo-0.6.0.3.1.0.0-78.jar:/etc/hadoop/conf/secure<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR ->
hdfs://gl-hdp-ctrl01-mlx.mlx:8020/user/mansop/.sparkStaging/application_1568954689585_0052
SPARK_USER -> mansop
PYTHONPATH ->
/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip
command:
LD_LIBRARY_PATH="/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:$LD_LIBRARY_PATH"
\
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
'-XX:+UseNUMA' \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.history.ui.port=18081' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://[email protected]:35602 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1568954689585_0052 \
--user-class-path \
file:$PWD/__app__.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
__app__.jar -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
port: 8020 file:
"/user/mansop/.sparkStaging/application_1568954689585_0052/spark-interpreter-0.8.0.3.1.0.0-78.jar"
} size: 20433040 timestamp: 1569804142906 type: FILE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
port: 8020 file:
"/user/mansop/.sparkStaging/application_1568954689585_0052/__spark_conf__.zip"
} size: 277725 timestamp: 1569804143239 type: ARCHIVE visibility: PRIVATE
sparkr -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx" port:
8020 file:
"/user/mansop/.sparkStaging/application_1568954689585_0052/sparkr.zip" } size:
688255 timestamp: 1569804142991 type: ARCHIVE visibility: PRIVATE
log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host:
"gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
"/user/mansop/.sparkStaging/application_1568954689585_0052/log4j_yarn_cluster.properties"
} size: 1018 timestamp: 1569804142955 type: FILE visibility: PRIVATE
pyspark.zip -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
port: 8020 file:
"/user/mansop/.sparkStaging/application_1568954689585_0052/pyspark.zip" } size:
550570 timestamp: 1569804143018 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-yarn-archive.tar.gz" }
size: 280293050 timestamp: 1568938921259 type: ARCHIVE visibility: PUBLIC
py4j-0.10.7-src.zip -> resource { scheme: "hdfs" host:
"gl-hdp-ctrl01-mlx.mlx" port: 8020 file:
"/user/mansop/.sparkStaging/application_1568954689585_0052/py4j-0.10.7-src.zip"
} size: 42437 timestamp: 1569804143043 type: FILE visibility: PRIVATE
__hive_libs__ -> resource { scheme: "hdfs" host: "gl-hdp-ctrl01-mlx.mlx"
port: 8020 file: "/hdp/apps/3.1.0.0-78/spark2/spark2-hdp-hive-archive.tar.gz" }
size: 43807162 timestamp: 1568938925069 type: ARCHIVE visibility: PUBLIC
===============================================================================
INFO [2019-09-30 10:42:37,303] ({main} RMProxy.java[newProxyInstance]:133) -
Connecting to ResourceManager at gl-hdp-ctrl03-mlx.mlx/10.0.1.248:8030
INFO [2019-09-30 10:42:37,324] ({main} Logging.scala[logInfo]:54) - Registering
the ApplicationMaster
INFO [2019-09-30 10:42:37,454] ({main}
Configuration.java[getConfResourceAsInputStream]:2756) - found resource
resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
INFO [2019-09-30 10:42:37,470] ({main} Logging.scala[logInfo]:54) - Will
request 2 executor container(s), each with 1 core(s) and 1408 MB memory
(including 384 MB of overhead)
INFO [2019-09-30 10:42:37,474] ({dispatcher-event-loop-14}
Logging.scala[logInfo]:54) - ApplicationMaster registered as
NettyRpcEndpointRef(spark://[email protected]:35602)
INFO [2019-09-30 10:42:37,485] ({main} Logging.scala[logInfo]:54) - Submitted 2
unlocalized container requests.
INFO [2019-09-30 10:42:37,518] ({main} Logging.scala[logInfo]:54) - Started
progress reporter thread with (heartbeat : 3000, initial allocation : 200)
intervals
INFO [2019-09-30 10:42:37,619] ({Reporter} Logging.scala[logInfo]:54) -
Launching container container_e01_1568954689585_0052_01_000002 on host
r640-1-12-mlx.mlx for executor with ID 1
INFO [2019-09-30 10:42:37,621] ({Reporter} Logging.scala[logInfo]:54) -
Launching container container_e01_1568954689585_0052_01_000003 on host
r640-1-13-mlx.mlx for executor with ID 2
INFO [2019-09-30 10:42:37,623] ({Reporter} Logging.scala[logInfo]:54) -
Received 2 containers from YARN, launching executors on 2 of them.
INFO [2019-09-30 10:42:39,481] ({dispatcher-event-loop-51}
Logging.scala[logInfo]:54) - Registered executor
NettyRpcEndpointRef(spark-client://Executor) (10.0.1.12:54340) with ID 1
INFO [2019-09-30 10:42:39,553] ({dispatcher-event-loop-62}
Logging.scala[logInfo]:54) - Registering block manager r640-1-12-mlx.mlx:33043
with 408.9 MB RAM, BlockManagerId(1, r640-1-12-mlx.mlx, 33043, None)
INFO [2019-09-30 10:42:40,003] ({dispatcher-event-loop-9}
Logging.scala[logInfo]:54) - Registered executor
NettyRpcEndpointRef(spark-client://Executor) (10.0.1.13:33812) with ID 2
INFO [2019-09-30 10:42:40,023] ({pool-6-thread-2} Logging.scala[logInfo]:54) -
SchedulerBackend is ready for scheduling beginning after reached
minRegisteredResourcesRatio: 0.8
INFO [2019-09-30 10:42:40,025] ({pool-6-thread-2} Logging.scala[logInfo]:54) -
YarnClusterScheduler.postStartHook done
INFO [2019-09-30 10:42:40,072] ({dispatcher-event-loop-11}
Logging.scala[logInfo]:54) - Registering block manager r640-1-13-mlx.mlx:34105
with 408.9 MB RAM, BlockManagerId(2, r640-1-13-mlx.mlx, 34105, None)
INFO [2019-09-30 10:42:41,779] ({pool-6-thread-2}
SparkShims.java[loadShims]:54) - Initializing shims for Spark 2.x
INFO [2019-09-30 10:42:41,840] ({pool-6-thread-2}
Py4JUtils.java[createGatewayServer]:44) - Launching GatewayServer at
127.0.0.1:36897
INFO [2019-09-30 10:42:41,852] ({pool-6-thread-2}
PySparkInterpreter.java[createGatewayServerAndStartScript]:265) - pythonExec:
/home/mansop/anaconda2/bin/python
INFO [2019-09-30 10:42:41,862] ({pool-6-thread-2}
PySparkInterpreter.java[setupPySparkEnv]:236) - PYTHONPATH:
/usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip:/usr/hdp/current/spark2-client/python/::/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/pyspark.zip:/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/container_e01_1568954689585_0052_01_000001/py4j-0.10.7-src.zip
ERROR [2019-09-30 10:43:09,061] ({SIGTERM handler}
SignalUtils.scala[apply$mcZ$sp]:43) - RECEIVED SIGNAL TERM
INFO [2019-09-30 10:43:09,068] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Invoking stop() from shutdown hook
INFO [2019-09-30 10:43:09,082] ({shutdown-hook-0}
AbstractConnector.java[doStop]:318) - Stopped
Spark@505439b3{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
INFO [2019-09-30 10:43:09,085] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Stopped Spark web UI at http://r640-1-12-mlx.mlx:42446
INFO [2019-09-30 10:43:09,140] ({dispatcher-event-loop-52}
Logging.scala[logInfo]:54) - Driver requested a total number of 0 executor(s).
INFO [2019-09-30 10:43:09,142] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Shutting down all executors
INFO [2019-09-30 10:43:09,144] ({dispatcher-event-loop-51}
Logging.scala[logInfo]:54) - Asking each executor to shut down
INFO [2019-09-30 10:43:09,151] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
ERROR [2019-09-30 10:43:09,155] ({Reporter} Logging.scala[logError]:91) -
Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException:
Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
ApplicationMasterService cache.
at
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
at
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy21.allocate(Unknown Source)
at
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:320)
at
org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
Caused by:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
Application attempt appattempt_1568954689585_0052_000001 doesn't exist in
ApplicationMasterService cache.
at
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
at org.apache.hadoop.ipc.Client.call(Client.java:1443)
at org.apache.hadoop.ipc.Client.call(Client.java:1353)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy20.allocate(Unknown Source)
at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 13 more
INFO [2019-09-30 10:43:09,164] ({Reporter} Logging.scala[logInfo]:54) - Final
app status: FAILED, exitCode: 12, (reason: Application attempt
appattempt_1568954689585_0052_000001 doesn't exist in ApplicationMasterService
cache.
at
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:404)
at
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
)
INFO [2019-09-30 10:43:09,166] ({dispatcher-event-loop-54}
Logging.scala[logInfo]:54) - MapOutputTrackerMasterEndpoint stopped!
INFO [2019-09-30 10:43:09,236] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
MemoryStore cleared
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
BlockManager stopped
INFO [2019-09-30 10:43:09,237] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
BlockManagerMaster stopped
INFO [2019-09-30 10:43:09,241] ({dispatcher-event-loop-73}
Logging.scala[logInfo]:54) - OutputCommitCoordinator stopped!
INFO [2019-09-30 10:43:09,252] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Successfully stopped SparkContext
INFO [2019-09-30 10:43:09,253] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Shutdown hook called
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Deleting directory
/d1/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-ba80cda3-812a-4cf0-b1f6-6e9eb52952b2
INFO [2019-09-30 10:43:09,254] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Deleting directory
/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8
INFO [2019-09-30 10:43:09,255] ({shutdown-hook-0} Logging.scala[logInfo]:54) -
Deleting directory
/d0/hadoop/yarn/local/usercache/mansop/appcache/application_1568954689585_0052/spark-43078781-8f1c-4cd6-a8da-e81b32892cf8/pyspark-9138f7ad-3f15-42c6-9bf3-e3e72d5d4086
How can I continue troubleshooting to find out what this error means?
Thank you very much.