Hello all,
I'm running an LLAP daemon through YARN and ZooKeeper. The container for a Hive
query starts executing, but it hits a ClassCastException that I don't know how
to debug. Here are the logs:
cat syslog_dag_<container_id>
---------------------------------------------------
...
2019-11-11 17:32:02,631 [INFO] [LlapScheduler]
|tezplugins.LlapTaskSchedulerService|: Assigned #1,
task=TaskInfo{task=attempt_1573233179705_0050_1_00_000000_0, priority=5,
startTime=0, containerId=null, uniqueId=0, localityDelayTimeout=0} on
node={hostname:43033, id=d84432aa-f08f-467d-8688-c9150430f05e,
canAcceptTask=true, st=0, ac=12, commF=false, disabled=false}, to
container=container_222212222_0050_01_000001
2019-11-11 17:32:02,631 [INFO] [LlapScheduler] |GuaranteedTasks|: Registering
attempt_1573233179705_0050_1_00_000000_0; false
2019-11-11 17:32:02,648 [INFO] [TaskSchedulerAppCallbackExecutor #0]
|node.PerSourceNodeTracker|: Adding new node hostname:43033 to nodeTracker 2
2019-11-11 17:32:02,680 [INFO] [Dispatcher thread {Central}]
|tezplugins.LlapTaskCommunicator|: CurrentDagId set to: 1, name=select
count(device_id) from ...'impression' (Stage-1),
queryId=root_20191111173153_2e979533-4d13-4b66-a0a5-fd7d48c07e2f
2019-11-11 17:32:02,680 [INFO] [Dispatcher thread {Central}]
|tezplugins.LlapTaskCommunicator|: Added new known node: hostname:43033
2019-11-11 17:32:02,721 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:N/A][Event:CONTAINER_LAUNCHED]:
containerId=container_222212222_0050_01_000001, launchTime=1573493522721
2019-11-11 17:32:02,722 [INFO] [TaskCommunicator # 0]
|impl.LlapProtocolClientImpl|: Creating protocol proxy as null
2019-11-11 17:32:02,722 [INFO] [Dispatcher thread {Central}]
|impl.TaskAttemptImpl|: TaskAttempt: [attempt_1573233179705_0050_1_00_000000_0]
submitted. Is using containerId: [container_222212222_0050_01_000001] on NM:
[hostname:43033]
2019-11-11 17:32:02,723 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1573233179705_0050_1][Event:TASK_ATTEMPT_STARTED]:
vertexName=Map 1, taskAttemptId=attempt_1573233179705_0050_1_00_000000_0,
startTime=1573493522722, containerId=container_222212222_0050_01_000001,
nodeId=hostname:43033
2019-11-11 17:32:02,823 [INFO] [TaskCommunicator # 0]
|tezplugins.LlapTaskCommunicator|: Failed to run task:
attempt_1573233179705_0050_1_00_000000_0 on containerId:
container_222212222_0050_01_000001
org.apache.hadoop.ipc.RemoteException(java.lang.ClassCastException):
org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2
cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.BlockingService
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:510)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
at org.apache.hadoop.ipc.Client.call(Client.java:1491)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy50.submitWork(Unknown Source)
at org.apache.hadoop.hive.llap.impl.LlapProtocolClientImpl.submitWork(LlapProtocolClientImpl.java:81)
at org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy$SubmitWorkCallable.call(LlapProtocolClientProxy.java:99)
at org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy$SubmitWorkCallable.call(LlapProtocolClientProxy.java:89)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-11-11 17:32:02,828 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1573233179705_0050_1][Event:TASK_ATTEMPT_FINISHED]:
vertexName=Map 1, taskAttemptId=attempt_1573233179705_0050_1_00_000000_0,
creationTime=1573493522614, allocationTime=1573493522672,
startTime=1573493522722, finishTime=1573493522826, timeTaken=104,
status=FAILED, taskFailureType=NON_FATAL, errorEnum=UNKNOWN_ERROR,
diagnostics=org.apache.hadoop.ipc.RemoteException(java.lang.ClassCastException):
org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2
cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.BlockingService
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:510)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
, nodeHttpAddress=http://hostname:15002, counters=Counters: 1,
org.apache.tez.common.counters.DAGCounter, DATA_LOCAL_TASKS=1
2019-11-11 17:32:02,832 [INFO] [Dispatcher thread {Central}] |impl.TaskImpl|:
Scheduling new attempt for task: task_1573233179705_0050_1_00_000000,
currentFailedAttempts: 1, maxFailedAttempts: 4
...
---------------------------------------------------
The task is then retried and finally fails on the 4th attempt. Is this a jar
version mismatch, a protobuf mismatch, a classpath problem, or something else?
Let me know what other information I should provide. Any help is much
appreciated!
Software versions are:
Hadoop 3.2.1
Tez 0.9.2
Hive 3.1.2
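One piece of extra information that could help narrow this down is which jar
each class named in the failing cast is actually loaded from on the daemon
side. A minimal diagnostic sketch, assuming it is compiled and run with the
same classpath as the LLAP daemon. The class name `ClassOrigin` is
hypothetical, and `com.google.protobuf.BlockingService` (the unshaded protobuf
interface) is listed only for comparison with the shaded name in the trace:

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper: reports which jar (or directory) each class
 * is loaded from, using the classpath of the JVM it runs in.
 */
public class ClassOrigin {

    /** Returns "name -> location", or a marker if the class is missing or has no code source. */
    static String describe(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            return className + " -> "
                + (location == null ? "(no code source, e.g. JDK bootstrap class)" : location);
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is not on this classpath
            return className + " -> NOT FOUND (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default to the classes that appear in the ClassCastException above
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.ipc.ProtobufRpcEngine",
            "org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos",
            "org.apache.hadoop.shaded.com.google.protobuf.BlockingService",
            "com.google.protobuf.BlockingService",
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run as, for example, `java -cp "<daemon classpath>:." ClassOrigin`. If
`ProtobufRpcEngine` and `BlockingService` resolve to different or unexpected
jars, that output would be worth including in a reply.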