[ 
https://issues.apache.org/jira/browse/HIVE-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19169:
------------------------------------
    Description: 
a more or less recent hive/master with tpcds1000; while running only 1 query at 
a time on the cluster; sometimes a "timeout" happens - this could be even a 
misconfiguration problem...I'm not sure I've set it up correctly.

what I see from: it seems like the attempt have entered the queue and accepted 
- after almost 90 seconds there are messages that it will be pre-empted - and 
the 90 sec timeout happens.

hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
{code}
2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
SessionState (SessionState.java:printError(1214)) - Vertex failed, 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error, 
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed, 
taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 killed, 
TaskAttempt 1 failed, info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 
Timed out after 90 secs], TaskAttempt 2 killed, TaskAttempt 3 killed, 
TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 
killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, 
TaskAttempt 11 killed, TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed

{code}

tezam logs for the attempt
{code}
2018-04-10 13:18:45,430 [INFO] [TaskSchedulerEventHandlerThread] 
|tezplugins.LlapTaskSchedulerService|: Received allocateRequest. 
task=attempt_1522319554594_0065_19_05_000119_14, priority=194, 
capability=<memory:3072, vCores:1>, hosts=null
2018-04-10 13:18:45,440 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: Assigned #67763, 
task=TaskInfo{task=attempt_1522319554594_0065_19_05_000119_14, priority=194, 
startTime=0, containerId=null, uniqueId=69780, localityDelayTimeout=1127144039} 
on node={lemon-9.openstacklocal:50155, id=2a3562c1-1094-499f-9d83-5c3b5cf34663, 
canAcceptTask=true, st=10, ac=1, commF=false, disabled=false}, to 
container=container_222212222_0065_01_067763
2018-04-10 13:18:45,440 [INFO] [LlapScheduler] |GuaranteedTasks|: Registering 
attempt_1522319554594_0065_19_05_000119_14; false
2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}] 
|impl.TaskAttemptImpl|: TaskAttempt: 
[attempt_1522319554594_0065_19_05_000119_14] submitted. Is using containerId: 
[container_222212222_0065_01_067763] on NM: [lemon-9.openstacklocal:50155]
2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: 
[HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_STARTED]: 
vertexName=Reducer 3, taskAttemptId=attempt_1522319554594_0065_19_05_000119_14, 
startTime=1523366325446, containerId=container_222212222_0065_01_067763, 
nodeId=lemon-9.openstacklocal:50155
2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9] 
|tezplugins.LlapTaskCommunicator|: Successfully launched task: 
attempt_1522319554594_0065_19_05_000119_14
2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9] |GuaranteedTasks|: 
Received response for attempt_1522319554594_0065_19_05_000119_14, true
2018-04-10 13:19:08,743 [INFO] [IPC Server handler 15 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_05_000128_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
2018-04-10 13:19:18,728 [INFO] [IPC Server handler 11 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:19:28,726 [INFO] [IPC Server handler 28 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:19:38,726 [INFO] [IPC Server handler 15 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:19:48,726 [INFO] [IPC Server handler 17 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:19:58,726 [INFO] [IPC Server handler 5 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:20:08,726 [INFO] [IPC Server handler 2 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:20:18,727 [INFO] [IPC Server handler 17 on 35408] 
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are 
not there: attempt_1522319554594_0065_19_05_000130_14, 
attempt_1522319554594_0065_19_05_000191_14, 
attempt_1522319554594_0065_19_08_000018_6, 
attempt_1522319554594_0065_19_08_000022_6, 
attempt_1522319554594_0065_19_05_000181_14, 
attempt_1522319554594_0065_19_05_000131_15, 
attempt_1522319554594_0065_19_05_000214_15, 
attempt_1522319554594_0065_19_05_000093_11, 
attempt_1522319554594_0065_19_05_000119_14, 
attempt_1522319554594_0065_19_08_000025_7, 
2018-04-10 13:20:24,594 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: 
[HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_FINISHED]: 
vertexName=Reducer 3, taskAttemptId=attempt_1522319554594_0065_19_05_000119_14, 
creationTime=1523366325392, allocationTime=1523366325443, 
startTime=1523366325446, finishTime=1523366424594, timeTaken=99148, 
status=FAILED, taskFailureType=NON_FATAL, errorEnum=TASK_HEARTBEAT_ERROR, 
diagnostics=AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out 
after 90 secs, nodeHttpAddress=http://lemon-9.openstacklocal:15002, 
counters=Counters: 0
2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread] 
|GuaranteedTasks|: Unregistering attempt_1522319554594_0065_19_05_000119_14; 
false
2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread] 
|tezplugins.LlapTaskSchedulerService|: Processing de-allocate request for 
task=attempt_1522319554594_0065_19_05_000119_14, state=ASSIGNED, endReason=OTHER
2018-04-10 13:20:24,603 [INFO] [Dispatcher thread {Central}] |node.AMNodeImpl|: 
Attempt failedon node: lemon-9.openstacklocal:50155 TA: 
attempt_1522319554594_0065_19_05_000119_14, container: 
container_222212222_0065_01_067763, numFailedTAs: 57
2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: 
Source task attempt completed for vertex: vertex_1522319554594_0065_19_07 
[Reducer 10] attempt: attempt_1522319554594_0065_19_05_000119_14 with state: 
FAILED vertexState: RUNNING
2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: 
Source task attempt completed for vertex: vertex_1522319554594_0065_19_11 
[Reducer 4] attempt: attempt_1522319554594_0065_19_05_000119_14 with state: 
FAILED vertexState: RUNNING
2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: 
Source task attempt completed for vertex: vertex_1522319554594_0065_19_06 
[Reducer 9] attempt: attempt_1522319554594_0065_19_05_000119_14 with state: 
FAILED vertexState: RUNNING
TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs]
2018-04-10 13:20:25,020 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: 
[HISTORY][DAG:dag_1522319554594_0065_19][Event:VERTEX_FINISHED]: 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
initRequestedTime=1523363710631, initedTime=1523363710645, 
startRequestedTime=1523363713645, startedTime=1523363713646, 
finishTime=1523366425017, timeTaken=2711371, status=FAILED, diagnostics=Task 
failed, taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 
killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
2018-04-10 13:20:25,157 [INFO] [Dispatcher thread {Central}] 
|HistoryEventHandler.criticalEvents|: 
[HISTORY][DAG:dag_1522319554594_0065_19][Event:DAG_FINISHED]: 
dagId=dag_1522319554594_0065_19, startTime=1523363710305, 
finishTime=1523366425121, timeTaken=2714816, status=FAILED, diagnostics=Vertex 
failed, vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed

{code}

from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ; 
ordered by time:
{code}
java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was not 
registered and couldn't be removed
2018-04-10T13:18:45,455 INFO  [IPC Server handler 3 on 50155 ()] 
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing container 
for execution: fragemendId=attempt_1522319554594_0065_19_05_000119_14, 
am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14, 
user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d, 
appIdString=application_1522319554594_0065, appAttemptNum=1, 
containerIdString=container_222212222_0065_01_067763, dagName=with ws as
2018-04-10T13:18:45,457  INFO [IPC Server handler 3 on 50155 
(1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: SubmissionState 
for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED 
2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding 
attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to 
execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14, 
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
firstAttemptStartTime=1523364757720, dagStartTime=1523363710305, 
withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism= 
1286, selfAndUpstreamComplete= 1243}
2018-04-10T13:19:13,534 WARN  [ExecutionCompletionThread #0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received 
onSuccess/onFailure for a fragment for which a completing message was not 
received: attempt_1522319554594_0065_19_05_000119_14
2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed 
notification received: Stacktrace: java.lang.RuntimeException: 
attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be 
removed
{code}

  was:
a more or less recent hive/master with tpcds1000; while running only 1 query at 
a time on the cluster; sometimes a "timeout" happens - this could be even a 
misconfiguration problem...I'm not sure I've set it up correctly.

what I see from: it seems like the attempt have entered the queue and accepted 
- after almost 90 seconds there are messages that it will be pre-empted - and 
the 90 sec timeout happens.

hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
{code}
2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
SessionState (SessionState.java:printError(1214)) - Vertex failed, 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error, 
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed, 
taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 killed, 
TaskAttempt 1 failed, info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 
Timed out after 90 secs], TaskAttempt 2 killed, TaskAttempt 3 killed, 
TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 
killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, 
TaskAttempt 11 killed, TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8 
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
TaskAttempt 12 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
secs], TaskAttempt 15 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
removed

{code}

from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ; 
ordered by time:
{code}
java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was not 
registered and couldn't be removed
2018-04-10T13:18:45,455 INFO  [IPC Server handler 3 on 50155 ()] 
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing container 
for execution: fragemendId=attempt_1522319554594_0065_19_05_000119_14, 
am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14, 
user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d, 
appIdString=application_1522319554594_0065, appAttemptNum=1, 
containerIdString=container_222212222_0065_01_067763, dagName=with ws as
2018-04-10T13:18:45,457  INFO [IPC Server handler 3 on 50155 
(1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: SubmissionState 
for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED 
2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding 
attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to 
execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14, 
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
firstAttemptStartTime=1523364757720, dagStartTime=1523363710305, 
withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism= 
1286, selfAndUpstreamComplete= 1243}
2018-04-10T13:19:13,534 WARN  [ExecutionCompletionThread #0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received 
onSuccess/onFailure for a fragment for which a completing message was not 
received: attempt_1522319554594_0065_19_05_000119_14
2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()] 
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed 
notification received: Stacktrace: java.lang.RuntimeException: 
attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be 
removed
{code}


> llap: Timed out after 90 secs
> -----------------------------
>
>                 Key: HIVE-19169
>                 URL: https://issues.apache.org/jira/browse/HIVE-19169
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> a more or less recent hive/master with tpcds1000; while running only 1 query 
> at a time on the cluster; sometimes a "timeout" happens - this could be even 
> a misconfiguration problem...I'm not sure I've set it up correctly.
> what I see from: it seems like the attempt have entered the queue and 
> accepted - after almost 90 seconds there are messages that it will be 
> pre-empted - and the 90 sec timeout happens.
> hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
> {code}
> 2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
> SessionState (SessionState.java:printError(1214)) - Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> 2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
> ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error, 
> return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed, 
> taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 
> killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> {code}
> tezam logs for the attempt
> {code}
> 2018-04-10 13:18:45,430 [INFO] [TaskSchedulerEventHandlerThread] 
> |tezplugins.LlapTaskSchedulerService|: Received allocateRequest. 
> task=attempt_1522319554594_0065_19_05_000119_14, priority=194, 
> capability=<memory:3072, vCores:1>, hosts=null
> 2018-04-10 13:18:45,440 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: Assigned #67763, 
> task=TaskInfo{task=attempt_1522319554594_0065_19_05_000119_14, priority=194, 
> startTime=0, containerId=null, uniqueId=69780, 
> localityDelayTimeout=1127144039} on node={lemon-9.openstacklocal:50155, 
> id=2a3562c1-1094-499f-9d83-5c3b5cf34663, canAcceptTask=true, st=10, ac=1, 
> commF=false, disabled=false}, to container=container_222212222_0065_01_067763
> 2018-04-10 13:18:45,440 [INFO] [LlapScheduler] |GuaranteedTasks|: Registering 
> attempt_1522319554594_0065_19_05_000119_14; false
> 2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}] 
> |impl.TaskAttemptImpl|: TaskAttempt: 
> [attempt_1522319554594_0065_19_05_000119_14] submitted. Is using containerId: 
> [container_222212222_0065_01_067763] on NM: [lemon-9.openstacklocal:50155]
> 2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_STARTED]: 
> vertexName=Reducer 3, 
> taskAttemptId=attempt_1522319554594_0065_19_05_000119_14, 
> startTime=1523366325446, containerId=container_222212222_0065_01_067763, 
> nodeId=lemon-9.openstacklocal:50155
> 2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9] 
> |tezplugins.LlapTaskCommunicator|: Successfully launched task: 
> attempt_1522319554594_0065_19_05_000119_14
> 2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9] |GuaranteedTasks|: 
> Received response for attempt_1522319554594_0065_19_05_000119_14, true
> 2018-04-10 13:19:08,743 [INFO] [IPC Server handler 15 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_05_000128_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> 2018-04-10 13:19:18,728 [INFO] [IPC Server handler 11 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:19:28,726 [INFO] [IPC Server handler 28 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:19:38,726 [INFO] [IPC Server handler 15 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:19:48,726 [INFO] [IPC Server handler 17 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:19:58,726 [INFO] [IPC Server handler 5 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:20:08,726 [INFO] [IPC Server handler 2 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:20:18,727 [INFO] [IPC Server handler 17 on 35408] 
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node 
> are not there: attempt_1522319554594_0065_19_05_000130_14, 
> attempt_1522319554594_0065_19_05_000191_14, 
> attempt_1522319554594_0065_19_08_000018_6, 
> attempt_1522319554594_0065_19_08_000022_6, 
> attempt_1522319554594_0065_19_05_000181_14, 
> attempt_1522319554594_0065_19_05_000131_15, 
> attempt_1522319554594_0065_19_05_000214_15, 
> attempt_1522319554594_0065_19_05_000093_11, 
> attempt_1522319554594_0065_19_05_000119_14, 
> attempt_1522319554594_0065_19_08_000025_7, 
> 2018-04-10 13:20:24,594 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Reducer 3, 
> taskAttemptId=attempt_1522319554594_0065_19_05_000119_14, 
> creationTime=1523366325392, allocationTime=1523366325443, 
> startTime=1523366325446, finishTime=1523366424594, timeTaken=99148, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=TASK_HEARTBEAT_ERROR, 
> diagnostics=AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out 
> after 90 secs, nodeHttpAddress=http://lemon-9.openstacklocal:15002, 
> counters=Counters: 0
> 2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread] 
> |GuaranteedTasks|: Unregistering attempt_1522319554594_0065_19_05_000119_14; 
> false
> 2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread] 
> |tezplugins.LlapTaskSchedulerService|: Processing de-allocate request for 
> task=attempt_1522319554594_0065_19_05_000119_14, state=ASSIGNED, 
> endReason=OTHER
> 2018-04-10 13:20:24,603 [INFO] [Dispatcher thread {Central}] 
> |node.AMNodeImpl|: Attempt failedon node: lemon-9.openstacklocal:50155 TA: 
> attempt_1522319554594_0065_19_05_000119_14, container: 
> container_222212222_0065_01_067763, numFailedTAs: 57
> 2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Source task attempt completed for vertex: 
> vertex_1522319554594_0065_19_07 [Reducer 10] attempt: 
> attempt_1522319554594_0065_19_05_000119_14 with state: FAILED vertexState: 
> RUNNING
> 2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Source task attempt completed for vertex: 
> vertex_1522319554594_0065_19_11 [Reducer 4] attempt: 
> attempt_1522319554594_0065_19_05_000119_14 with state: FAILED vertexState: 
> RUNNING
> 2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] 
> |impl.VertexImpl|: Source task attempt completed for vertex: 
> vertex_1522319554594_0065_19_06 [Reducer 9] attempt: 
> attempt_1522319554594_0065_19_05_000119_14 with state: FAILED vertexState: 
> RUNNING
> TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs]
> 2018-04-10 13:20:25,020 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:VERTEX_FINISHED]: 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> initRequestedTime=1523363710631, initedTime=1523363710645, 
> startRequestedTime=1523363713645, startedTime=1523363713646, 
> finishTime=1523366425017, timeTaken=2711371, status=FAILED, diagnostics=Task 
> failed, taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 
> 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> 2018-04-10 13:20:25,157 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:DAG_FINISHED]: 
> dagId=dag_1522319554594_0065_19, startTime=1523363710305, 
> finishTime=1523366425121, timeTaken=2714816, status=FAILED, 
> diagnostics=Vertex failed, vertexName=Reducer 3, 
> vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed, 
> taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 
> killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> {code}
> from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ; 
> ordered by time:
> {code}
> java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was 
> not registered and couldn't be removed
> 2018-04-10T13:18:45,455 INFO  [IPC Server handler 3 on 50155 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing 
> container for execution: 
> fragemendId=attempt_1522319554594_0065_19_05_000119_14, 
> am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14, 
> user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d, 
> appIdString=application_1522319554594_0065, appAttemptNum=1, 
> containerIdString=container_222212222_0065_01_067763, dagName=with ws as
> 2018-04-10T13:18:45,457  INFO [IPC Server handler 3 on 50155 
> (1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: 
> SubmissionState for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED 
> 2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding 
> attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
> 2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to 
> execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14, 
> inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
> canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
> firstAttemptStartTime=1523364757720, dagStartTime=1523363710305, 
> withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism= 
> 1286, selfAndUpstreamComplete= 1243}
> 2018-04-10T13:19:13,534 WARN  [ExecutionCompletionThread #0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received 
> onSuccess/onFailure for a fragment for which a completing message was not 
> received: attempt_1522319554594_0065_19_05_000119_14
> 2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed 
> notification received: Stacktrace: java.lang.RuntimeException: 
> attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be 
> removed
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to