Thanks guys.
As far as I can see, the value of "mapreduce.reduce.cpu.vcores" is set to
1, however, the logs always seem to print out two statements like this:
7831 [Initializer 2] INFO
org.apache.tez.runtime.common.resources.MemoryDistributor$RequestorInfo -
Received request: 104857600, type: OUTPUT, componentVertexName: Reducer 14
7831 [Initializer 2] INFO
org.apache.tez.runtime.common.resources.MemoryDistributor$RequestorInfo -
Received request: 104857600, type: OUTPUT, componentVertexName: Reducer 14
...
7832 [Initializer 2] INFO
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask - Initialized Output
with dest edge: Reducer 14
7832 [Initializer 2] INFO
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask - Initialized Output
with dest edge: Reducer 14
Is this an indication of two vcores being used for the reducers or is there
some other reason why the log contains always prints these things in pairs?
>From the logs I can see that the last log events related to a reducer that
then seems to hang are as follows:
...
7903 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager - All source
tasks assigned. Ramping up 1 remaining tasks for vertex: Reducer 14
7903 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager - All source
tasks assigned. Ramping up 1 remaining tasks for vertex: Reducer 14
7904 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.VertexImpl - Routing pending task events
for vertex: vertex_1433328560315_0001_4_17 [Reducer 14]
7904 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.VertexImpl - Routing pending task events
for vertex: vertex_1433328560315_0001_4_17 [Reducer 14]
7904 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.DAGImpl - Vertex
vertex_1433328560315_0001_4_13 [Reducer 16] completed.,
numCompletedVertices=19, numSuccessfulVertices=19, numFailedVertices=0,
numKilledVertices=0, numVertices=25
7904 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.DAGImpl - Vertex
vertex_1433328560315_0001_4_13 [Reducer 16] completed.,
numCompletedVertices=19, numSuccessfulVertices=19, numFailedVertices=0,
numKilledVertices=0, numVertices=25
7904 [LocalContainerLauncher-SubTaskRunner] INFO
org.apache.tez.dag.app.launcher.LocalContainerLauncher - Interrupting
running/queued container with id: container_1433328560315_0001_00_000010
7904 [LocalContainerLauncher-SubTaskRunner] INFO
org.apache.tez.dag.app.launcher.LocalContainerLauncher - Interrupting
running/queued container with id: container_1433328560315_0001_00_000010
7904 [CallbackExecutor] INFO
org.apache.tez.dag.app.launcher.LocalContainerLauncher - Container:
container_1433328560315_0001_00_000010 completed successfully
7904 [CallbackExecutor] INFO
org.apache.tez.dag.app.launcher.LocalContainerLauncher - Container:
container_1433328560315_0001_00_000010 completed successfully
7905 [HistoryEventHandlingThread] INFO
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService -
Writing event TASK_STARTED to history file
7905 [HistoryEventHandlingThread] INFO
org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService -
Writing event TASK_STARTED to history file
7906 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.history.HistoryEventHandler -
[HISTORY][DAG:dag_1433328560315_0001_4][Event:TASK_STARTED]:
vertexName=Reducer 14, taskId=task_1433328560315_0001_4_17_000000,
scheduledTime=1433328568212, launchTime=1433328568212
7906 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.history.HistoryEventHandler -
[HISTORY][DAG:dag_1433328560315_0001_4][Event:TASK_STARTED]:
vertexName=Reducer 14, taskId=task_1433328560315_0001_4_17_000000,
scheduledTime=1433328568212, launchTime=1433328568212
7906 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.TaskImpl -
task_1433328560315_0001_4_17_000000 Task Transitioned from NEW to SCHEDULED
due to event T_SCHEDULE
7906 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.TaskImpl -
task_1433328560315_0001_4_17_000000 Task Transitioned from NEW to SCHEDULED
due to event T_SCHEDULE
7906 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.rm.container.AMContainerImpl - AMContainer
container_1433328560315_0001_00_000010 transitioned from STOP_REQUESTED to
STOPPING via event C_NM_STOP_SENT
7906 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.rm.container.AMContainerImpl - AMContainer
container_1433328560315_0001_00_000010 transitioned from STOP_REQUESTED to
STOPPING via event C_NM_STOP_SENT
7907 [LocalContainerLauncher-SubTaskRunner] INFO
org.apache.tez.runtime.task.TezChild - PID, containerIdentifier: null,
container_1433328560315_0001_00_000011
7907 [LocalContainerLauncher-SubTaskRunner] INFO
org.apache.tez.runtime.task.TezChild - PID, containerIdentifier: null,
container_1433328560315_0001_00_000011
7907 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.rm.container.AMContainerImpl - AMContainer
container_1433328560315_0001_00_000010 transitioned from STOPPING to
COMPLETED via event C_COMPLETED
7907 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.rm.container.AMContainerImpl - AMContainer
container_1433328560315_0001_00_000010 transitioned from STOPPING to
COMPLETED via event C_COMPLETED
7908 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl - remoteTaskSpec:DAGName :
thomas.larsson_20150603124949_c56dca7e-2f55-4e33-a552-dd9a52fbc9d5:4,
VertexName: Reducer 14, VertexParallelism: 1,
TaskAttemptID:attempt_1433328560315_0001_4_17_000000_0,
processorName=org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,
inputSpecListSize=2, outputSpecListSize=2, inputSpecList=[{{
sourceVertexName=Reducer 16, physicalEdgeCount=1,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput
}}, {{ sourceVertexName=Map 13, physicalEdgeCount=0,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput
}}, ], outputSpecList=[{{ destinationVertexName=out_Reducer 14,
physicalEdgeCount=0,
outputClassName=org.apache.tez.mapreduce.output.MROutput }}, {{
destinationVertexName=Reducer 10, physicalEdgeCount=1,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}, ]
7908 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.TaskAttemptImpl - remoteTaskSpec:DAGName :
thomas.larsson_20150603124949_c56dca7e-2f55-4e33-a552-dd9a52fbc9d5:4,
VertexName: Reducer 14, VertexParallelism: 1,
TaskAttemptID:attempt_1433328560315_0001_4_17_000000_0,
processorName=org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor,
inputSpecListSize=2, outputSpecListSize=2, inputSpecList=[{{
sourceVertexName=Reducer 16, physicalEdgeCount=1,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput
}}, {{ sourceVertexName=Map 13, physicalEdgeCount=0,
inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput
}}, ], outputSpecList=[{{ destinationVertexName=out_Reducer 14,
physicalEdgeCount=0,
outputClassName=org.apache.tez.mapreduce.output.MROutput }}, {{
destinationVertexName=Reducer 10, physicalEdgeCount=1,
outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput
}}, ]
...
I have retrieved the dot-graphs of the execution graph and can see that
Reducer 14 has as its input data from Map_13 and Reducer_16.
When looking at the logs I can find lines that indicate that both of these
vertices have completed, i.e.
7115 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.DAGImpl - Vertex
vertex_1433328560315_0001_4_16 [Map 13] completed., numCompletedVertices=1,
numSuccessfulVertices=1, numFailedVertices=0, numKilledVertices=0,
numVertices=25
...
7904 [AsyncDispatcher event handler] INFO
org.apache.tez.dag.app.dag.impl.DAGImpl - Vertex
vertex_1433328560315_0001_4_13 [Reducer 16] completed.,
numCompletedVertices=19, numSuccessfulVertices=19, numFailedVertices=0,
numKilledVertices=0, numVertices=25
Any more tips on where I can look to find out why this Reducer is not
progressing?
/Thomas
On Wed, Jun 3, 2015 at 11:29 AM, [email protected] <[email protected]>
wrote:
> Maybe meet https://issues.apache.org/jira/browse/TEZ-704
>
> ------------------------------
> [email protected]
>
>
> *From:* [email protected]
> *Date:* 2015-06-03 17:26
> *To:* user <[email protected]>
> *Subject:* Re: How to debug hanging hive on tez job
> What is the value of mapreduce.reduce.cpu.vcores?
>
> ------------------------------
> [email protected]
>
>
> *From:* Thomas Larsson <[email protected]>
> *Date:* 2015-06-03 16:45
> *To:* user <[email protected]>
> *Subject:* How to debug hanging hive on tez job
> Hello,
>
> I'm not sure if this question belongs to the hive or tez community so bear
> with me.
> I am running some hive queries on hive configured with tez in local mode
> as the execution engine.
> I am using hive version 0.14 and tez version 0.5.2 (HDP 2.2.4.2).
>
> When I run the hive query, the jobs run for a while and then get stuck,
> just printing out this same monitoring line over and over (i.e no changes
> happen for any of the vertices)
>
> Map 1: -/- Map 11: 1/1 Map 12: 1/1 Map 13: -/- Map 15: -/-
> Map 17: 1/1 Map 18: -/- Map 19: 1/1 Map 20: -/- Map 21: -/-
> Map 22: -/- Map 23: -/- Map 24: -/- Map 25: 1/1 Map 26: -/-
> Map 27: -/- Map 7: -/- Reducer 10: 0/1 Reducer 14: 0/1 Reducer
> 16: 1/1 Reducer 3: 0(+1)/1 Reducer 4: 0/1 Reducer 5: 0/1
> Reducer 6: 0/1 Reducer 8: 1/1
>
>
> Do you have any suggestion as to how to find out why it gets stuck?
> The query is fairly complex but the input data tables are either very
> small or even empty.
> There is so much in the logs that I don't know what to look for.
> Taking a thread dump when this happens is a bit like looking in a haystack
> for me.
>
> Anyone?
> /Thomas
>
>
--
*Thomas Larsson*
Senior developer
Odin - Backoffice core
Klarna AB
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com