Yes, we also think that's what is happening. Whenever a vertex is re-run
('Vertex re-running'), we see exceptions like:
2016-05-13 09:45:28,280 WARN [ResponseProcessor for block BP-358076261-10.1.91.4-1462195174702:blk_1075135562_1396252] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-358076261-10.1.91.4-1462195174702:blk_1075135562_1396252
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.1.91.11:37025 remote=/10.1.91.11:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
2016-05-13 09:45:28,283 INFO [TezChild] task.TezTaskRunner: Encounted an error while executing task: attempt_1462251281999_0859_1_01_000063_0
java.io.IOException: All datanodes 10.1.91.11:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1206)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
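For what it's worth, the 65000 ms in the SocketTimeoutException looks like the HDFS client read timeout (dfs.client.socket-timeout, default 60000 ms, plus a per-datanode extension). If slow DataNode responses really are the cause of the pipeline failures, raising the client read timeout and the DataNode write timeout in hdfs-site.xml might help; the values below are purely illustrative, not recommendations:

```xml
<!-- hdfs-site.xml: illustrative values only; tune for your cluster -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>
```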
However, this still does not explain why Tez 0.7 runs much slower than
Tez 0.6.2 even when no such exception arises. As an example, see the
summary of results below. Tez 0.7 takes more than 1100 seconds to sort
40GB per node (with no vertex re-running and no exception except at the
end of the run), whereas Tez 0.6.2 takes just under 800 seconds. These
results were obtained on the same cluster, using the default settings from
HDP 2.4 (installed with Ambari), with very few differences in
configuration settings between the two versions.
I believe that we missed some important settings for Tez 0.7 (because we
did not install Hadoop and HDFS manually). Let me try the same experiment
on other clusters and see what happens.
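One way we could rule out configuration drift between the two installations is to diff the effective *-site.xml files programmatically instead of eyeballing them. A rough sketch (the file paths and helper names are ours, not from any Hadoop tool):

```python
import xml.etree.ElementTree as ET

def load_props(source):
    # Parse a Hadoop *-site.xml file (path or file-like object)
    # into a {property name: value} dict.
    root = ET.parse(source).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props

def diff_props(a, b):
    # Return properties whose values differ, or that exist on one side only,
    # as {name: (value_in_a, value_in_b)}; missing values show up as None.
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

# Example (hypothetical paths):
# changed = diff_props(load_props("/etc/tez-0.6.2/tez-site.xml"),
#                      load_props("/etc/tez-0.7.0/tez-site.xml"))
# for name, (old, new) in changed.items():
#     print(f"{name}: {old} -> {new}")
```

Running this over tez-site.xml, mapred-site.xml, and yarn-site.xml from both setups would show exactly which properties differ beyond the ones listed in the configuration section below.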
Thanks for the response,
--- Sungwoo
tez-0.6.2-40GB_10nodes
id  time  num_containers  mem        core    diag
0   761   239             525320249  82608
1   767   239             527612323  83271
2   736   239             520229980  82317

tez-0.7.0-40GB_10nodes
id  time  num_containers  mem        core    diag
0   1405  339             953706595  136624  Vertex re-running, vertexName=initialmap, vertexId=vertex_1462251281999_0859_1_00
1   1157  239             828765079  118293
2   1219  239             833052604  118151
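As a back-of-envelope check on the numbers above, the average slowdown of Tez 0.7.0 relative to Tez 0.6.2 on this 40GB-per-node workload works out to roughly 1.67x (run times copied from the tables; this is only illustrative arithmetic, not a new measurement):

```python
# Run times (seconds) for the 40GB-per-node runs quoted above.
t_062 = [761, 767, 736]     # tez-0.6.2
t_070 = [1405, 1157, 1219]  # tez-0.7.0 (run 0 includes a vertex re-run)

avg_062 = sum(t_062) / len(t_062)  # ~754.7 s
avg_070 = sum(t_070) / len(t_070)  # ~1260.3 s
slowdown = avg_070 / avg_062       # ~1.67x
print(f"Tez 0.7.0 is about {slowdown:.2f}x slower on this workload")
```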
On Mon, Jun 13, 2016 at 6:21 AM, Bikas Saha <[email protected]> wrote:
> From the vertex re-running messages, it seems you may be seeing errors when
> reducers fetch shuffle data from mappers. This causes map vertex tasks to
> be re-run to produce only the missing data – with the corresponding vertex
> re-running message.
>
>
>
> Bikas
>
>
>
> *From:* Rajesh Balamohan [mailto:[email protected]]
> *Sent:* Saturday, June 11, 2016 5:14 AM
> *To:* [email protected]
> *Subject:* Re: Question on Tez 0.6 and Tez 0.7
>
>
>
> Can you share the exceptions that happened in the Tez 0.7 runs?
>
> -Rajesh.B
>
> On 11-Jun-2016 16:28, "Sungwoo Park" <[email protected]> wrote:
>
> Hello,
>
>
>
> I have a question about the performance difference between Tez 0.6.2 and
> Tez 0.7.0.
>
>
>
> This is what we did:
>
>
>
> 1. Installed HDP 2.4 on a 10-node cluster with default settings. No other
> particular changes were made to the default settings recommended by HDP 2.4.
>
>
>
> 2. Ran TeraSort using Tez 0.6.2 and Tez 0.7.0, and compared the running
> time.
>
>
>
> Each experiment specifies the amount of input data per node. For example,
> 10GB_per_node means a total of 100GB input because there are 10 data nodes
> in the cluster.
>
>
>
> We've found that Tez 0.7.0 runs consistently slower than Tez 0.6.2,
> producing 'Vertex re-running' errors quite often when the size of input
> data per node is over 40GB. Even when there is no 'Vertex re-running',
> Tez 0.7.0 takes much longer than Tez 0.6.2.
>
>
>
> We know that Tez 0.7.0 is not inherently slower than Tez 0.6.2, because on
> a cluster of 44 nodes (with only 24GB memory per node), Tez 0.7.0 finished
> TeraSort almost as fast as Tez 0.6.2. We are trying to figure out what we
> missed in the experiments on the 11-node cluster.
>
>
>
> Any help here would be appreciated. Thanks a lot.
>
>
>
> Sungwoo Park
>
>
>
> ----- Configuration
>
>
>
> HDP 2.4
>
> 11 nodes, 10 data nodes, each with 96GB memory, 6 x 500GB HDDs
>
> same HDFS, Yarn, MR
>
>
>
> Each mapper container uses 5GB.
>
> Each reducer container uses 10GB.
>
>
>
> Configurations specific to tez-0.6.2
>
> tez.runtime.sort.threads = 2
>
>
>
> Configurations specific to tez-0.7.0
>
> tez.grouping.max-size = 1073741824
>
> tez.runtime.sorter.class = PIPELINED
>
> tez.runtime.pipelined.sorter.sort.threads = 2
>
>
>
> ----- TEZ-0.6.2
>
> 10GB_per_node
> id  time  num_containers  mem        core   diag
> 0   212   239             144695261  21873
> 1   204   239             139582665  20945
> 2   211   239             143477178  21700
>
> 20GB_per_node
> id  time  num_containers  mem        core   diag
> 0   392   239             272528515  42367
> 1   402   239             273085026  42469
> 2   410   239             270118502  42111
>
> 40GB_per_node
> id  time  num_containers  mem        core   diag
> 0   761   239             525320249  82608
> 1   767   239             527612323  83271
> 2   736   239             520229980  82317
>
> 80GB_per_node
> id  time  num_containers  mem         core    diag
> 0   1564  239             1123903845  173915
> 1   1666  239             1161079968  178656
> 2   1628  239             1146656912  175998
>
> 160GB_per_node
> id  time  num_containers  mem         core    diag
> 0   3689  239             2523160230  377563
> 1   3796  240             2610411363  388928
> 2   3624  239             2546652697  381400
>
>
>
> ----- TEZ-0.7.0
>
> 10GB_per_node
> id  time  num_containers  mem        core   diag
> 0   262   239             179373935  26223
> 1   259   239             179375665  25767
> 2   271   239             186946086  26516
>
> 20GB_per_node
> id  time  num_containers  mem        core   diag
> 0   572   239             380034060  55515
> 1   533   239             364082337  53555
> 2   515   239             356570788  52762
>
> 40GB_per_node
> id  time  num_containers  mem        core    diag
> 0   1405  339             953706595  136624  Vertex re-running
> 1   1157  239             828765079  118293
> 2   1219  239             833052604  118151
>
> 80GB_per_node
> id  time  num_containers  mem         core    diag
> 0   3046  361             1999047193  279635  Vertex re-running
> 1   2967  337             2079807505  290171  Vertex re-running
> 2   3138  355             2030176406  282875  Vertex re-running
>
> 160GB_per_node
> id  time  num_containers  mem         core    diag
> 0   6832  436             4524472859  634518  Vertex re-running
> 1   6233  365             4123693672  573259  Vertex re-running
> 2   6133  379             4121812899  579044  Vertex re-running
>
>