What we have found is that mapred-site.xml affects the configuration of
Tez 0.6 and Tez 0.7 in different ways.
Here is what we tried:
1. We installed HDP 2.4.2.0-258.
2. In the directory /usr/hdp/2.4.2.0-258/hadoop/etc/hadoop, there is
mapred-site.xml. The default value for mapreduce.task.io.sort.mb is 2047.
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>2047</value>
</property>
3. In our local installation of Tez, there is tez-site.xml. The default
value for tez.runtime.io.sort.mb is 1843.
<property>
  <name>tez.runtime.io.sort.mb</name>
  <value>1843</value>
</property>
4. When we run TeraSort from hadoop-mapreduce 2.7.1, Tez 0.6 uses the value
1843 from tez-site.xml for tez.runtime.io.sort.mb in the user payload of the
ProcessorDescriptor. However, Tez 0.7 somehow uses 2047 from mapred-site.xml.
For example, setting mapreduce.task.io.sort.mb to 3000 changes
tez.runtime.io.sort.mb to 3000 under Tez 0.7, whereas it has no effect
under Tez 0.6.
My guess is that some key-value pairs from mapred-site.xml interfere with
Tez 0.7 (but not with Tez 0.6): setting tez.runtime.io.sort.mb to 1843 does
not make any significant difference in execution time for Tez 0.7 (it still
runs slowly), so the sort buffer alone cannot explain the slowdown. In
conclusion, I think that Tez 0.7 itself runs fast, but there might be some
bug in the Tez 0.7 client when handling MapReduce jobs such as TeraSort.
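To double-check which value each file actually contributes before the client
builds the user payload, here is a minimal diagnostic sketch (plain Hadoop
code, not Tez code); the class name and the tez-site.xml path are placeholders
for our local installation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Minimal sketch: load only the two files in question (no default resources)
// and print both keys, to see what each file contributes on its own.
public class SortMbCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.addResource(new Path("/usr/hdp/2.4.2.0-258/hadoop/etc/hadoop/mapred-site.xml"));
    conf.addResource(new Path("/path/to/tez/conf/tez-site.xml")); // placeholder path
    System.out.println("mapreduce.task.io.sort.mb = " + conf.get("mapreduce.task.io.sort.mb"));
    System.out.println("tez.runtime.io.sort.mb    = " + conf.get("tez.runtime.io.sort.mb"));
  }
}

(This only shows what the XML files declare; it does not reproduce whatever
key translation the Tez 0.7 client performs when it builds the payload.)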
Let me try to run TeraSort on Tez 0.7 without using mapred-site.xml.
(Currently it fails with a memory error.)
--- Sungwoo
On Mon, Jun 13, 2016 at 12:56 PM, Rajesh Balamohan <
[email protected]> wrote:
> Are the runtime configs identical between 0.6 and 0.7? E.g., do you see any
> changes in tez.runtime.io.sort.mb between 0.6 and 0.7?
>
> On Mon, Jun 13, 2016 at 9:20 AM, Sungwoo Park <[email protected]> wrote:
>
>> Yes, we also think that's what is happening. When there is 'Vertex
>> re-running', we see exceptions like:
>>
>> 2016-05-13 09:45:28,280 WARN [ResponseProcessor for block
>> BP-358076261-10.1.91.4-1462195174702:blk_1075135562_1396252]
>> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block
>> BP-358076261-10.1.91.4-1462195174702:blk_1075135562_1396252
>> java.net.SocketTimeoutException: 65000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.1.91.11:37025
>> remote=/10.1.91.11:50010]
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>> at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>> at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>> at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
>> at java.io.FilterInputStream.read(FilterInputStream.java:83)
>> at java.io.FilterInputStream.read(FilterInputStream.java:83)
>> at
>> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
>> 2016-05-13 09:45:28,283 INFO [TezChild] task.TezTaskRunner: Encounted an
>> error while executing task: attempt_1462251281999_0859_1_01_000063_0
>> java.io.IOException: All datanodes 10.1.91.11:50010 are bad. Aborting...
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1206)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
>>
>> However, this still does not explain why Tez 0.7 runs much slower than
>> Tez 0.6.2 even when no such exception arises. As an example, see the
>> summary of results below. Tez 0.7 takes more than 1100 seconds to sort
>> 40GB per node (with no Vertex re-running and no exception other than at
>> the end of the run), whereas Tez 0.6.2 takes just under 800 seconds.
>> These results were obtained from the same cluster, using the default
>> settings from HDP 2.4 (installed with Ambari), with very few differences
>> in configuration settings between the two Tez versions.
>>
>> I believe that we missed some important settings for Tez 0.7 (because we
>> did not install Hadoop and HDFS manually). Let me try the same experiment
>> in other clusters and see what happens.
>>
>> Thanks for the response,
>>
>> --- Sungwoo
>>
>> tez-0.6.2-40GB_10nodes
>> id  time  num_containers  mem        core    diag
>> 0   761   239             525320249  82608
>> 1   767   239             527612323  83271
>> 2   736   239             520229980  82317
>>
>> tez-0.7.0-40GB_10nodes
>> id  time  num_containers  mem        core    diag
>> 0   1405  339             953706595  136624  Vertex re-running, vertexName=initialmap, vertexId=vertex_1462251281999_0859_1_00
>> 1   1157  239             828765079  118293
>> 2   1219  239             833052604  118151
>>
>> On Mon, Jun 13, 2016 at 6:21 AM, Bikas Saha <[email protected]> wrote:
>>
>>> From the vertex re-running messages, it seems like you may be seeing
>>> errors fetching shuffle data from mappers to reducers. This results in
>>> re-running the map vertex tasks to produce only the missing data, with
>>> the corresponding vertex re-running message.
>>>
>>>
>>>
>>> Bikas
>>>
>>>
>>>
>>> *From:* Rajesh Balamohan [mailto:[email protected]]
>>> *Sent:* Saturday, June 11, 2016 5:14 AM
>>> *To:* [email protected]
>>> *Subject:* Re: Question on Tez 0.6 and Tez 0.7
>>>
>>>
>>>
>>> Can you share the exceptions that happened in the Tez 0.7 runs?
>>>
>>> -Rajesh.B
>>>
>>> On 11-Jun-2016 16:28, "Sungwoo Park" <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> I have a question about the performance difference between Tez 0.6.2 and
>>> Tez 0.7.0.
>>>
>>>
>>>
>>> This is what we did:
>>>
>>>
>>>
>>> 1. Installed HDP 2.4 on a 10-node cluster with default settings. No other
>>> particular changes were made to the settings recommended by HDP 2.4.
>>>
>>>
>>>
>>> 2. Ran TeraSort using Tez 0.6.2 and Tez 0.7.0, and compared the running
>>> time.
>>>
>>>
>>>
>>> Each experiment specifies the amount of input data per node. For example,
>>> 10GB_per_node means a total of 100GB input because there are 10 data nodes
>>> in the cluster.
>>>
>>>
>>>
>>> We've found that Tez 0.7.0 runs consistently slower than Tez 0.6.2,
>>> producing 'Vertex re-running' errors quite often when the size of input
>>> data per node is over 40GB. Even when there is no 'Vertex re-running',
>>> Tez 0.7.0 takes much longer than Tez 0.6.2.
>>>
>>>
>>>
>>> We know that Tez 0.7.0 can run as fast as Tez 0.6.2, because on a cluster
>>> of 44 nodes (with only 24GB memory per node), Tez 0.7.0 finished TeraSort
>>> almost as fast as Tez 0.6.2. We are trying to figure out what we missed in
>>> the experiments on the 11-node cluster.
>>>
>>>
>>>
>>> Any help here would be appreciated. Thanks a lot.
>>>
>>>
>>>
>>> Sungwoo Park
>>>
>>>
>>>
>>> ----- Configuration
>>>
>>>
>>>
>>> HDP 2.4
>>>
>>> 11 nodes, 10 data nodes, each with 96GB memory, 6 x 500GB HDDs
>>>
>>> same HDFS, YARN, and MapReduce installation for both Tez versions
>>>
>>>
>>>
>>> Each mapper container uses 5GB.
>>>
>>> Each reducer container uses 10GB.
>>>
>>>
>>>
>>> Configurations specific to tez-0.6.2
>>>
>>> tez.runtime.sort.threads = 2
>>>
>>>
>>>
>>> Configurations specific to tez-0.7.0
>>>
>>> tez.grouping.max-size = 1073741824
>>>
>>> tez.runtime.sorter.class = PIPELINED
>>>
>>> tez.runtime.pipelined.sorter.sort.threads = 2
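>>>
>>> (As an illustration only, not our actual client code: a minimal sketch of
>>> applying the tez-0.7.0-specific values above through the plain Hadoop
>>> Configuration API; the class name is a placeholder.)
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>>
>>> public class Tez070ConfSketch {
>>>   // Build a client-side Configuration carrying the tez-0.7.0-specific
>>>   // settings listed above; how it is handed to the Tez client is omitted.
>>>   public static Configuration build() {
>>>     Configuration tezConf = new Configuration();
>>>     tezConf.setLong("tez.grouping.max-size", 1073741824L);          // 1GB grouping cap
>>>     tezConf.set("tez.runtime.sorter.class", "PIPELINED");           // pipelined sorter
>>>     tezConf.setInt("tez.runtime.pipelined.sorter.sort.threads", 2); // two sort threads
>>>     return tezConf;
>>>   }
>>> }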
>>>
>>>
>>>
>>> ----- TEZ-0.6.2
>>>
>>>
>>>
>>> 10GB_per_node
>>> id  time  num_containers  mem        core    diag
>>> 0   212   239             144695261  21873
>>> 1   204   239             139582665  20945
>>> 2   211   239             143477178  21700
>>>
>>> 20GB_per_node
>>> id  time  num_containers  mem        core    diag
>>> 0   392   239             272528515  42367
>>> 1   402   239             273085026  42469
>>> 2   410   239             270118502  42111
>>>
>>> 40GB_per_node
>>> id  time  num_containers  mem        core    diag
>>> 0   761   239             525320249  82608
>>> 1   767   239             527612323  83271
>>> 2   736   239             520229980  82317
>>>
>>> 80GB_per_node
>>> id  time  num_containers  mem         core    diag
>>> 0   1564  239             1123903845  173915
>>> 1   1666  239             1161079968  178656
>>> 2   1628  239             1146656912  175998
>>>
>>> 160GB_per_node
>>> id  time  num_containers  mem         core    diag
>>> 0   3689  239             2523160230  377563
>>> 1   3796  240             2610411363  388928
>>> 2   3624  239             2546652697  381400
>>>
>>>
>>>
>>> ----- TEZ-0.7.0
>>>
>>>
>>>
>>> 10GB_per_node
>>> id  time  num_containers  mem        core    diag
>>> 0   262   239             179373935  26223
>>> 1   259   239             179375665  25767
>>> 2   271   239             186946086  26516
>>>
>>> 20GB_per_node
>>> id  time  num_containers  mem        core    diag
>>> 0   572   239             380034060  55515
>>> 1   533   239             364082337  53555
>>> 2   515   239             356570788  52762
>>>
>>> 40GB_per_node
>>> id  time  num_containers  mem        core    diag
>>> 0   1405  339             953706595  136624  Vertex re-running
>>> 1   1157  239             828765079  118293
>>> 2   1219  239             833052604  118151
>>>
>>> 80GB_per_node
>>> id  time  num_containers  mem         core    diag
>>> 0   3046  361             1999047193  279635  Vertex re-running
>>> 1   2967  337             2079807505  290171  Vertex re-running
>>> 2   3138  355             2030176406  282875  Vertex re-running
>>>
>>> 160GB_per_node
>>> id  time  num_containers  mem         core    diag
>>> 0   6832  436             4524472859  634518  Vertex re-running
>>> 1   6233  365             4123693672  573259  Vertex re-running
>>> 2   6133  379             4121812899  579044  Vertex re-running
>>>
>>>
>>
>
>
> --
> ~Rajesh.B
>