Can you share the exceptions that occurred in the Tez 0.7 runs?

-Rajesh.B
On 11-Jun-2016 16:28, "Sungwoo Park" <[email protected]> wrote:

> Hello,
>
> I have a question about the performance difference between Tez 0.6.2 and
> Tez 0.7.0.
>
> This is what we did:
>
> 1. Installed HDP 2.4 on an 11-node cluster (10 data nodes) and kept the
> default settings recommended by HDP 2.4, with no other changes.
>
> 2. Ran TeraSort using Tez 0.6.2 and Tez 0.7.0, and compared the running
> time.
>
> Each experiment specifies the amount of input data per node. For example,
> 10GB_per_node means a total of
> 100GB input because there are 10 data nodes in the cluster.
>
> We've found that Tez 0.7.0 runs consistently slower than Tez 0.6.2,
> producing 'Vertex re-running' errors quite
> often when the size of input data per node is 40GB or more. Even when there
> is no 'Vertex re-running', Tez 0.7.0
> took much longer than Tez 0.6.2.
>
> We know that Tez 0.7.0 runs faster than Tez 0.6.2, because on a cluster of
> 44 nodes (with only 24GB memory per
> node), Tez 0.7.0 finished TeraSort almost as fast as Tez 0.6.2. We are
> trying to figure out what we missed in
> the experiments on the 11-node cluster.
>
> Any help here would be appreciated. Thanks a lot.
>
> Sungwoo Park
>
> ----- Configuration
>
> HDP 2.4
> 11 nodes, 10 data nodes, each with 96GB memory, 6 x 500GB HDDs
> same HDFS, Yarn, MR
>
> Each mapper container uses 5GB.
> Each reducer container uses 10GB.
>
> Configurations specific to tez-0.6.2
> tez.runtime.sort.threads = 2
>
> Configurations specific to tez-0.7.0
> tez.grouping.max-size = 1073741824
> tez.runtime.sorter.class = PIPELINED
> tez.runtime.pipelined.sorter.sort.threads = 2
>
> ----- TEZ-0.6.2
>
> 10GB_per_node
> id              time            num_containers  mem             core            diag
> 0               212             239             144695261       21873
> 1               204             239             139582665       20945
> 2               211             239             143477178       21700
>
> 20GB_per_node
> id              time            num_containers  mem             core            diag
> 0               392             239             272528515       42367
> 1               402             239             273085026       42469
> 2               410             239             270118502       42111
>
> 40GB_per_node
> id              time            num_containers  mem             core            diag
> 0               761             239             525320249       82608
> 1               767             239             527612323       83271
> 2               736             239             520229980       82317
>
> 80GB_per_node
> id              time            num_containers  mem             core            diag
> 0               1564            239             1123903845      173915
> 1               1666            239             1161079968      178656
> 2               1628            239             1146656912      175998
>
> 160GB_per_node
> id              time            num_containers  mem             core            diag
> 0               3689            239             2523160230      377563
> 1               3796            240             2610411363      388928
> 2               3624            239             2546652697      381400
>
> ----- TEZ-0.7.0
>
> 10GB_per_node
> id              time            num_containers  mem             core            diag
> 0               262             239             179373935       26223
> 1               259             239             179375665       25767
> 2               271             239             186946086       26516
>
> 20GB_per_node
> id              time            num_containers  mem             core            diag
> 0               572             239             380034060       55515
> 1               533             239             364082337       53555
> 2               515             239             356570788       52762
>
> 40GB_per_node
> id              time            num_containers  mem             core            diag
> 0               1405            339             953706595       136624          Vertex re-running
> 1               1157            239             828765079       118293
> 2               1219            239             833052604       118151
>
> 80GB_per_node
> id              time            num_containers  mem             core            diag
> 0               3046            361             1999047193      279635          Vertex re-running
> 1               2967            337             2079807505      290171          Vertex re-running
> 2               3138            355             2030176406      282875          Vertex re-running
>
> 160GB_per_node
> id              time            num_containers  mem             core            diag
> 0               6832            436             4524472859      634518          Vertex re-running
> 1               6233            365             4123693672      573259          Vertex re-running
> 2               6133            379             4121812899      579044          Vertex re-running
>
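
To put a number on the gap, the run times quoted above can be averaged per input size and compared. This is only a rough sketch in Python: it assumes the "time" column is wall-clock time in the same unit for both versions, and the names times_062, times_070 and slowdown are illustrative, not from the original post.

```python
from statistics import mean

# Run times copied from the "time" column of the quoted tables,
# keyed by input size per node in GB (three runs each).
times_062 = {
    10: [212, 204, 211],
    20: [392, 402, 410],
    40: [761, 767, 736],
    80: [1564, 1666, 1628],
    160: [3689, 3796, 3624],
}
times_070 = {
    10: [262, 259, 271],
    20: [572, 533, 515],
    40: [1405, 1157, 1219],
    80: [3046, 2967, 3138],
    160: [6832, 6233, 6133],
}


def slowdown(old, new):
    """Return mean(new) / mean(old) for each input size present in both."""
    return {size: mean(new[size]) / mean(old[size]) for size in old if size in new}


ratios = slowdown(times_062, times_070)
for size, ratio in sorted(ratios.items()):
    print(f"{size}GB_per_node: Tez 0.7.0 / Tez 0.6.2 = {ratio:.2f}")
```

On these figures the ratio is above 1 at every input size, from about 1.26 at 10GB_per_node to about 1.88 at 80GB_per_node, so the slowdown is present even in the runs without 'Vertex re-running'.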
