Re: Understanding resource usage overheads in a Tez job

Piyush Narang Mon, 20 Mar 2017 11:14:35 -0700

Thanks Gopal, we did run into a lot of overhead in the comparator in one of
our other jobs. Turning on OrderedSerialization in Scalding seemed to have
helped there. While we were comparing Hadoop and Tez in that job, we were
seeing Tez's reducers taking substantially more time (and the overhead was
in the comparison methods). I tried a few runs after forcing Cascading to
use the rawBytes comparator:
https://github.com/cwensel/cascading/blob/wip-3.2/cascading-hadoop/src/main/shared/cascading/tuple/hadoop/util/DeserializerComparator.java#L59
(returning true there), and that helped as well. From what I understand,
though, we need to do this from the Scalding side as a lot of our jobs use
complex objects (e.g. thrift structs / scala case classes). If we don't
have ordered serialization enabled from Scalding I'm not sure the raw
comparators will make sense (I think some of the work there was to ensure the
byte representations of these objects can be compared sanely).
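
To illustrate why the byte layout matters for a raw comparator, here is a minimal sketch in plain Java. It is not Scalding's OrderedSerialization or Cascading's comparator; the class and method names are hypothetical. It only shows that an unsigned, byte-by-byte comparison agrees with numeric order when the encoding is designed for it (big-endian with the sign bit flipped), and disagrees for a naive two's-complement encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these names come from Scalding, Cascading or Tez.
final class OrderedBytesSketch {

    // Order-preserving encoding: big-endian with the sign bit flipped, so that
    // negative ints sort before positive ints under unsigned byte comparison.
    static byte[] encodeOrdered(int value) {
        return ByteBuffer.allocate(4).putInt(value ^ Integer.MIN_VALUE).array();
    }

    // Naive encoding: plain big-endian two's complement (ByteBuffer's default order).
    static byte[] encodeNaive(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Raw comparison: unsigned lexicographic order over the bytes, with no
    // deserialization of the underlying value.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Ordered encoding: -1 sorts before 1, matching numeric order.
        System.out.println(compareRaw(encodeOrdered(-1), encodeOrdered(1)) < 0);
        // Naive encoding: -1 (ff ff ff ff) sorts after 1 (00 00 00 01).
        System.out.println(compareRaw(encodeNaive(-1), encodeNaive(1)) > 0);
    }
}
```

For thrift structs or case classes the same property would have to hold field by field, which is why I suspect the raw comparator is only safe when the serialization was built with ordering in mind.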

Thanks for the two links, they look really useful!

I was able to test a few variants of our job with slowstart = 0.999 to
see if the pipelining would explain the resource usage. It turns out that
it was contributing a good deal. With this value set, we see Tez using
around 20% (container reuse=false) to 27% (container reuse=true) lower
mb_millis than MR. Runtime-wise Tez is still better, taking around half
the time of Hadoop.
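
As a rough illustration of why early-started reducers inflate this metric, here is a small sketch. It assumes mb_millis is a container's memory in MB multiplied by the milliseconds the container is held; all numbers and names are made up for the example and are not measurements from our job.

```java
// Hypothetical sketch: the figures below are illustrative, not measured values.
final class MbMillisSketch {

    // Assumed definition: memory (MB) times wall-clock time the container is held (ms).
    static long mbMillis(long memoryMb, long heldMillis) {
        return memoryMb * heldMillis;
    }

    public static void main(String[] args) {
        long memoryMb = 4096;       // assumed reducer container size
        long workMillis = 60_000;   // time spent actually reducing
        long idleMillis = 120_000;  // time spent waiting on upstream tasks

        // A reducer launched early holds its container while it waits.
        long launchedEarly = mbMillis(memoryMb, idleMillis + workMillis);
        // A reducer launched once nearly all upstream tasks are done pays only for work.
        long launchedLate = mbMillis(memoryMb, workMillis);

        System.out.println("early start mb_millis: " + launchedEarly);
        System.out.println("late start mb_millis:  " + launchedLate);
    }
}
```

Under that assumption, a high slowstart value trades some overlap between stages for containers that are not held idle, which would be consistent with the drop we saw.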

Thanks,

On Fri, Mar 17, 2017 at 12:37 PM, Gopal Vijayaraghavan <[email protected]>
wrote:

>
> > We are using OrderedSerialization in a bunch of our jobs. In this job
> > we're not using it on both the Hadoop side and the Tez side. The datasets
> > both jobs are reading are identical.
>
> That single comparator call was the biggest fraction of slow-down when I
> ran profiles with Tez.
>
> I profiled through that codepath for TEZ-2505, of course YMMV.
>
> My estimate was that a raw byte OrderedSerialization + TezRawComparator
> could save ~50% of the total CPU of some jobs.
>
> > Our suspicion internally was also around pipelining and speculative
> > execution across steps which doesn't happen in Hadoop between jobs
>
> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
> +
> https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/CriticalPathAnalyzer.java
>
> Those help a lot in locating issues with Tez scheduling and targeting
> optimizations.
>
> Cheers,
> Gopal
>
>
>
>

-- 
- Piyush
