/(Cross-posted to scalding-dev@ and user@tez, reply-to set to cascading-user@)/
Hello,
*TL;DR:* things look great; we like cascading-3.0.0-wip-115 to the point
that we could run some production jobs *on Tez* if we were in a hurry.
Some progress updates:
* As of cascading-3.0.0-wip-115, we no longer see any difference in
output data, whether running with -hadoop, -hadoop2-mr1, or of
course -hadoop2-tez
o /(lots of hard work was involved: mostly Sylvain on the
reporting side and Chris on the fixing side)/
* We run all three back-ends on our test rig at least once a week,
aiming for daily.
o /not doing this for a while caused us to miss a regression
in -hadoop; it has now been reported and fixed/
Remaining on our (Transparency) to-do list:
* Test again against vanilla tez-0.6.0, but also against
tez-0.6.1-SNAPSHOT
o /in particular, looking to see whether vanilla 0.6.0 still
freezes (I expect it does) and whether 0.6.1-SNAPSHOT passes
without too much trouble from Guava version mismatches/
* Evaluate whether it's better to run cascades of complex flows under
Tez with |cascading.cascade.maxconcurrentflows=1| rather than the
default (no limit), since when multiple IO-hungry jobs run at the
same time, thrashing may happen and reduce performance
* Plug in HBase taps
* Try compiling Scalding 0.13.1 against cascading-3.0.0-wip-115(+) and
see what happens with the test suite (under -hadoop)
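Regarding the |maxconcurrentflows| experiment in the list above, a minimal sketch of how that property could be pinned when building the cascade's configuration. The property key comes straight from this note; the class and method names are illustrative, and the actual |CascadeConnector| call is omitted so the snippet stays self-contained:

```java
import java.util.Properties;

// Sketch only: sets the property discussed above on a plain Properties
// object. In a real job these Properties would be handed to Cascading's
// CascadeConnector (omitted here to keep the snippet dependency-free).
public class CascadeConcurrency {

    public static Properties singleFlowProperties() {
        Properties props = new Properties();
        // Limit the cascade to one running flow at a time instead of the
        // default (no limit), to test the IO-thrashing hypothesis.
        props.setProperty("cascading.cascade.maxconcurrentflows", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(
            singleFlowProperties().getProperty("cascading.cascade.maxconcurrentflows"));
    }
}
```

Comparing wall-clock time of the same cascade with and without this limit should tell us whether concurrent IO-hungry flows really thrash each other.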
For now, the recipe is still as in the original report (patched
tez-0.6.0, patched scalding-0.13.1, cascading-3.0.0-wip), except for a
newer Cascading WIP.
-- Cyrille
On 16/04/2015 at 19:10, Sylvain Veyrié wrote:
> /also cross-posted to cascading-user@ and user@tez/
>
> Hello all,
>
> Following Cyrille's announcement, I started applying our regression test
> suite to the same code base. This regression test launches our code on
> a reduced dataset (14k rows) and checks that the results are unchanged.
> As of today, the result is the same with 1) Cascading local, 2) Hadoop
> MR1 ("--hdfs"), and 3) Hadoop 2 YARN. Now it also runs on Tez.
>
> Good news: most of the output is exactly the same, and the run is a
> lot faster -- from more than 2h down to less than 10 minutes.
> Bad news: there are regressions (~3% of output data); we have
> identified at least one.
>
> [BAD]
> val result = input
> .map { blah => blah.bleh }
> .collect { case Some(item) => item }
> .groupBy(bleh => bleh.id)
> .sortBy(bleh => (bleh.foo, bleh.bar)).reverse
> .take(1)
> .values
> [/BAD]
>
> It appears that, _only when executing with Tez_, the "take(1)"
> step is simply ignored, producing output with elements that should have
> been eliminated. Cyrille suspects it might be a planner issue.
>
> This modified code fixed it:
>
> [GOOD]
> val result = input
> .map { blah => blah.bleh }
> .collect { case Some(item) => item }
> .groupBy(bleh => bleh.id)
> .sortedTake(1)(Ordering.by[Bleh, (Double, String)](bleh => (bleh.foo,
> bleh.bar)).reverse)
> .toTypedPipe.flatMap(xx => xx._2)
> [/GOOD]
>
> We know this new code is better (if only from a performance point of
> view), but we still have some take() and head() calls in the code base,
> so we still produce some invalid output. In any case it is still a bug,
> and I would rather not rewrite all occurrences just to check that that
> fixes them.
>
> I wanted to post this with a Java/Cascading test case, but I have not
> been able to reproduce it in a simple test case yet, even with Scalding.
>
> On the test runs, AFAIK, the debug flow gives exactly the same thing
> in both cases; however, in the logs:
> * BAD : "Tuples_Read=3737, Tuples_Written=3737" <= wat?
> * GOOD : "Tuples_Read=13818, Tuples_Written=4029"
>
> I have not dug into why Tuples_Read differs between the two cases
> (maybe the same thing upstream), but it seems obvious something is
> wrong when Tuples_Read and Tuples_Written have the same value - this is
> consistent with our output containing elements that should have been
> eliminated.
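While a Cascading test case is still missing, the intended semantics of the pipeline above can at least be modeled in memory. A small plain-Java sketch (the Bleh record is a hypothetical stand-in for the type in the Scalding snippet) of what groupBy on id, a descending sort on (foo, bar), and take(1) should produce - exactly one tuple per key:

```java
import java.util.*;
import java.util.stream.*;

public class TakeOneSemantics {
    // Hypothetical stand-in for the Bleh type from the Scalding snippet.
    record Bleh(String id, double foo, String bar) {}

    // In-memory model of the intended pipeline: group by id, order each
    // group descending by (foo, bar), keep only the first element -
    // equivalently, the maximum of each group under that ordering.
    static List<Bleh> takeOnePerKey(List<Bleh> input) {
        Comparator<Bleh> byFooBar =
            Comparator.comparingDouble(Bleh::foo).thenComparing(Bleh::bar);
        return input.stream()
            .collect(Collectors.groupingBy(Bleh::id, Collectors.maxBy(byFooBar)))
            .values().stream()
            .flatMap(Optional::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bleh> in = List.of(
            new Bleh("a", 1.0, "x"),
            new Bleh("a", 2.0, "y"),
            new Bleh("b", 3.0, "z"));
        // One survivor per key: tuples written should be below tuples read
        // whenever any key occurs more than once.
        System.out.println(takeOnePerKey(in).size()); // prints 2
    }
}
```

This matches the counter pattern of the GOOD run (4029 written out of 13818 read); identical read/write counters, as in the BAD run, would mean the take step did nothing.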
>
> -- Sylvain Veyrié