Re: Report: Scalding+Cascading+TEZ = ♥

Chris K Wensel Thu, 09 Apr 2015 10:25:59 -0700

This is great news!!

> On Apr 9, 2015, at 8:29 AM, Cyrille Chépélov <[email protected]> 
> wrote:
> 
> (cross-posted on scalding-dev@, cascading-user@, and user@tez)
> 
> Hi,
> 
> Chris K Wensel wrote not so long ago
> will also add that one user is having some success with Scalding on Cascading 
> 3.0 and Tez
> I'm that guy. It's been a fun ride, and the news is that there are results. 
> While the unboxing experience isn't yet totally pleasant, these results are 
> now very promising.
> 
> The really good part is that apart from build.sbt, we needed no changes to 
> application code to run with the local, hadoop (1.x API on a 2.6.0 cluster), 
> hadoop2-mr1, and hadoop2-tez back-ends.
> 
> Numbers:
> Full dataset: about 116M lines in 6 distinct CSV inputs
> hadoop: about 18 hours (pretty much busy all the time, maxing out either the 
> LAN, disk bandwith OR CPU depending on phases)
> tez: about 8 hours (with no LAN and few disk saturation periods, and apparent 
> room for improvement in CPU/task allocation — confident a couple hours could 
> be shaved).
> Reduced dataset (integration testing dataset): about 2.3M lines in 6 distinct 
> inputs
> hadoop: 112 minutes
> tez: 6.25 minutes 
> In common:
> the job is a cascade made of 20 Flows, which compile into about 420 Cascading 
> steps (Hadoop) or 20 DAG (TEZ)
> about 10K lines of Scala code
> In the small-dataset experiment, hadoop suffers a lot from the zillions of 
> step setup ceremonials it has to perform with YARN, whereas TEZ apps are 
> higher-level and tend to stay much longer from the ResourceManager's point of 
> view.
> 
> Results appear identical so far (still busy comparing and ensuring we've 
> covered all code paths, which we haven't yet, but this looks really good).
> I am grateful for everyone who had the patience to sift through the huge 
> haystacks of logs and graphs I sent, and for the time spent writing patches 
> in the dark for me to test. 
> 
> Chris, I have no idea how much time I tied you up on this, but wow, thanks!
> 
>     -- Cyrille
> 
> 
> How to replicate this:
> We are using a cluster of 7 i7-3770K 
> <http://ark.intel.com/products/65523/Intel-Core-i7-3770K-Processor-8M-Cache-up-to-3_90-GHz>
>  ex-desktop machines, running Debian Jessie + Apache Hadoop 2.6.0 under 
> Ansible. Each machine has 1 system disk, and 4*2TiB of HDFS spindles, + 
> 120GiB of SSD for flush-happy components (Zookeeper, NN, RM etc.). Except for 
> the ATS, all components are in HA mode under Zookeeper, Keepalived and/or 
> HAProxy as appropriate.
> Up to 80% of this cluster is dedicated to the "test" work queue.
> 16 GiB RAM per node (except one which is at 24GiB), we could now justify 
> going to the max at 32GiB.
> We noticed in case there are multiple apps running, that a single TEZ app 
> will tend to "gobble up" all available slots even if multiple RUNNING MR 
> tasks are cohabiting. Not a big deal, and fairly obvious given that 
> TezChildren stay up until preempted out or jobless.
> Scalding 0.13.1 patched with https://github.com/twitter/scalding/pull/1220 
> <https://github.com/twitter/scalding/pull/1220> 
> This replaces --hdfs with --hadoop, --hadoop2-mr1 and --hadoop2-tez which 
> delegate to the appropriate Cascading back-end (--hdfs becomes an alias for 
> --hadoop)
> Cascading 3.0.0-wip-97 
> TEZ rebuilt from the branch-0.6 branch (post-0.6.0) as of commit 
> 282e63af187f59700191579d2328cae3b8d2fa9c + a few other patches (attached here)
> had to patch guava up to version 16.0.1 -- TEZ-2164 would help a lot
> TEZ-2256: reduced logging noise by using a boolean rather than an exception 
> to signal end of buffer in UnorderedPartitonedKVWriter
> TEZ-2237: two patches from Siddarth Seth that helped unstick some of the more 
> complex DAGs
> a few extra settings in the job config:
> "tez.task.resource.memory.mb" -> (1024+512).toString, // default 1024
> "tez.container.max.java.heap.fraction" -> "0.7", // default 0.8
> "tez.queue.name" -> params.jobArgs.getOrElse("queue", "default"),
> "cascading.flow.runtime.gather.partitions.num" -> 
> params.jobArgs.getOrElse("tez-partitions","4"),
> "tez.lib.uris" -> params.jobArgs.getOrElse("tez.lib.uris", 
> "hdfs://cluster/apps/tez-0.6.0/tez-0.6.0.tar.gz"),
> "tez.history.logging.service.class" -> 
> "org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService", 
> "tez.allow.disabled.timeline-domains" -> "true"
> A couple things in the app's build.sbt (sorry for my clumsy use of SBT):
> scalaVersion := "2.11.6"
> 
> val hadoopVersion = "2.6.0"
> 
> val cascadingFabric = sys.props.getOrElse("CASCADING_FABRIC", "hadoop2-tez") 
> // can be "hadoop", "hadoop2-mr1" or "hadoop2-tez"
> 
> val cascadingVersion = "3.0.0-wip-97"
> if (cascadingVersion.endsWith("-dev")) {
>   libraryDependencies ++= Seq(
>     "org.jgrapht" % "jgrapht-core" % "0.9.1",
>     "org.jgrapht" % "jgrapht-ext" % "0.9.1",
>     "riffle" % "riffle" % "1.0.0-wip-7",
>     "org.codehaus.janino" % "janino" % "2.7.6"
>   )
> } else {
>   libraryDependencies ++= Seq()
> }
> 
> val scaldingVersion = {
>   if (cascadingFabric == "hadoop") {
>     "0.13.1"
>   } else {
>     "0.13.1-cch-ffc2"  // this one is the version as patched with PR1220
>   }
> }
> 
> if (cascadingFabric == "hadoop2-tez") {
>   libraryDependencies ++= Seq("javax.xml.bind" % "jaxb-api" % "2.2.2" 
> exclude("javax.xml.stream", "stax-api"),
>     "com.sun.jersey" % "jersey-server" % "1.9" exclude("asm","asm"),
>     "org.sonatype.sisu.inject" % "cglib" % "2.2.1-v20090111" 
> exclude("asm","asm")
>   )
> }
> else {
>   libraryDependencies ++= Seq()
> }
> 
> 
> if (cascadingFabric == "hadoop2-tez") {
>   dependencyOverrides +=  "com.google.guava" % "guava" % "16.0.1"
> 
>   /* as of 0.6.0, Tez depends on a very old version (11.0.2) of guava,
>      which has an incompatibly different API to Stopwatch() than more
>      recent guavas.
> 
>      Version 14.0 of guava introduced the breaking change. */
> 
>   libraryDependencies ++= Seq(
>     "org.apache.tez" % "tez-api" % "0.6.0-SNAPSHOT",
>     "org.apache.tez" % "tez-mapreduce" % "0.6.0-SNAPSHOT",  // FIXME: not 
> sure this is needed
>     "com.google.guava" % "guava" % "16.0.1"
>   ) } else {
>   libraryDependencies ++= Seq()
> }
> 
> libraryDependencies ++= Seq(
>  "com.twitter" %% "scalding-core" % scaldingVersion
>       exclude("cascading", "cascading-core")
>       exclude("cascading", "cascading-hadoop")
>       exclude("cascading", "cascading-local"),
>  "com.twitter" %% "scalding-args" % scaldingVersion
>       exclude("cascading", "cascading-core")
>       exclude("cascading", "cascading-hadoop")
>       exclude("cascading", "cascading-local"),
>  "com.twitter" %% "scalding-date" % scaldingVersion
>       exclude("cascading", "cascading-core")
>       exclude("cascading", "cascading-hadoop")
>       exclude("cascading", "cascading-local"),
>  "com.twitter" %% "scalding-commons" % scaldingVersion
>       exclude("cascading", "cascading-core")
>       exclude("cascading", "cascading-hadoop")
>       exclude("cascading", "cascading-local")
>       exclude("com.hadoop.gplcompression", "hadoop-lzo")
>             // hadoop-lzo also pulled in by elephantbird
>  )
> 
> libraryDependencies ++= Seq(
>   "org.apache.thrift" % "libthrift" % "0.9.1",
> 
>   "cascading" % "cascading-core" % cascadingVersion,
>   "cascading" % ("cascading-" + cascadingFabric) % cascadingVersion,
>     "cascading" % "cascading-local" % cascadingVersion
> 
>   )
> The tez.container.max.java.heap.fraction override was important, to ensure 
> the Heap+Native memory of tez children didn't exceed the allocated container 
> size. As the Application/Scalding/Scala/Cascading/Tez/JVM stack probably 
> takes a little more native memory than a more basic app/Tez/JVM stack, the 
> defaults were too small and YARN kept yanking Tez children at times. As has 
> been pointed out, running the cluster with 
> yarn.nodemanager.pmem-check-enabled=false removes the need for that.
> 
> An item in one DAG was creating Out-Of-Memory trouble in Tez' DefaultSorter 
> when running with the default task memory at 1024MiB (716MiB for the heap), 
> increasing it slightly was a coward but effective workaround.
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Scalding Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
> <0001-Switch-to-guava-s-newer-API-for-Stopwatch-14.0.patch><0002-DO-NOT-COMMIT-bump-guava-to-0.16.patch><0003-WIP-TEZ-2256-a-first-attempt.patch><0004-TEZ-2237-hack.branch6.txt-WIP-trying-out-patch-from-.patch><0005-TEZ-2237-WIP-experimenal-patch-2-TEZ-2237.test.2_bra.patch>


—
Chris K Wensel
[email protected]

Re: Report: Scalding+Cascading+TEZ = ♥

Reply via email to