Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

Jonathan Coveney Mon, 13 Apr 2015 10:43:07 -0700

I'm not 100% sure of spark's implementation but in the MR frameworks, it
would have a much larger shuffle write size becasue that node is dealing
with a lot more data and as a result has a lot  more to shuffle


2015-04-13 13:20 GMT-04:00 java8964 <java8...@hotmail.com>:

> If it is really due to data skew, will the task hanging has much bigger 
> Shuffle
> Write Size in this case?
>
> In this case, the shuffle write size for that task is 0, and the rest IO
> of this task is not much larger than the fast finished tasks, is that
> normal?
>
> I am also interested in this case, as from statistics on the UI, how it
> indicates the task could have skew data?
>
> Yong
>
> ------------------------------
> Date: Mon, 13 Apr 2015 12:58:12 -0400
> Subject: Re: Equi Join is taking for ever. 1 Task is Running while other
> 199 are complete
> From: jcove...@gmail.com
> To: deepuj...@gmail.com
> CC: user@spark.apache.org
>
>
> I can promise you that this is also a problem in the pig world :) not sure
> why it's not a problem for this data set, though... are you sure that the
> two are doing the exact same code?
>
> you should inspect your source data. Make a histogram for each and see
> what the data distribution looks like. If there is a value or bucket with a
> disproportionate set of values you know you have an issue
>
> 2015-04-13 12:50 GMT-04:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:
>
> You mean there is a tuple in either RDD, that has itemID = 0 or null ?
> And what is catch all ?
>
> That implies is it a good idea to run a filter on each RDD first ? We do
> not do this using Pig on M/R. Is it required in Spark world ?
>
> On Mon, Apr 13, 2015 at 9:58 PM, Jonathan Coveney <jcove...@gmail.com>
> wrote:
>
> My guess would be data skew. Do you know if there is some item id that is
> a catch all? can it be null? item id 0? lots of data sets have this sort of
> value and it always kills joins
>
> 2015-04-13 11:32 GMT-04:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:
>
> Code:
>
> val viEventsWithListings: RDD[(Long, (DetailInputRecord, VISummary,
> Long))] = lstgItem.join(viEvents).map {
>       case (itemId, (listing, viDetail)) =>
>         val viSummary = new VISummary
>         viSummary.leafCategoryId = listing.getLeafCategId().toInt
>         viSummary.itemSiteId = listing.getItemSiteId().toInt
>         viSummary.auctionTypeCode = listing.getAuctTypeCode().toInt
>         viSummary.sellerCountryId = listing.getSlrCntryId().toInt
>         viSummary.buyerSegment = "0"
>         viSummary.isBin = (if (listing.getBinPriceLstgCurncy.doubleValue()
> > 0) 1 else 0)
>         val sellerId = listing.getSlrId.toLong
>         (sellerId, (viDetail, viSummary, itemId))
>     }
>
> Running Tasks:
> Tasks IndexIDAttemptStatus ▾Locality LevelExecutor ID / HostLaunch Time
> DurationGC TimeShuffle Read Size / RecordsWrite TimeShuffle Write Size /
> RecordsShuffle Spill (Memory)Shuffle Spill (Disk)Errors  0 216 0 RUNNING
> PROCESS_LOCAL 181 / phxaishdc9dn0474.phx.ebay.com 2015/04/13 06:43:53 1.7
> h  13 min  3.0 GB / 56964921  0.0 B / 0  21.2 GB 1902.6 MB   2 218 0
> SUCCESS PROCESS_LOCAL 582 / phxaishdc9dn0235.phx.ebay.com 2015/04/13
> 06:43:53 15 min  31 s  2.2 GB / 1666851  0.1 s 3.0 MB / 2062  54.8 GB 1924.5
> MB   1 217 0 SUCCESS PROCESS_LOCAL 202 /
> phxdpehdc9dn2683.stratus.phx.ebay.com 2015/04/13 06:43:53 19 min  1.3 min
> 2.2 GB / 1687086  75 ms 3.9 MB / 2692  33.7 GB 1960.4 MB   4 220 0 SUCCESS
> PROCESS_LOCAL 218 / phxaishdc9dn0855.phx.ebay.com 2015/04/13 06:43:53 15
> min  56 s  2.2 GB / 1675654  40 ms 3.3 MB / 2260  26.2 GB 1928.4 MB
>
>
>
> Command:
> ./bin/spark-submit -v --master yarn-cluster --driver-class-path
> /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/hadoop-hdfs-2.4.1-EBAY-2.jar
> --jars
> /apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/hdfs/hadoop-hdfs-2.4.1-EBAY-2.jar,/home/dvasthimal/spark1.3/spark_reporting_dep_only-1.0-SNAPSHOT.jar
>  --num-executors 3000 --driver-memory 12g --driver-java-options
> "-XX:MaxPermSize=6G" --executor-memory 12g --executor-cores 1 --queue
> hdmi-express --class com.ebay.ep.poc.spark.reporting.SparkApp
> /home/dvasthimal/spark1.3/spark_reporting-1.0-SNAPSHOT.jar
> startDate=2015-04-6 endDate=2015-04-7
> input=/user/dvasthimal/epdatasets_small/exptsession subcommand=viewItem
> output=/user/dvasthimal/epdatasets/viewItem buffersize=128
> maxbuffersize=1068 maxResultSize=2G
>
>
> What do i do ? I killed the job twice and its stuck again. Where is it
> stuck ?
>
> --
> Deepak
>
>
>
>
>
> --
> Deepak
>
>
>

Re: Equi Join is taking for ever. 1 Task is Running while other 199 are complete

Reply via email to