Re: Shuffle write increases in spark 1.2

2015-02-15 Thread Ami Khandeshi
>> *Sender*: Josh Rosen >> *Date*: 2015-01-05 06:14 (GMT+09:00) >> *Title*: Re: Shuffle write increases in spark 1.2 >> If you have a small reproduction for this issue, can you open a ticket at >> https://issues.apache.org/jira

Re: Shuffle write increases in spark 1.2

2015-02-15 Thread Aaron Davidson
I think Xuefeng Wu's suggestion is likely correct. This difference is more likely explained by the compression library changing versions than by sort vs. hash shuffle (which should not affect output size significantly). Others have reported that switching to lz4 fixed their issue. We should document th

Re: Shuffle write increases in spark 1.2

2015-02-14 Thread Peng Cheng
I double-checked the 1.2 feature list and found that the new sort-based shuffle manager has nothing to do with HashPartitioner :-< Sorry for the misinformation. On the other hand, this may explain the increase in shuffle spill as a side effect of the new shuffle manager; let me revert spark.shuffle.ma
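The setting name in the message above is cut off. Assuming it refers to Spark's `spark.shuffle.manager` property (which accepts `hash` and `sort` in Spark 1.2), a rough sketch of how one might build the `spark-submit` arguments to compare the two shuffle managers is shown here. The helper name `shuffle_manager_conf_args` is hypothetical and not part of Spark.

```python
def shuffle_manager_conf_args(manager):
    """Build spark-submit --conf arguments selecting a shuffle manager.

    Assumption: the property is spark.shuffle.manager and the Spark version
    in use accepts the short names "hash" and "sort".
    """
    if manager not in ("hash", "sort"):
        raise ValueError("expected 'hash' or 'sort', got %r" % (manager,))
    return ["--conf", "spark.shuffle.manager=%s" % manager]


# Arguments for a run with the hash-based manager.
hash_args = shuffle_manager_conf_args("hash")
# Arguments for a run with the sort-based manager, for comparison.
sort_args = shuffle_manager_conf_args("sort")

print(" ".join(hash_args))
print(" ".join(sort_args))
```

Running the same job once with each argument list and comparing the shuffle write and spill figures in the web UI would show whether the shuffle manager contributes to the difference at all.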

Re: Shuffle write increases in spark 1.2

2015-02-14 Thread Peng Cheng
Same problem here: shuffle write increased from 10G to over 64G. Since I'm running on Amazon EC2, this always causes the temporary folder to consume all the disk space. Still looking for a solution. BTW, the 64G shuffle write is encountered when shuffling a pairRDD with HashPartitioner, so it's not related

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread Xuefeng Wu
It looks like this is caused by a different snappy version: if you disable compression or switch to lz4, the size no longer differs. Yours, Xuefeng Wu (吴雪峰) > On February 10, 2015, at 6:13 PM, chris wrote: > > Hello, > > as the original message from Kevin Jung never got accepted to the > mailing list, I quote it here com
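A minimal sketch of the two experiments suggested above (switching the codec to lz4, or disabling shuffle compression), assuming the job is launched with `spark-submit`. The helper name `compression_conf_args` is hypothetical. `spark.io.compression.codec` and `spark.shuffle.compress` are standard Spark configuration properties; that the short name `lz4` is accepted by the Spark version in use is an assumption.

```python
def compression_conf_args(codec=None, shuffle_compress=True):
    """Build spark-submit --conf arguments for a shuffle compression experiment.

    codec: value for spark.io.compression.codec, or None to keep the default.
    shuffle_compress: value for spark.shuffle.compress.
    """
    settings = {}
    if codec is not None:
        settings["spark.io.compression.codec"] = codec
    settings["spark.shuffle.compress"] = "true" if shuffle_compress else "false"

    args = []
    for key, value in settings.items():
        args += ["--conf", "%s=%s" % (key, value)]
    return args


# Experiment 1: keep compression on, but use lz4 instead of snappy.
lz4_args = compression_conf_args(codec="lz4")
# Experiment 2: turn shuffle compression off entirely.
no_compress_args = compression_conf_args(shuffle_compress=False)

print(" ".join(lz4_args))
print(" ".join(no_compress_args))
```

If the shuffle write size matches between Spark 1.1 and 1.2 under either setting, that points at the snappy version rather than the shuffle implementation.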

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread chris
Hello, as the original message from Kevin Jung never got accepted to the mailing list, I quote it here completely: Kevin Jung wrote > Hi all, > The size of shuffle write shown in the Spark web UI is much different when I > execute the same Spark job on the same input data (100GB) in both spark 1.1 and > spa

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread chris
Hello, as the original message never got accepted to the mailing list, I quote it here completely: Kevin Jung wrote > Hi all, > The size of shuffle write shown in the Spark web UI is much different when I > execute the same Spark job on the same input data (100GB) in both spark 1.1 and > spark 1.2. > At the

Re: Shuffle write increases in spark 1.2

2015-02-05 Thread Anubhav Srivastav
081 > > > > --- *Original Message* --- > > *Sender* : Josh Rosen > > *Date* : 2015-01-05 06:14 (GMT+09:00) > > *Title* : Re: Shuffle write increases in spark 1.2 > > > If you have a small reproduction for this issue, can you open a ticket at > https://iss

Re: Shuffle write increases in spark 1.2

2015-01-04 Thread 정재부
Sure, here is a ticket. https://issues.apache.org/jira/browse/SPARK-5081   --- Original Message --- Sender : Josh Rosen Date : 2015-01-05 06:14 (GMT+09:00) Title : Re: Shuffle write increases in spark 1.2   If you have a small reproduction for this issue, can you open a

Re: Shuffle write increases in spark 1.2

2015-01-04 Thread Josh Rosen
If you have a small reproduction for this issue, can you open a ticket at https://issues.apache.org/jira/browse/SPARK ? On December 29, 2014 at 7:10:02 PM, Kevin Jung (itsjb.j...@samsung.com) wrote: Hi all, The size of shuffle write shown in the Spark web UI is much different when I execute