subject:"spark.sql.shuffle.partitions=1 seems to be working fine but creates timeout for large skewed data"

Re: spark.sql.shuffle.partitions=1 seems to be working fine but creates timeout for large skewed data

2015-08-20 Thread Umesh Kacha

Hi Hemant, sorry for the confusion. I meant the final output part files in the final HDFS directory; I never meant intermediate files. Thanks. My goal is to reduce the number of those files because of my use case, which I explained with calculations in the first email. On Aug 20, 2015 5:59 PM, Hemant Bhanawat

Re: spark.sql.shuffle.partitions=1 seems to be working fine but creates timeout for large skewed data

2015-08-20 Thread Hemant Bhanawat

Sorry, I misread your mail. Thanks for pointing that out. BTW, are the 8 files intermediate shuffle output rather than the final output? I assume yes. I didn't know that you can keep intermediate output on HDFS, and I don't think that is recommended. On Thu, Aug 20, 2015 at 2:43 PM, Hemant