I am seeing the same issue with Spark 1.3.1. It shows up when I read a SequenceFile with Text keys and Text values, compressed with GzipCodec (the raw file header reads: SEQ ... org.apache.hadoop.io.Text ... org.apache.hadoop.io.Text ... org.apache.hadoop.io.compress.GzipCodec).
All I do is:

    sc.sequenceFile(dwTable, classOf[Text], classOf[Text])
      .partitionBy(new org.apache.spark.HashPartitioner(2053))

with the SparkConf set up as follows:

    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.mb", arguments.get("buffersize").get)
      .set("spark.kryoserializer.buffer.max.mb", arguments.get("maxbuffersize").get)
      .set("spark.driver.maxResultSize", arguments.get("maxResultSize").get)
      .set("spark.yarn.maxAppAttempts", "0")
      //.set("spark.akka.askTimeout", arguments.get("askTimeout").get)
      //.set("spark.akka.timeout", arguments.get("akkaTimeout").get)
      //.set("spark.worker.timeout", arguments.get("workerTimeout").get)
      .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))

and the values are:

    buffersize=128
    maxbuffersize=1068
    maxResultSize=200G

(Re: the codec workaround asked about below, see the sketch at the end of this mail.)

On Thu, May 7, 2015 at 8:04 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> I'm using the default settings.
>
> Jianshi
>
> On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva
> <twinkle.sachd...@gmail.com> wrote:
>
>> Hi,
>>
>> Can you please share the compression and other settings you are using?
>>
>> Thanks,
>> Twinkle
>>
>> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang <jianshi.hu...@gmail.com>
>> wrote:
>>
>>> I'm facing this error in Spark 1.3.1:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-4105
>>>
>>> Does anyone know of a workaround? Change the compression codec for the
>>> shuffle output?
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Deepak
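Regarding the workaround question in the quoted thread: one commonly suggested mitigation on SPARK-4105 is to move the block I/O codec off snappy, or to disable shuffle compression entirely. A minimal sketch, assuming a fresh SparkContext (the app name and the choice of lzf here are illustrative assumptions, not something from this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Workaround sketch for SPARK-4105 (FAILED_TO_UNCOMPRESS on shuffle read):
    val conf = new SparkConf()
      .setAppName("sequencefile-read")           // hypothetical app name
      // switch the I/O codec from snappy (the 1.3.x default) to lzf...
      .set("spark.io.compression.codec", "lzf")
      // ...or, alternatively, turn off shuffle compression altogether:
      // .set("spark.shuffle.compress", "false")
    val sc = new SparkContext(conf)

Note this trades the error for larger (or uncompressed) shuffle output, so expect more disk and network I/O.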