Have you tried the suggestions given in this thread? http://stackoverflow.com/questions/26256061/using-itext-java-lang-outofmemoryerror-requested-array-size-exceeds-vm-limit
Can you pastebin the complete stack trace? What release of Spark are you using?

Cheers

> On Sep 22, 2015, at 4:28 AM, Yusuf Can Gürkan <yu...@useinsider.com> wrote:
>
> I run the code below and am getting the error shown:
>
> val dateUtil = new DateUtil()
>
> val usersInputDF = sqlContext.sql(
>   s"""
>      | select userid,
>      |   concat_ws(' ', collect_list(concat_ws(' ',
>      |     if(productname is not NULL, lower(productname), ''),
>      |     lower(regexp_replace(regexp_replace(substr(productcategory, 2, length(productcategory) - 2), '\"', ''), \",\", ' '))))) inputlist
>      | from landing
>      | where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
>      |   and userid != '' and userid is not null and userid is not NULL
>      |   and pagetype = 'productDetail'
>      | group by userid
>    """.stripMargin)
>
> usersInputDF.registerTempTable("users_product_visits")
>
> sqlContext.sql("cache table users_product_visits")
>
> ERROR:
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>
> One of the tasks’ shuffle read size is always much larger than the others, as you can see below. What can cause this? My table above is an external table whose source is S3.
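One task reading far more shuffle data than the rest usually means the group-by key (userid here) is skewed: a single key gets most of the rows, and collect_list then builds one enormous string for it, which can trip the JVM's array-size limit. A quick diagnostic is to count rows (and total value size) per key and look at the top groups, e.g. in Spark SQL something like `select userid, count(*) from landing group by userid order by 2 desc limit 20`. The idea can be sketched in plain Scala over sample data (the rows and sizes below are hypothetical, not from the thread):

```scala
// Hypothetical (userid, value-size-in-bytes) pairs standing in for rows of `landing`.
val rows = Seq("u1" -> 10L, "u2" -> 12L, "hot" -> 5000L, "hot" -> 7000L, "u3" -> 9L)

// Per-key row count and total size, largest total first -- the same shape as
// a `group by userid order by sum(...) desc` diagnostic query in Spark SQL.
val byKey = rows
  .groupBy(_._1)
  .map { case (k, vs) => (k, vs.size, vs.map(_._2).sum) }
  .toSeq
  .sortBy(t => -t._3)

val (heaviestKey, heaviestCount, heaviestBytes) = byKey.head
println(s"heaviest key: $heaviestKey ($heaviestCount rows, $heaviestBytes bytes)")
```

If one key dominates the totals, that key is the one blowing up the reduce task.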
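If a hot key is confirmed, one common mitigation is key salting: append a shard suffix to the key so its rows spread across several reduce tasks, aggregate per salted key, then strip the salt and merge the partial results. A minimal local sketch of that two-phase aggregation (shard count and sample data are assumptions, not from the thread; in a real job both phases would be Spark group-bys):

```scala
val shards = 4  // assumed number of salt shards

// Hypothetical (userid, value) rows; "hot" is the skewed key.
val rows = Seq("hot" -> "a", "hot" -> "b", "u1" -> "x", "hot" -> "c")

// Phase 1: salt each key deterministically and aggregate per salted key.
val salted = rows.zipWithIndex.map { case ((k, v), i) => (s"$k#${i % shards}", v) }
val partial = salted
  .groupBy(_._1)
  .toSeq
  .map { case (saltedKey, vs) => (saltedKey.takeWhile(_ != '#'), vs.map(_._2).mkString(" ")) }

// Phase 2: strip the salt and merge the partial aggregates per original key.
val merged = partial
  .groupBy(_._1)
  .map { case (k, vs) => (k, vs.map(_._2).mkString(" ")) }
```

Each reduce task now handles only one shard of the hot key, so no single task has to materialize the whole collected list at once.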