Have you tried the suggestions given in this thread?

http://stackoverflow.com/questions/26256061/using-itext-java-lang-outofmemoryerror-requested-array-size-exceeds-vm-limit

Can you pastebin the complete stack trace?

Which release of Spark are you using?
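One likely culprit is key skew: collect_list pulls every row for a userid into a single array, so one heavy userid inflates both that task's shuffle read and the String being encoded (the OOM is thrown while encoding a String larger than the JVM's maximum array size, roughly 2^31 bytes). A quick check is a group-by count ordered descending, e.g. in Spark SQL: select userid, count(*) c from landing group by userid order by c desc limit 10. Here is a minimal plain-Scala sketch of that diagnostic (sample data is made up; only the column names come from your query):

```scala
object SkewCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample: one userid owns almost all the rows.
    val rows = Seq.fill(1000)(("user_big", "p")) ++
      Seq.tabulate(10)(i => (s"user_$i", "p"))
    // Same shape as: select userid, count(*) from landing group by userid
    val counts = rows.groupBy(_._1).map { case (id, rs) => (id, rs.size) }
    // The heaviest keys are the ones whose collect_list output explodes.
    counts.toSeq.sortBy(-_._2).take(3).foreach { case (id, n) =>
      println(s"$id\t$n")
    }
  }
}
```

If one userid dominates the counts, filtering it out (or capping the list size) should make the OOM go away.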

Cheers

> On Sep 22, 2015, at 4:28 AM, Yusuf Can Gürkan <yu...@useinsider.com> wrote:
> 
> I run the code below and get this error:
> 
> val dateUtil = new DateUtil()
> 
> val usersInputDF = sqlContext.sql(
>   s"""
>      |select userid,
>      |  concat_ws(' ', collect_list(concat_ws(' ',
>      |    if(productname is not NULL, lower(productname), ''),
>      |    lower(regexp_replace(regexp_replace(substr(productcategory, 2, length(productcategory) - 2), '\"', ''), \",\", ' '))))) inputlist
>      |from landing
>      |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
>      |  and userid != '' and userid is not null
>      |  and pagetype = 'productDetail'
>      |group by userid
>    """.stripMargin)
> 
> usersInputDF.registerTempTable("users_product_visits")
> 
> sqlContext.sql("cache table users_product_visits")
> 
> ERROR:
> 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>       at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
> 
> 
> 
> One task’s shuffle read size is always much larger than the others’, as you 
> can see below. What could cause this? The table above is an external table 
> whose source is S3.
> 
> 
