I ran the code below and got the following error:

val dateUtil = new DateUtil()

val usersInputDF = sqlContext.sql(
  s"""
     |select userid,
     |       concat_ws(' ', collect_list(concat_ws(' ',
     |         if(productname is not NULL, lower(productname), ''),
     |         lower(regexp_replace(
     |           regexp_replace(substr(productcategory, 2, length(productcategory) - 2), '\"', ''),
     |           \",\", ' '))))) inputlist
     |from landing
     |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
     |  and userid != ''
     |  and userid is not null
     |  and pagetype = 'productDetail'
     |group by userid
   """.stripMargin)

usersInputDF.registerTempTable("users_product_visits")

sqlContext.sql("cache table users_product_visits")

ERROR:

java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)



The shuffle read size of one task is always much larger than that of the
others, as you can see below. What could cause this? The table above is an
external table whose source is S3.
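
One thing that could be checked here is whether a few userid values account
for most of the rows, since every row for a given userid ends up in the same
task after the group by, and collect_list then builds one string per userid.
The sketch below is only a diagnostic under that assumption, not a confirmed
cause. It reuses the sqlContext, dateUtil, and landing table from the query
above; the name keyCounts, the cnt alias, and the limit of 20 are arbitrary
choices for illustration.

```scala
// Diagnostic sketch: count rows per userid with the same filters as the
// original query, to see whether a handful of keys dominate the data.
// Assumes the sqlContext and dateUtil values defined earlier in this post.
val keyCounts = sqlContext.sql(
  s"""
     |select userid, count(*) as cnt
     |from landing
     |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
     |  and userid != ''
     |  and userid is not null
     |  and pagetype = 'productDetail'
     |group by userid
     |order by cnt desc
     |limit 20
   """.stripMargin)

// Print the heaviest keys; a very large cnt for one userid would point to skew.
keyCounts.show()
```

If one userid shows a count far above the rest, that would be consistent with
both the uneven shuffle read and the very large string in the stack trace.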

