I run the code below and get an error:

    val dateUtil = new DateUtil()
    val usersInputDF = sqlContext.sql(
      s"""
         |select userid,
         |       concat_ws(' ', collect_list(concat_ws(' ', if(productname is not NULL, lower(productname), ''), lower(regexp_replace(regexp_replace(substr(productcategory, 2, length(productcategory) - 2), '\"', ''), \",\", ' '))))) inputlist
         |from landing
         |where dt = '${dateUtil.getYear}-${dateUtil.getMonth}'
         |  and userid != ''
         |  and userid is not null
         |  and userid is not NULL
         |  and pagetype = 'productDetail'
         |group by userid
       """.stripMargin)
    usersInputDF.registerTempTable("users_product_visits")
    sqlContext.sql("cache table users_product_visits")

ERROR:

    java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)

One task's shuffle read size is always much larger than the others'. What can cause this?

The table above is an external table whose source is S3.
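One possible explanation is key skew: if a single userid has far more rows than the rest, every row for it lands in the same shuffle partition, and the concatenated string for that one user can grow very large. Here is a minimal plain-Scala sketch of the effect, with no Spark involved. The sample data, the `bot` userid, the helper names and the simple hash-modulo partitioning are all hypothetical stand-ins, not what Spark SQL does internally.

```scala
object SkewSketch {
  // Hypothetical sample: one userid ("bot") has far more rows than any other.
  val rows: Seq[(String, String)] =
    Seq.fill(1000)(("bot", "phone")) ++ (1 to 100).map(i => (s"user$i", "laptop"))

  // Rows per userid, largest first. This mirrors a
  // "select userid, count(*) ... group by userid" check on the real table.
  def rowsPerKey(data: Seq[(String, String)]): Seq[(String, Int)] =
    data.groupBy(_._1).map { case (k, v) => (k, v.size) }.toSeq.sortBy(-_._2)

  // Rows per partition when each key is assigned to a partition by its hash.
  // All rows sharing a key end up in the same partition.
  def rowsPerPartition(data: Seq[(String, String)], numPartitions: Int): Map[Int, Int] =
    data
      .groupBy { case (k, _) => java.lang.Math.floorMod(k.hashCode, numPartitions) }
      .map { case (p, v) => (p, v.size) }

  def main(args: Array[String]): Unit = {
    // Show the three heaviest keys, then the row count of each partition.
    rowsPerKey(rows).take(3).foreach(println)
    rowsPerPartition(rows, 8).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Whichever partition receives `bot` holds at least 1000 of the 1100 rows, which is the same shape as one task reading far more shuffle data than the others. Running the equivalent count per userid against `landing` would show whether a few users dominate in the real data.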