Hi all,

I have a custom implementation of K-Means that needs the data in a DataFrame to be grouped by a key. The data is heavily skewed for some of the keys, and for those keys the job fails with:

BufferHolder: Cannot grow BufferHolder by size 17112 because the size after growing exceeds size limitation 2147483632
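For a sense of scale, here is a rough back-of-the-envelope sketch of how many points would fit under that limit if all the points for one key ended up in a single row. The dimensionality (100) and the 8-byte-per-value encoding are assumptions for illustration only, not my actual data, and any row or array overhead is ignored.

```python
# Size limit reported in the BufferHolder error message, in bytes.
SIZE_LIMIT_BYTES = 2147483632


def max_points_per_row(dim, bytes_per_value=8):
    """Upper bound on the number of `dim`-valued points in one row buffer.

    Assumes each value takes `bytes_per_value` bytes (8 for a double) and
    ignores per-row and per-array overhead, so the real bound is lower.
    """
    return SIZE_LIMIT_BYTES // (dim * bytes_per_value)


# Hypothetical dimensionality, chosen only for illustration.
print(max_points_per_row(100))
```

Under these assumptions, a skewed key with more than a few million points would already be past the limit.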
I tried to work around it by converting the DataFrame to an RDD, using reduceByKey on the RDD, and converting the result back to a DataFrame. This fails with a Java heap out-of-memory error. Since this looks like a common issue, I was wondering how others are solving it.

--
Thanks,
Deepak