Hi, I was getting GC overhead errors while running an application on 570 MB of data (with RDD replication).
To fix the heap errors, I repartitioned the RDD into 10 partitions:

    import org.apache.spark.storage.StorageLevel

    // Read the file and cache it in memory with 2 replicas
    val logData = sc.textFile("hdfs:/text_data/text data.txt").persist(StorageLevel.MEMORY_ONLY_2)
    // Redistribute the data into 10 partitions (shuffle = true)
    val parts = logData.coalesce(10, true)
    println(parts.partitions.length)

The problem is that the Web UI still shows 5 partitions, while the print statement outputs 10. I also tried repartition() but got the same result.

Also, when I replicate the RDD, should the Web UI list the storage details of each partition twice? I ask because the UI displays each partition only once, even though it says "2x Replicated".

Can someone help me with this?
-Karthik
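For comparison, here is a minimal sketch of the two orderings, assuming Spark's Scala RDD API (1.6 or later, for getNumPartitions). persist() marks one specific RDD instance, and coalesce(10, true) and repartition(10) (which is coalesce with shuffle = true) each return a new RDD. So the sketch reports which RDD Spark actually tracks as persisted. The object PersistWhichRdd and its Report type are hypothetical names for illustration. A local master and sc.parallelize(..., 5) stand in for the cluster and the HDFS file.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object: a local master and sc.parallelize(..., 5) stand in
// for the cluster and the 5-partition HDFS input from the post.
object PersistWhichRdd {
  final case class Report(partitions: Int, level: StorageLevel, persisted: Boolean)

  // Partition count, storage level, and whether Spark tracks this RDD as persisted.
  def report(sc: SparkContext, rdd: RDD[_]): Report =
    Report(rdd.getNumPartitions, rdd.getStorageLevel, sc.getPersistentRDDs.contains(rdd.id))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-which-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val logData = sc.parallelize(1 to 1000, 5).map(_.toString)

      // Ordering from the post: persist the 5-partition RDD, then coalesce with shuffle.
      logData.persist(StorageLevel.MEMORY_ONLY_2)
      val parts = logData.coalesce(10, shuffle = true)
      parts.count() // run an action so the cached blocks are actually written
      println(s"logData: ${report(sc, logData)}") // 5 partitions, persisted = true
      println(s"parts:   ${report(sc, parts)}")   // 10 partitions, persisted = false

      // Alternative ordering: persist the RDD that repartition() returns.
      val parts2 = logData.repartition(10).persist(StorageLevel.MEMORY_ONLY_2)
      parts2.count()
      println(s"parts2:  ${report(sc, parts2)}")  // 10 partitions, persisted = true
    } finally {
      sc.stop()
    }
  }
}
```

In local mode there is only one block manager, so the second replica requested by MEMORY_ONLY_2 cannot be placed. Spark may log a replication warning, but the RDD is still marked as persisted.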