Hi,

I have a Spark application which computes a join of two RDDs. One contains around 150 MB of data (7 million entries), the other around 1.5 MB (80 thousand entries), and the result of the join contains around 50 MB of data (2 million entries).
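To make the shape of the job concrete, here is a minimal sketch of the kind of join I mean (the file names, the tab-separated parsing, and the output path are just placeholders, not my actual code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on older Spark versions)

object JoinJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JoinJob").setMaster("local"))

    // left side: roughly 7 million (key, value) pairs, ~150 MB
    val big = sc.textFile("big.tsv").map { line =>
      val Array(key, value) = line.split("\t", 2)
      (key, value)
    }

    // right side: roughly 80 thousand (key, value) pairs, ~1.5 MB
    val small = sc.textFile("small.tsv").map { line =>
      val Array(key, value) = line.split("\t", 2)
      (key, value)
    }

    // join result: roughly 2 million pairs, ~50 MB
    big.join(small).saveAsTextFile("joined-output")

    sc.stop()
  }
}
```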
When I run it on one core (with master=local) it works correctly and the whole process uses between 600 and 700 MB of memory, but when I run it on all cores (with master=local[*]) it throws java.lang.OutOfMemoryError: GC overhead limit exceeded and sometimes java.lang.OutOfMemoryError: Java heap space. I have set spark.executor.memory=512m (the default value).

Does anyone know why this happens?

Thanks,
Grzegorz
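PS: in case the exact configuration matters, the only thing I change between the two runs is the master URL, and spark.executor.memory is left at its default. Roughly like this (a sketch; it could equally be set in spark-defaults.conf or on the command line):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "local" (one core) completes fine; "local[*]" (all cores) hits the OOM errors above
val conf = new SparkConf()
  .setAppName("JoinJob")
  .setMaster("local[*]")
  .set("spark.executor.memory", "512m") // the default value
val sc = new SparkContext(conf)
```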