Hi,

I have a Spark application which computes a join of two RDDs. One contains
around 150MB of data (7 million entries), the second around 1.5MB (80
thousand entries), and the result of the join contains 50MB of data
(2 million entries).
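
For reference, the join is roughly of the following shape (a simplified
sketch, not my actual code; the file paths, tab-separated key parsing, and
variable names are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object JoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("JoinSketch").setMaster("local"))

        // Larger side: ~150MB, ~7 million (key, value) pairs.
        val large = sc.textFile("large.tsv").map(line => (line.split("\t")(0), line))
        // Smaller side: ~1.5MB, ~80 thousand (key, value) pairs.
        val small = sc.textFile("small.tsv").map(line => (line.split("\t")(0), line))

        // Join result: ~50MB, ~2 million entries.
        val joined = large.join(small)
        println(joined.count())

        sc.stop()
      }
    }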

When I run it on one core (with master=local) it works correctly (the whole
process uses between 600 and 700MB of memory), but when I run it on all
cores (with master=local[*]) it throws:
java.lang.OutOfMemoryError: GC overhead limit exceeded
and sometimes
java.lang.OutOfMemoryError: Java heap space

I have set spark.executor.memory=512m (default value).
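
Concretely, the only difference between the working and failing runs is the
master setting, along these lines (again a sketch of how I configure it on
the SparkConf, pasteable into spark-shell):

    import org.apache.spark.SparkConf

    // Works: single core, peak memory stays around 600-700MB.
    val workingConf = new SparkConf()
      .setMaster("local")
      .set("spark.executor.memory", "512m")

    // Fails with GC overhead limit exceeded / Java heap space: all cores.
    val failingConf = new SparkConf()
      .setMaster("local[*]")
      .set("spark.executor.memory", "512m")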

Does anyone know why the above occurs?

Thanks,
Grzegorz
