Hi,

I'm trying to run the SimpleApp example
(http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala)
on a larger dataset. The input file is about 1 GB, but when I run the Spark
program it fails with java.lang.OutOfMemoryError: GC overhead limit exceeded.
The full error output:
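For context, a minimal sketch of the quick-start SimpleApp with more heap configured, in case memory settings are the issue. The input path and memory sizes are assumptions to tune for your machine, and the .cache() call from the guide is dropped here, since caching a dataset larger than the heap is a common way to hit this error:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "/path/to/input.txt" // hypothetical path to the ~1 GB input
    val conf = new SparkConf()
      .setAppName("Simple Application")
      // Assumed size -- tune to your cluster. Driver memory usually has to
      // be set before the JVM starts, e.g. spark-submit --driver-memory 4g.
      .set("spark.executor.memory", "4g")
    val sc = new SparkContext(conf)

    // No .cache() here: holding the whole 1 GB file in heap memory is a
    // likely trigger for "GC overhead limit exceeded".
    val logData = sc.textFile(logFile)
    val numAs = logData.filter(_.contains("a")).count()
    val numBs = logData.filter(_.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}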
Hi,

I'm wondering whether it's possible to continuously merge the RDDs coming
from a stream into a single RDD efficiently. One thought is to use the
union() method, but with union I will get a new RDD each time I do a merge.
I don't know how I should name these RDDs, because I remember Spark
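A sketch of the union() approach (the stream source, checkpoint directory, and checkpoint interval below are all assumptions): the new RDDs don't need individual names; a single driver-side var that gets overwritten is enough, as long as the lineage is checkpointed periodically so the chain of unions doesn't grow without bound.

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MergeStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MergeStream")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.sparkContext.setCheckpointDir("/tmp/merge-checkpoint") // hypothetical dir

    // Hypothetical source; any DStream[String] works the same way.
    val lines = ssc.socketTextStream("localhost", 9999)

    // One driver-side reference to the merged RDD. union() does return a
    // new RDD each time, but there is no need to name each one -- just
    // overwrite the variable.
    var merged: RDD[String] = ssc.sparkContext.emptyRDD[String]
    var batches = 0

    lines.foreachRDD { rdd =>
      merged = merged.union(rdd)
      batches += 1
      // The chain of unions keeps every parent RDD in the lineage, so
      // checkpoint periodically to truncate it (interval is an assumption).
      if (batches % 10 == 0) {
        merged.checkpoint()
        // A job must run on the RDD for the checkpoint to materialize.
        println(s"Merged so far: ${merged.count()} records")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that each union also accumulates partitions, so for a long-running stream an occasional coalesce() on the merged RDD may be worth adding as well.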