Setting Spark memory limit

2014-06-09 Thread Henggang Cui
Hi, I'm trying to run the SimpleApp example (http://spark.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala) on a larger dataset. The input file is about 1GB, but when I run the Spark program, it says: java.lang.OutOfMemoryError: GC overhead limit exceeded. The full error output …
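
A first thing to try (a minimal sketch, assuming Spark 1.0-era APIs; the "4g" heap size and the input path are placeholders, not values from the original post) is to raise spark.executor.memory in the SparkConf and to persist the cached RDD with a storage level that can spill to disk, instead of the quick-start's plain cache():

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("Simple Application")
      // Placeholder value: give each executor a larger heap. In local
      // mode the driver and executor share one JVM, so the heap must
      // instead be set before startup, e.g. spark-submit --driver-memory 4g.
      .set("spark.executor.memory", "4g")
    val sc = new SparkContext(conf)

    // MEMORY_AND_DISK spills partitions that do not fit in the heap to
    // disk, unlike the default cache() (MEMORY_ONLY) used in the
    // quick-start example, which keeps everything on-heap.
    val logData = sc.textFile("YOUR_INPUT_FILE", 2)
      .persist(StorageLevel.MEMORY_AND_DISK)

    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: " + numAs + ", lines with b: " + numBs)
  }
}

Dropping the cache entirely is another option: for a single pass over a 1GB file there is little benefit to holding it in memory at all.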

Merging all Spark Streaming RDDs to one RDD

2014-06-09 Thread Henggang Cui
Hi, I'm wondering whether it's possible to continuously merge the RDDs coming from a stream into a single RDD efficiently. One thought is to use the union() method, but with union I will get a new RDD each time I do a merge. I don't know how I should name these RDDs, because I remember Spark …
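
One pattern that avoids naming each intermediate RDD (a sketch, assuming the Spark Streaming Scala API; the socket source, batch interval, and checkpoint path are placeholder choices) is to keep a single driver-side var and re-point it at the result of each union(), checkpointing so the lineage built up by repeated unions does not grow without bound:

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MergeStream {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("MergeStream")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/merge-checkpoint") // hypothetical path

    val lines = ssc.socketTextStream("localhost", 9999)

    // Start from an empty RDD; foreachRDD runs on the driver, so
    // mutating this driver-side var is safe.
    var merged: RDD[String] = ssc.sparkContext.parallelize(Seq.empty[String])

    lines.foreachRDD { rdd =>
      merged = merged.union(rdd)
      // Checkpointing truncates the lineage of chained union() calls.
      // checkpoint() must be called before the action that materializes
      // the RDD (here, count()).
      merged.checkpoint()
      println("merged count: " + merged.count())
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Checkpointing every batch, as above, keeps the sketch simple but writes the full merged RDD to stable storage each time; in practice you would likely checkpoint only every N batches.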