Merging all Spark Streaming RDDs to one RDD

2014-06-09 Thread Henggang Cui
Hi, I'm wondering whether it's possible to continuously merge the RDDs coming from a stream into a single RDD efficiently. One thought is to use the union() method. But using union, I will get a new RDD each time I do a merge. I don't know how I should name these RDDs, because I remember Spark do
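For the union() idea, here is a rough sketch of what continuously merging looks like. It is a plain-Python model, not Spark code: `ModelRDD`, `merge_stream` and `checkpoint_every` are hypothetical names. It assumes, as with real RDDs, that union() returns a new immutable dataset whose lineage includes both parents. You don't need a new name for each result, since one variable can be rebound to the latest union. The catch the model shows is that the lineage depth grows by one per batch. In real Spark the usual way to cut that chain is `RDD.checkpoint()` after `SparkContext.setCheckpointDir(...)`. Here that step is only simulated.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelRDD:
    """Hypothetical stand-in for an immutable RDD: data plus a lineage depth.

    Not Spark's API; it only models the fact that every union() yields a
    new, immutable dataset whose lineage includes both parents.
    """
    data: Tuple[int, ...]
    depth: int = 0  # length of the chain of unions behind this dataset

    def union(self, other: "ModelRDD") -> "ModelRDD":
        # Like RDD.union, return a NEW dataset instead of mutating self.
        return ModelRDD(self.data + other.data, max(self.depth, other.depth) + 1)

    def checkpoint(self) -> "ModelRDD":
        # Simulated checkpoint: same data, lineage truncated to zero.
        return ModelRDD(self.data, 0)


def merge_stream(batches: List[List[int]], checkpoint_every: int = 0) -> ModelRDD:
    """Merge each incoming batch into one running dataset (hypothetical helper)."""
    merged = ModelRDD(())  # empty starting dataset
    for i, batch in enumerate(batches, start=1):
        # No per-batch names needed: rebind one variable to the latest union.
        merged = merged.union(ModelRDD(tuple(batch)))
        if checkpoint_every and i % checkpoint_every == 0:
            merged = merged.checkpoint()
    return merged


if __name__ == "__main__":
    batches = [[1, 2], [3], [4, 5], [6]]
    print(merge_stream(batches).depth)                      # lineage grows per batch
    print(merge_stream(batches, checkpoint_every=2).depth)  # periodically truncated
```

With four batches and no checkpointing, the merged dataset ends up four unions deep. Checkpointing every two batches keeps the depth bounded.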

Re: Merging all Spark Streaming RDDs to one RDD

2014-06-12 Thread unorthodox . engineers
I have much the same issue. While I haven't fully solved it yet, I have found the window() method useful for batching up archive blocks, but updateStateByKey is probably what we want to use, perhaps applied more than once, if that works. My bigger worry now is storage. Unlike non-streaming apps, we
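To compare the two operations mentioned above, here is a plain-Python model of their per-batch semantics. It is not the Spark Streaming API. It assumes each micro-batch is a list of (key, value) pairs. The helpers `update_state_by_key`, `windowed` and `running_sum` are hypothetical names. The update function follows the shape used by PySpark's updateStateByKey: it is called as `update(new_values, old_state)` for every key that has state or appears in the batch, and returning None drops the key. The window model assumes a slide of one batch.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional, Tuple

Batch = List[Tuple[str, int]]


def update_state_by_key(
    batches: Iterable[Batch],
    update: Callable[[List[int], Optional[int]], Optional[int]],
) -> Dict[str, int]:
    """Hypothetical model of updateStateByKey applied batch after batch."""
    state: Dict[str, int] = {}
    for batch in batches:
        # Group this batch's values by key.
        grouped: Dict[str, List[int]] = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # Call update() for old keys (maybe with no new values) and new keys.
        new_state: Dict[str, int] = {}
        for key in set(state) | set(grouped):
            result = update(grouped.get(key, []), state.get(key))
            if result is not None:  # None removes the key from the state
                new_state[key] = result
        state = new_state
    return state


def windowed(batches: Iterable[Batch], length: int) -> List[Batch]:
    """Hypothetical model of window(): each output holds the last `length` batches."""
    recent: Deque[Batch] = deque(maxlen=length)
    out: List[Batch] = []
    for batch in batches:
        recent.append(batch)
        out.append([pair for b in recent for pair in b])
    return out


def running_sum(new_values: List[int], old: Optional[int]) -> Optional[int]:
    """Example update function: keep a running total per key."""
    return sum(new_values) + (old or 0)


if __name__ == "__main__":
    stream = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("c", 5)]]
    print(update_state_by_key(stream, running_sum))
    print(windowed(stream, 2))
```

The model shows the difference: window() only ever covers a bounded number of recent batches, while updateStateByKey folds every batch into one keyed state. In Spark that state is still materialized as a new RDD per batch, and updateStateByKey requires a checkpoint directory to be set.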