Re: Spark streaming vs. spark usage

2013-12-18 Thread Nathan Kronenfeld
On Wed, Dec 18, 2013 at 2:32 PM, Patrick Wendell pwend...@gmail.com wrote: inputStream.window(X >= Y).transform(rdd => existingBatchStuff(rdd)) If I understand correctly, that calls existingBatchStuff on each RDD from which the window is made, not on the window as a whole. If I want a combined
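The quoted pattern, as a self-contained sketch: only the window(...).transform(...) shape comes from the thread; the body of existingBatchStuff, the socket source, and the durations are illustrative assumptions.

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Stand-in for the existing batch logic; a word count is assumed here.
    def existingBatchStuff(rdd: RDD[String]): RDD[(String, Long)] =
      rdd.map(w => (w, 1L)).reduceByKey(_ + _)

    val ssc = new StreamingContext("local[2]", "windowDemo", Seconds(10))
    val inputStream = ssc.socketTextStream("localhost", 9999)

    // Each RDD of the windowed stream spans the last 30s of input,
    // sliding every 10s; transform applies the batch function to it.
    val combined = inputStream
      .window(Seconds(30), Seconds(10))
      .transform(rdd => existingBatchStuff(rdd))

    combined.print()
    ssc.start()
    ssc.awaitTermination()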

Re: Spark streaming vs. spark usage

2013-12-18 Thread Patrick Wendell
Hey Nathan, If I understand correctly, that calls existingBatchStuff on each RDD from which the window is made, not on the window as a whole. If I want a combined result across the whole window, I'm not sure how to do that directly. Actually, once you call window() you get a new sequence
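The preview cuts off here, but the behavior Patrick is describing can be checked directly: window() yields a new DStream whose RDDs each contain every element that falls in the window, so a per-RDD operation does see the combined window. A minimal sketch, assuming the DStream[String] named inputStream from the earlier example:

    val windowed = inputStream.window(Seconds(30))
    windowed.foreachRDD { rdd =>
      // each rdd here is the union of all batches in the last 30 seconds,
      // so this count covers the window as a whole, not a single batch
      println("events in window: " + rdd.count())
    }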

Re: Spark streaming vs. spark usage

2013-12-18 Thread Ashish Rangole
I wonder if it will help to have a generic Monad container that wraps either RDD or DStream and provides map, flatMap, foreach and filter methods. case class DataMonad[A](data: A) { def map[B]( f : A => B ) : DataMonad[B] = { DataMonad( f( data ) ) } def flatMap[B]( f : A =>
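The archive truncates the snippet; a completion in the same spirit follows. The flatMap body, foreach, and filter are assumptions beyond what the thread shows, and since a plain wrapper has no empty case, filter is written to return an Option:

    // A generic container that wraps a value (an RDD, a DStream, or
    // anything else) and forwards functional combinators to it.
    case class DataMonad[A](data: A) {
      def map[B](f: A => B): DataMonad[B] =
        DataMonad(f(data))
      def flatMap[B](f: A => DataMonad[B]): DataMonad[B] =
        f(data)
      def foreach(f: A => Unit): Unit =
        f(data)
      // No "empty" DataMonad exists, so filter yields an Option instead.
      def filter(p: A => Boolean): Option[DataMonad[A]] =
        if (p(data)) Some(this) else None
    }

    // With map and flatMap defined, for-comprehensions work:
    val product = for {
      x <- DataMonad(2)
      y <- DataMonad(3)
    } yield x * y // DataMonad(6)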

Spark streaming vs. spark usage

2013-12-17 Thread Nathan Kronenfeld
Hi, Folks. We've just started looking at Spark Streaming, and I find myself a little confused. As I understood it, one of the main points of the system was that one could use the same code when streaming, doing batch processing, or whatnot. Yet when we try to apply a batch processor that
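A sketch of the reuse being asked about, under the assumption that the batch logic is written as a function over RDDs; every name here (topWords, the input path, the host and port) is illustrative:

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD

    // Write the logic once, against RDDs only.
    def topWords(rdd: RDD[String]): RDD[(String, Long)] =
      rdd.flatMap(_.split("\\s+"))
         .map(w => (w, 1L))
         .reduceByKey(_ + _)

    // Batch job: apply it to a file-backed RDD.
    //   topWords(sc.textFile("input.txt"))
    // Streaming job: apply the same function to each (windowed) RDD.
    //   ssc.socketTextStream("localhost", 9999).transform(topWords _)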