On Wed, Dec 18, 2013 at 2:32 PM, Patrick Wendell pwend...@gmail.com wrote:
> inputStream.window(X, Y).transform(rdd => existingBatchStuff(rdd))
Hey Nathan,
> If I understand correctly, that calls existingBatchStuff on each RDD from
> which the window is made, not on the window as a whole. If I want a
> combined result across the whole window, I'm not sure how to do that
> directly.
Actually, once you call window() you get a new sequence
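The point being made here can be illustrated without a Spark cluster. The sketch below is a hypothetical pure-Scala simulation (the names `batches`, `window`, and `existingBatchStuff` are stand-ins, not Spark API): once the stream is re-windowed, each downstream "RDD" holds all the elements of the window, so the batch function runs once per window rather than once per source batch.

```scala
// Pure-Scala simulation of the window()/transform() semantics described
// above. Each inner Seq stands in for one micro-batch RDD.
object WindowSemantics {
  // Micro-batches arriving on the stream.
  val batches = Seq(Seq(1, 2), Seq(3), Seq(4, 5))

  // A sliding window of `size` batches: each result is one "windowed RDD"
  // containing every element that falls inside the window.
  def window(size: Int): Seq[Seq[Int]] =
    batches.sliding(size).map(_.flatten).toSeq

  // An existing batch computation applied via transform(): here, a sum.
  def existingBatchStuff(rdd: Seq[Int]): Int = rdd.sum

  def main(args: Array[String]): Unit = {
    val results = window(2).map(existingBatchStuff)
    // Windows are Seq(1,2,3) and Seq(3,4,5), so the sums are 6 and 12:
    // the batch function sees each whole window at once.
    println(results.mkString(","))
  }
}
```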
I wonder if it would help to have a generic Monad container that wraps
either RDD or DStream and provides
map, flatMap, foreach and filter methods.
case class DataMonad[A](data: A) {
  def map[B](f: A => B): DataMonad[B] =
    DataMonad(f(data))
  def flatMap[B](f: A => DataMonad[B]): DataMonad[B] =
    f(data)
  // foreach and filter would follow the same pattern
}
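A quick hypothetical usage sketch of the wrapper idea (the `DataMonad` completion below is my assumption of the intended shape, not code from the thread): because the class defines map and flatMap, it composes in an ordinary for-comprehension regardless of what payload it wraps, which is the property that would let the same driver code run over batch or streaming data.

```scala
// Self-contained demo of the DataMonad idea from the message above.
object DataMonadDemo {
  case class DataMonad[A](data: A) {
    def map[B](f: A => B): DataMonad[B]            = DataMonad(f(data))
    def flatMap[B](f: A => DataMonad[B]): DataMonad[B] = f(data)
  }

  def main(args: Array[String]): Unit = {
    // The for-comprehension desugars to flatMap/map on the wrapper,
    // so the pipeline below never touches the payload type directly.
    val result = for {
      xs    <- DataMonad(Seq(1, 2, 3))
      total <- DataMonad(xs.sum)
    } yield total * 10
    println(result) // DataMonad(60)
  }
}
```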
Hi, Folks.
We've just started looking at Spark Streaming, and I find myself a little
confused.
As I understood it, one of the main points of the system was that one could
use the same code when streaming, doing batch processing, or whatnot.
Yet when we try to apply a batch processor that