On Wed, Dec 18, 2013 at 2:32 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> inputStream.window(X => Y).transform(rdd => existingBatchStuff(rdd)) If I understand correctly, that calls existingBatchStuff on each RDD from which the window is made, not on the window as a whole. If I want a combined result across the whole window, I'm not sure how to do that directly. Basically, we're just aggregating data into a standard composite form. Again, if I understand the above line correctly, it leaves us with a series of these composites, rather than a single one. Of course, we could write a + function on the composite, and just use: inputStream.window(X => Y).transform(rdd => existingBatchStuff(rdd)).reduce(_ + _) which would be sufficient... it just feels odd to me, that there's an extra step on the end which requires knowledge of the underlying structure to understand. Since many of the functions exist in parallel between the two, I guess I would expect something like: trait BasicRDDFunctions { def map... def reduce... def filter... def foreach... } class RDD extends BasicRDDFunctions... class DStream extends BasicRDDFunctions... where each of RDD and DStream might have any number of other functions, but if you were just using the basic ones, you would just call existingBatchStuff(rdd) or existingBatchStuff(dstream) -- Nathan Kronenfeld Senior Visualization Developer Oculus Info Inc 2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5 Phone: +1-416-203-3003 x 238 Email: nkronenf...@oculusinfo.com