Re: Spark streaming vs. spark usage

Nathan Kronenfeld Wed, 18 Dec 2013 12:19:17 -0800

On Wed, Dec 18, 2013 at 2:32 PM, Patrick Wendell <pwend...@gmail.com> wrote:


> inputStream.window(X => Y).transform(rdd => existingBatchStuff(rdd))


If I understand correctly, that calls existingBatchStuff on each RDD from
which the window is made, not on the window as a whole. If I want a
combined result across the whole window, I'm not sure how to do that
directly.

Basically, we're just aggregating data into a standard composite form.
Again, if I understand the above line correctly, it leaves us with a
series of these composites, rather than a single one. Of course, we could
write a + function on the composite, and just use:

inputStream.window(X => Y)
  .transform(rdd => existingBatchStuff(rdd))
  .reduce(_ + _)

which would be sufficient. It just feels odd to me that there's an extra
step on the end which requires knowledge of the underlying structure to
understand.
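
To make the + idea concrete, here is a minimal sketch using plain Scala
collections in place of RDDs and DStreams, so it says nothing about how
Spark itself behaves. Composite, its count and sum fields, and the body of
existingBatchStuff are all hypothetical stand-ins for whatever the real
batch code produces.

```scala
object WindowCompositeSketch {
  // Hypothetical composite: a count and a running sum.
  final case class Composite(count: Long, sum: Double) {
    // Merge two partial composites into one.
    def +(other: Composite): Composite =
      Composite(count + other.count, sum + other.sum)
  }

  // Stand-in for the existing batch code applied to one batch of data.
  def existingBatchStuff(batch: Seq[Double]): Composite =
    Composite(batch.size.toLong, batch.sum)

  // Three batches that together make up one window (made-up values).
  val batches: Seq[Seq[Double]] = Seq(Seq(1.0, 2.0), Seq(3.0), Seq(4.0, 5.0))

  // One composite per batch...
  val perBatch: Seq[Composite] = batches.map(existingBatchStuff)

  // ...and the extra step that folds them into a single composite.
  val combined: Composite = perBatch.reduce(_ + _)
}
```

The final reduce only works because the caller knows a Composite can be
merged with +, which is the extra knowledge of the underlying structure
mentioned above.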


Since many of the functions exist in parallel on RDD and DStream, I guess
I would expect something like:

trait BasicRDDFunctions {
  def map...
  def reduce...
  def filter...
  def foreach...
}

class RDD extends BasicRDDFunctions...
class DStream extends BasicRDDFunctions...


where each of RDD and DStream might have any number of other functions, but
if you were just using the basic ones, you would just call
existingBatchStuff(rdd)
or
existingBatchStuff(dstream)
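
One way to get a similar effect without changing either class hierarchy is
a type class rather than a shared supertrait. The sketch below is only an
illustration under that assumption: BasicOps, ToyRDD and ToyDStream are
hypothetical toy types, not Spark classes, and only map and filter are
covered.

```scala
object SharedOpsSketch {
  // Hypothetical shared interface, expressed as a type class over a
  // container type F instead of a common supertrait.
  trait BasicOps[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
    def filter[A](fa: F[A])(p: A => Boolean): F[A]
  }

  // Toy stand-ins: a single batch, and a sequence of batches.
  final case class ToyRDD[A](items: Seq[A])
  final case class ToyDStream[A](batches: Seq[ToyRDD[A]])

  implicit val toyRddOps: BasicOps[ToyRDD] = new BasicOps[ToyRDD] {
    def map[A, B](fa: ToyRDD[A])(f: A => B): ToyRDD[B] =
      ToyRDD(fa.items.map(f))
    def filter[A](fa: ToyRDD[A])(p: A => Boolean): ToyRDD[A] =
      ToyRDD(fa.items.filter(p))
  }

  // The stream instance applies the batch operation to every batch.
  implicit val toyDStreamOps: BasicOps[ToyDStream] = new BasicOps[ToyDStream] {
    def map[A, B](fa: ToyDStream[A])(f: A => B): ToyDStream[B] =
      ToyDStream(fa.batches.map(b => toyRddOps.map(b)(f)))
    def filter[A](fa: ToyDStream[A])(p: A => Boolean): ToyDStream[A] =
      ToyDStream(fa.batches.map(b => toyRddOps.filter(b)(p)))
  }

  // Written once, usable with any container that has a BasicOps instance.
  def existingBatchStuff[F[_]](data: F[Int])(implicit ops: BasicOps[F]): F[Int] =
    ops.map(ops.filter(data)(_ % 2 == 0))(_ * 10)
}
```

With this in place, existingBatchStuff(ToyRDD(Seq(1, 2, 3, 4))) and
existingBatchStuff(ToyDStream(...)) both compile against the same
definition. Operations such as reduce are left out because their natural
result type differs between a single batch and a stream of batches.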



-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenf...@oculusinfo.com
