RE: Q: Discretized Streams: Fault-Tolerant Streaming Computation paper

Adrian Mocanu Wed, 19 Feb 2014 14:05:31 -0800

That makes sense.
I wonder if any of the authors of that paper could comment.

From: dachuan [mailto:[email protected]]
Sent: February-19-14 3:55 PM
To: [email protected]
Subject: Re: Q: Discretized Streams: Fault-Tolerant Streaming Computation paper

It seems StreamingContext has a function:

  def remember(duration: Duration) {
    graph.remember(duration)
  }

and in my opinion, incremental reduce means:

1 2 3 4 5 6
window_size =5
sum_of_first_window = 1+2+3+4+5=15
sum_of_second_window_method_1=2+3+4+5+6=20
sum_of_second_window_method_2=15+6-1=20

On Wed, Feb 19, 2014 at 3:45 PM, Adrian Mocanu 
<[email protected]<mailto:[email protected]>> wrote:
I have a question on the following paper
"Discretized Streams: Fault-Tolerant Streaming Computation at Scale"
written by
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion 
Stoica
and available at
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf

Specifically I'm interested in Section 3.2 on page 5 called "Timing 
Considerations".
This section talks about external timestamp. For me I'm looking to use method 2 
and correct for late records at the
application level.
The paper says "[application] could output a new count for time interval [t, 
t+1) at time t+5, based on the records for this interval received between t and 
t+5. This computation can be performed with an efficient incremental reduce 
operation that adds the old counts computed at t+1 to the counts of new records 
since then, avoiding wasted work."

Q1:
If my data comes 24 hours late on the same DStream (ie: t=t0+24hr) and I'm 
recording per minute aggregates, wouldn't the RDD with data which came 24 hours 
ago be already deleted from disk by Spark? (I'd hope so otherwise it runs out 
of space)

Q2:
The paper talks about "incremental reduce". I'd like to know what it is. I do 
use reduce so I could get an aggregate of counts. What is this incremental 
reduce?

Thanks
-A

--
Dachuan Huang
Cellphone: 614-390-7234
2015 Neil Avenue
Ohio State University
Columbus, Ohio
U.S.A.
43210

RE: Q: Discretized Streams: Fault-Tolerant Streaming Computation paper

Reply via email to