Or how about the updateStateByKey() operation?

https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html

The StatefulNetworkWordCount example demonstrates how to keep state across RDDs.
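
To make the idea concrete, here is a rough sketch in plain Python. It is not Spark code; it only imitates what updateStateByKey() does. The update function has the same shape as the one you pass to Spark: the values seen for a key in the current batch, plus the previous state for that key (empty if the key is new). The names update_sum and run_batches and the sample batches are made up for illustration, with run_batches standing in for the stream of RDDs.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Batch = List[Tuple[str, int]]


def update_sum(new_values: List[int], running: Optional[int]) -> Optional[int]:
    # Same shape as an updateStateByKey update function:
    # this batch's values for one key, plus the previous state (None if unseen).
    return (running or 0) + sum(new_values)


def run_batches(batches: List[Batch]) -> Dict[str, int]:
    # Hypothetical driver loop: each batch plays the role of one RDD in the stream.
    state: Dict[str, int] = {}
    for batch in batches:
        grouped: Dict[str, List[int]] = defaultdict(list)
        for key, value in batch:
            grouped[key].append(value)
        # Keys that already have state are updated even if the batch has no new values.
        for key in set(state) | set(grouped):
            new_state = update_sum(grouped.get(key, []), state.get(key))
            if new_state is None:
                state.pop(key, None)  # returning None drops the key's state
            else:
                state[key] = new_state
    return state


batches = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("a", 5)]]
totals = run_batches(batches)
```

For a single running sum over the whole stream, one option is to map every element to the same key before applying the update function, so all values land in one piece of state.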

> On Mar 28, 2014, at 8:44 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
> 
> Are you referring to Spark Streaming?
> 
> Can you save the sum as an RDD and keep joining the two RDDs together?
> 
> Regards
> Mayur
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
>> On Fri, Mar 28, 2014 at 10:47 AM, Adrian Mocanu <amoc...@verticalscope.com> 
>> wrote:
>> Thanks!
>> 
>>  
>> 
>> Ya that’s what I’m doing so far, but I wanted to see if it’s possible to 
>> keep the tuples inside Spark for fault tolerance purposes.
>> 
>>  
>> 
>> -A
>> 
>> From: Mark Hamstra [mailto:m...@clearstorydata.com] 
>> Sent: March-28-14 10:45 AM
>> To: user@spark.apache.org
>> Subject: Re: function state lost when next RDD is processed
>> 
>>  
>> 
>> As long as the amount of state being passed is relatively small, it's 
>> probably easiest to send it back to the driver and to introduce it into RDD 
>> transformations as the zero value of a fold.
>> 
>>  
>> 
>> On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu <amoc...@verticalscope.com> 
>> wrote:
>> 
>> I’d like to resurrect this thread since I don’t have an answer yet.
>> 
>>  
>> 
>> From: Adrian Mocanu [mailto:amoc...@verticalscope.com] 
>> Sent: March-27-14 10:04 AM
>> To: u...@spark.incubator.apache.org
>> Subject: function state lost when next RDD is processed
>> 
>>  
>> 
>> Is there a way to pass a custom function to spark to run it on the entire 
>> stream? For example, say I have a function which sums up values in each RDD 
>> and then across RDDs.
>> 
>>  
>> 
>> I’ve tried with map, transform, reduce. They all apply my sum function on 1 
>> RDD. When the next RDD comes the function starts from 0 so the sum of the 
>> previous RDD is lost.
>> 
>>  
>> 
>> Does Spark support a way of passing a custom function so that its state is 
>> preserved across RDDs and not only within a single RDD?
>> 
>>  
>> 
>> Thanks
>> 
>> -Adrian
>> 
> 
