I'd like to resurrect this thread since I don't have an answer yet.

From: Adrian Mocanu [mailto:amoc...@verticalscope.com]
Sent: March-27-14 10:04 AM
To: u...@spark.incubator.apache.org
Subject: function state lost when next RDD is processed
Is there a way to pass a custom function to Spark so that it runs over the entire stream? For example, say I have a function that sums the values in each RDD and then accumulates that sum across RDDs. I've tried map, transform, and reduce, but they all apply my sum function to one RDD at a time: when the next RDD arrives, the function starts again from 0, so the sum from the previous RDD is lost. Does Spark support a way to pass a custom function so that its state is preserved across RDDs and not only within a single RDD?

Thanks
-Adrian
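
P.S. A minimal sketch of the per-RDD behaviour I'm describing, assuming a DStream[Int] built from a socket source. The host, port, app name, and batch interval below are placeholders, not my real job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PerBatchSum {
  def main(args: Array[String]): Unit = {
    // Local two-thread context with a 1-second batch interval (placeholder settings).
    val conf = new SparkConf().setAppName("PerBatchSum").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder source: one integer per line over a socket.
    val nums = ssc.socketTextStream("localhost", 9999).map(_.toInt)

    // reduce() is applied independently to each RDD (i.e. each batch),
    // so the printed sum restarts from zero every batch interval;
    // nothing carries over from earlier RDDs.
    nums.reduce(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Each batch prints only its own total; what I'm after is a way to keep a running total across batches.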