Hi Daniel, Andrew,

Thank you for your answers. So it's not possible to read an accumulator's value until the action that manipulates it finishes; that's unfortunate, I'll think of something else.

However, the most important thing in my application is the ability to launch 2 (or more) actions in parallel and concurrently: "*within* each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads" [Job Scheduling, official docs].
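Concretely, the thread-submission pattern I have in mind looks like the sketch below. It is plain Python: ac1/ac2 and the dataset D are hypothetical stand-ins for the real Spark actions and RDD, since I can't paste a runnable cluster job here. With PySpark, each thread would instead call an action such as D.foreach(func1) / D.foreachPartition(func2) on the shared RDD:

```python
import threading

# Hypothetical stand-ins: D plays the role of the shared RDD, and
# ac1/ac2 play the roles of the two Spark actions submitted from
# two different driver threads.
D = list(range(10))
results = {}

def ac1(data):
    # the long-running "action" (e.g. D.foreach(func1) in real Spark)
    results["ac1"] = sum(x * x for x in data)

def ac2(data):
    # the second "action" on the same data, submitted from another thread
    # (e.g. D.foreachPartition(func2) in real Spark)
    results["ac2"] = sum(data)

t1 = threading.Thread(target=ac1, args=(D,))
t2 = threading.Thread(target=ac2, args=(D,))
t1.start(); t2.start()   # both jobs are in flight at the same time
t1.join(); t2.join()

print(results["ac1"], results["ac2"])  # 285 45
```

With real Spark, the driver's scheduler would receive both jobs and run them concurrently, subject to available resources, which is how I read the Job Scheduling doc.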
My actions have to run on the same RDD, e.g. RDD D: D.ac1(func1) [foreach, for instance] // D.ac2(func2) [foreachPartition or whatever]. Can this be done with asynchronous actions? (It is not working for me on a single node :/ .) Can I use the same broadcast variable in the two actions? If yes, what happens if I change the value of the broadcast variable? On the other hand, I would really like to know more about Spark 2.0.

Thank you,
Regards

2016-01-13 20:31 GMT+01:00 Andrew Or <and...@databricks.com>:

> Hi Kira,
>
> As you suspected, accumulator values are only updated after the task
> completes. We do send accumulator updates from the executors to the driver
> on periodic heartbeats, but these only concern internal accumulators, not
> the ones created by the user.
>
> In short, I'm afraid there is not currently a way (in Spark 1.6 and
> before) to access the accumulator values until after the tasks that updated
> them have completed. This will change in Spark 2.0, the next version,
> however.
>
> Please let me know if you have more questions.
> -Andrew
>
> 2016-01-13 11:24 GMT-08:00 Daniel Imberman <daniel.imber...@gmail.com>:
>
>> Hi Kira,
>>
>> I'm having some trouble understanding your question. Could you please
>> give a code example?
>>
>> From what I think you're asking, there are two issues with what you're
>> looking to do. (Please keep in mind I could be totally wrong on both of
>> these assumptions, but this is what I've been led to believe.)
>>
>> 1. The contract of an accumulator is that you can't actually read the
>> value while the function is running, because the values in the accumulator
>> don't actually mean anything until they are reduced. If you were looking
>> for progress in a local context, you could do mapPartitions and have a
>> local accumulator per partition, but I don't think it's possible to get the
>> actual accumulator value in the middle of the map job.
>>
>> 2. As far as performing ac2 while ac1 is "always running", I'm pretty
>> sure that's not possible. The way that lazy evaluation works in Spark, the
>> transformations have to be done serially. Having it any other way would
>> actually be really bad, because then you could have ac1 changing the data,
>> thereby making ac2's output unpredictable.
>>
>> That being said, with a more specific example it might be possible to
>> help figure out a solution that accomplishes what you are trying to do.
>>
>> On Wed, Jan 13, 2016 at 5:43 AM Kira <mennou...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> So I have an action on one RDD that is relatively long; let's call it
>>> ac1. What I want to do is to execute another action (ac2) on the same
>>> RDD to see the evolution of the first one (ac1). To this end I want to
>>> use an accumulator and read its value progressively to see the changes
>>> on it (on the fly) while ac1 is still running. My problem is that the
>>> accumulator is only updated once ac1 has finished; this is not helpful
>>> for me :/ .
>>>
>>> I've seen here
>>> <
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-td15758.html
>>> >
>>> what may seem like a solution for me, but it doesn't work: "While Spark
>>> already offers support for asynchronous reduce (collect data from
>>> workers, while not interrupting execution of a parallel transformation)
>>> through accumulator"
>>>
>>> Another post suggested using SparkListener to do that.
>>>
>>> Are these solutions correct? If yes, can you give me a simple example?
>>> Are there other solutions?
>>>
>>> Thank you.
>>> Regards
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Read-Accumulator-value-while-running-tp25960.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.