Hi Daniel, Andrew,

Thank you for your answers. So it's not possible to read an accumulator's value until the action that manipulates it finishes; that's unfortunate, I'll think of something else.

However, the most important thing in my application is the ability to launch 2 (or more) actions in parallel and concurrently: "*within* each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads" [Job Scheduling, official docs].
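Concretely, the thread-submission pattern I have in mind looks like the sketch below. It is plain Python: ac1/ac2 and the dataset D are hypothetical stand-ins for the real Spark actions and RDD, since I can't paste a runnable cluster job here. With PySpark, each thread would instead call an action such as D.foreach(func1) / D.foreachPartition(func2) on the shared RDD:

```python
import threading

# Hypothetical stand-ins: D plays the role of the shared RDD, and
# ac1/ac2 play the roles of the two Spark actions submitted from
# two different driver threads.
D = list(range(10))
results = {}

def ac1(data):
    # the long-running "action" (e.g. D.foreach(func1) in real Spark)
    results["ac1"] = sum(x * x for x in data)

def ac2(data):
    # the second "action" on the same data, submitted from another thread
    # (e.g. D.foreachPartition(func2) in real Spark)
    results["ac2"] = sum(data)

t1 = threading.Thread(target=ac1, args=(D,))
t2 = threading.Thread(target=ac2, args=(D,))
t1.start(); t2.start()   # both jobs are in flight at the same time
t1.join(); t2.join()

print(results["ac1"], results["ac2"])  # 285 45
```

With real Spark, the driver's scheduler would receive both jobs and run them concurrently, subject to available resources, which is how I read the Job Scheduling doc.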
My actions have to run on the same RDD, e.g. RDD D: D.ac1(func1) [foreach, for instance] // D.ac2(func2) [foreachPartition or whatever]. Can this be done with asynchronous actions? (It is not working for me on a single node :/ .) Can I use the same broadcast variable in the two actions? If yes, what happens if I change the value of the broadcast variable? On the other hand, I would really like to know more about Spark 2.0.

Thank you,
Regards

2016-01-13 20:31 GMT+01:00 Andrew Or <and...@databricks.com>:

> Hi Kira,
>
> As you suspected, accumulator values are only updated after the task
> completes. We do send accumulator updates from the executors to the driver
> on periodic heartbeats, but these only concern internal accumulators, not
> the ones created by the user.
>
> In short, I'm afraid there is not currently a way (in Spark 1.6 and
> before) to access the accumulator values until after the tasks that updated
> them have completed. This will change in Spark 2.0, the next version,
> however.
>
> Please let me know if you have more questions.
> -Andrew
>
> 2016-01-13 11:24 GMT-08:00 Daniel Imberman <daniel.imber...@gmail.com>:
>
>> Hi Kira,
>>
>> I'm having some trouble understanding your question. Could you please
>> give a code example?
>>
>> From what I think you're asking, there are two issues with what you're
>> looking to do. (Please keep in mind I could be totally wrong on both of
>> these assumptions, but this is what I've been led to believe.)
>>
>> 1. The contract of an accumulator is that you can't actually read the
>> value while the function is running, because the values in the accumulator
>> don't actually mean anything until they are reduced. If you were looking
>> for progress in a local context, you could do mapPartitions and have a
>> local accumulator per partition, but I don't think it's possible to get the
>> actual accumulator value in the middle of the map job.
>>
>> 2. As far as performing ac2 while ac1 is "always running", I'm pretty
>> sure that's not possible. The way that lazy evaluation works in Spark, the
>> transformations have to be done serially. Having it any other way would
>> actually be really bad, because then you could have ac1 changing the data,
>> thereby making ac2's output unpredictable.
>>
>> That being said, with a more specific example it might be possible to
>> help figure out a solution that accomplishes what you are trying to do.
>>
>> On Wed, Jan 13, 2016 at 5:43 AM Kira <mennou...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> So I have an action on one RDD that is relatively long; let's call it
>>> ac1. What I want to do is to execute another action (ac2) on the same
>>> RDD to see the evolution of the first one (ac1). To this end I want to
>>> use an accumulator and read its value progressively to see the changes
>>> on it (on the fly) while ac1 is still running. My problem is that the
>>> accumulator is only updated once ac1 has finished; this is not helpful
>>> for me :/ .
>>>
>>> I've seen here
>>> <
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-td15758.html
>>> >
>>> what may seem like a solution for me, but it doesn't work: "While Spark
>>> already offers support for asynchronous reduce (collect data from
>>> workers, while not interrupting execution of a parallel transformation)
>>> through accumulator"
>>>
>>> Another post suggested using SparkListener to do that.
>>>
>>> Are these solutions correct? If yes, can you give me a simple example?
>>> Are there other solutions?
>>>
>>> Thank you.
>>> Regards
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Read-Accumulator-value-while-running-tp25960.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.