They'll be able to run concurrently and share workers / data. Take a look at 
http://spark.apache.org/docs/latest/job-scheduling.html for how scheduling 
happens across multiple running jobs in the same SparkContext.
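
As a rough sketch of the relevant settings (the app name and pool name below 
are just placeholders), you can enable the FAIR scheduler so the jobs 
interleave on the cluster instead of queuing FIFO:

    import org.apache.spark.{SparkConf, SparkContext}

    // Enable fair scheduling when the SparkContext is created
    val conf = new SparkConf()
      .setAppName("concurrent-actions")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Optionally, in each thread that submits an action, pick a pool
    sc.setLocalProperty("spark.scheduler.pool", "pool1")

Even with the default FIFO mode the jobs run in the same context and share 
cached data; FAIR mode just lets their tasks share the workers more evenly.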

Matei

> On Jan 17, 2016, at 8:06 AM, Koert Kuipers <ko...@tresata.com> wrote:
> 
> Same RDD means same SparkContext, which means the same workers.
> 
> Cache/persist the RDD to avoid recomputing it for each job.
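> 
> A minimal sketch (rdd stands for whatever RDD your actions run on):
> 
>     import org.apache.spark.storage.StorageLevel
> 
>     // Persist once so the concurrent actions reuse the materialized
>     // partitions instead of recomputing the whole lineage per action.
>     rdd.persist(StorageLevel.MEMORY_ONLY)   // equivalent to rdd.cache()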
> 
> On Jan 17, 2016 5:21 AM, "Mennour Rostom" <mennou...@gmail.com> wrote:
> Hi,
> 
> Thank you all for your answers,
> 
> If I understand correctly, actions (in my case foreach) can be run 
> concurrently on the SAME RDD (which makes sense, since RDDs are read-only 
> objects). However, I want to know whether the same workers are used for the 
> concurrent analyses?
> 
> Thank you
> 
> 2016-01-15 21:11 GMT+01:00 Jakob Odersky <joder...@gmail.com>:
> I stand corrected. How considerable are the benefits though? Will the 
> scheduler be able to dispatch jobs from both actions simultaneously (or on a 
> when-workers-become-available basis)?
> 
> On 15 January 2016 at 11:44, Koert Kuipers <ko...@tresata.com> wrote:
> We run multiple actions on the same (cached) RDD all the time, and indeed 
> from different threads (it's in Akka).
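> 
> For example, something along these lines (a sketch only; rdd is assumed to 
> be an already cached RDD):
> 
>     import scala.concurrent.{Await, Future}
>     import scala.concurrent.ExecutionContext.Implicits.global
>     import scala.concurrent.duration.Duration
> 
>     // Two actions on the same cached RDD, submitted from separate threads
>     val countF  = Future { rdd.count() }
>     val sampleF = Future { rdd.take(10) }
> 
>     val count  = Await.result(countF, Duration.Inf)
>     val sample = Await.result(sampleF, Duration.Inf)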
> 
> On Fri, Jan 15, 2016 at 2:40 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> RDDs actually are thread-safe, and quite a few applications use them this 
> way, e.g. the JDBC server.
> 
> Matei
> 
>> On Jan 15, 2016, at 2:10 PM, Jakob Odersky <joder...@gmail.com> wrote:
>> 
>> I don't think RDDs are thread-safe.
>> More fundamentally, however, why would you want to run RDD actions in 
>> parallel? The idea behind RDDs is to provide you with an abstraction for 
>> computing parallel operations on distributed data. Even if you were to call 
>> actions from several threads at once, the individual executors of your Spark 
>> environment would still have to perform the operations sequentially.
>> 
>> As an alternative, I would suggest restructuring your RDD transformations 
>> to compute the required results in a single operation.
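>> 
>> For instance, instead of running count() and sum() as two separate actions, 
>> a single aggregate() can compute both in one pass (a sketch assuming an 
>> RDD[Long] called rdd):
>> 
>>     // One job that returns (element count, sum) together
>>     val (count, total) = rdd.aggregate((0L, 0L))(
>>       (acc, x) => (acc._1 + 1, acc._2 + x),   // fold an element into the partial result
>>       (a, b)   => (a._1 + b._1, a._2 + b._2)  // merge partial results across partitions
>>     )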
>> 
>> On 15 January 2016 at 06:18, Jonathan Coveney <jcove...@gmail.com> wrote:
>> Threads
>> 
>> 
>> On Friday, January 15, 2016, Kira <mennou...@gmail.com> wrote:
>> Hi,
>> 
>> Can we run *simultaneous* actions on the *same RDD*? If yes, how can this 
>> be done?
>> 
>> Thank you,
>> Regards
>> 