Simultaneous actions work fine on a cluster if they are independent. On local I never paid attention, but the code path should be similar.

On Jan 18, 2016 8:00 AM, "Koert Kuipers" <ko...@tresata.com> wrote:
> stacktrace? details?
>
> On Mon, Jan 18, 2016 at 5:58 AM, Mennour Rostom <mennou...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running my app on a single machine first, before moving it to the
>> cluster. Simultaneous actions are not working for me right now; is this
>> coming from the fact that I am using a single machine? I am using the
>> FAIR scheduler.
>>
>> 2016-01-17 21:23 GMT+01:00 Mark Hamstra <m...@clearstorydata.com>:
>>
>>> It can be far more than that (e.g.
>>> https://issues.apache.org/jira/browse/SPARK-11838), and is generally
>>> either unrecognized or a greatly under-appreciated and underused
>>> feature of Spark.
>>>
>>> On Sun, Jan 17, 2016 at 12:20 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> the re-use of shuffle files is always a nice surprise to me
>>>>
>>>> On Sun, Jan 17, 2016 at 3:17 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> Same SparkContext means same pool of Workers. It's up to the
>>>>> Scheduler, not the SparkContext, whether the exact same Workers or
>>>>> Executors will be used to calculate simultaneous actions against the
>>>>> same RDD. It is likely that many of the same Workers and Executors
>>>>> will be used as the Scheduler tries to preserve data locality, but
>>>>> that is not guaranteed. In fact, what is most likely to happen is
>>>>> that the shared Stages and Tasks being calculated for the
>>>>> simultaneous actions will not actually be run at exactly the same
>>>>> time, which means that shuffle files produced for one action will be
>>>>> reused by the other(s), and repeated calculations will be avoided
>>>>> even without explicitly caching/persisting the RDD.
>>>>>
>>>>> On Sun, Jan 17, 2016 at 8:06 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>
>>>>>> Same rdd means same sparkcontext means same workers
>>>>>>
>>>>>> Cache/persist the rdd to avoid repeated jobs
>>>>>> On Jan 17, 2016 5:21 AM, "Mennour Rostom" <mennou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you all for your answers,
>>>>>>>
>>>>>>> If I understand correctly, actions (in my case foreach) can be run
>>>>>>> concurrently and simultaneously on the SAME rdd (which is logical
>>>>>>> because they are read-only objects). However, I want to know if the
>>>>>>> same workers are used for the concurrent analysis?
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> 2016-01-15 21:11 GMT+01:00 Jakob Odersky <joder...@gmail.com>:
>>>>>>>
>>>>>>>> I stand corrected. How considerable are the benefits though? Will
>>>>>>>> the scheduler be able to dispatch jobs from both actions
>>>>>>>> simultaneously (or on a when-workers-become-available basis)?
>>>>>>>>
>>>>>>>> On 15 January 2016 at 11:44, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>>>>
>>>>>>>>> we run multiple actions on the same (cached) rdd all the time, i
>>>>>>>>> guess in different threads indeed (it's in akka)
>>>>>>>>>
>>>>>>>>> On Fri, Jan 15, 2016 at 2:40 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> RDDs actually are thread-safe, and quite a few applications use
>>>>>>>>>> them this way, e.g. the JDBC server.
>>>>>>>>>>
>>>>>>>>>> Matei
>>>>>>>>>>
>>>>>>>>>> On Jan 15, 2016, at 2:10 PM, Jakob Odersky <joder...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I don't think RDDs are threadsafe.
>>>>>>>>>> More fundamentally however, why would you want to run RDD actions
>>>>>>>>>> in parallel? The idea behind RDDs is to provide you with an
>>>>>>>>>> abstraction for computing parallel operations on distributed data.
>>>>>>>>>> Even if you were to call actions from several threads at once, the
>>>>>>>>>> individual executors of your spark environment would still have to
>>>>>>>>>> perform operations sequentially.
>>>>>>>>>>
>>>>>>>>>> As an alternative, I would suggest restructuring your RDD
>>>>>>>>>> transformations to compute the required results in one single
>>>>>>>>>> operation.
>>>>>>>>>>
>>>>>>>>>> On 15 January 2016 at 06:18, Jonathan Coveney <jcove...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Threads
>>>>>>>>>>>
>>>>>>>>>>> On Friday, January 15, 2016, Kira <mennou...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Can we run *simultaneous* actions on the *same RDD*? If yes,
>>>>>>>>>>>> how can this be done?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> View this message in context:
>>>>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/simultaneous-actions-tp25977.html
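
The one-word answer in the thread ("Threads") can be sketched roughly as follows: each action is submitted from its own thread against the same RDD. This is only a minimal sketch, not code from any of the posters. The names `run_actions_concurrently` and `spark_demo` are hypothetical, and the demo assumes PySpark is installed, a local master, and `spark.scheduler.mode` set to FAIR as mentioned in the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def run_actions_concurrently(rdd, actions):
    """Run each action against the same RDD from its own thread.

    `actions` is a list of callables that take the RDD and trigger a job,
    e.g. `lambda r: r.count()`. Results come back in the order given.
    (Hypothetical helper for illustration; not part of Spark.)
    """
    with ThreadPoolExecutor(max_workers=max(1, len(actions))) as pool:
        futures = [pool.submit(action, rdd) for action in actions]
        # result() blocks until that job finishes and re-raises its exception.
        return [f.result() for f in futures]


def spark_demo():
    """Assumes PySpark is installed; defined here but not called."""
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setMaster("local[4]")               # assumption: single machine
            .setAppName("simultaneous-actions")  # hypothetical app name
            .set("spark.scheduler.mode", "FAIR"))
    sc = SparkContext(conf=conf)
    try:
        # Caching follows the advice in the thread to avoid repeated work.
        rdd = sc.parallelize(range(1000)).cache()
        total, n = run_actions_concurrently(
            rdd, [lambda r: r.sum(), lambda r: r.count()])
        print(total, n)
    finally:
        sc.stop()
```

Calling `spark_demo()` submits both jobs through one SparkContext. Whether they actually overlap is up to the scheduler and the free cores, as discussed above; the helper only guarantees that neither action waits for the other to be submitted.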