What is the driver-side Future for? Are you trying to make the remote Spark workers execute more requests to your service concurrently? It's not clear from your messages whether the bottleneck is something like a web service or just local native code.
So the time spent in your processing -- whatever returns Double -- is mostly waiting for a blocking service to return? I assume the external service is not yet at capacity and can handle more concurrent requests; otherwise there's no point in adding parallelism. First, figure out how many parallel requests the service can handle before it starts to slow down; call it N. It won't help to make more than N requests in parallel, so make sure you really aren't at that point yet.

You can create more partitions with repartition(), so that you have at least N partitions. Then make sure there are enough executors, with access to enough cores, to run N tasks concurrently on the cluster. That should maximize parallelism. You can indeed write remote functions that parallelize themselves with Future (on the executor side, not the driver side), but ideally you get the parallelism from Spark itself, absent a reason not to.

On Mon, Sep 8, 2014 at 4:30 PM, DrKhu <khudyakov....@gmail.com> wrote:
> What if, when I traverse an RDD, I need to calculate values in the dataset
> by calling an external (blocking) service? How do you think that could be
> achieved?
>
> val values: Future[RDD[Double]] = Future sequence tasks
>
> I've tried to create a list of Futures, but as RDD is not Traversable,
> Future.sequence is not suitable.
>
> I just wonder if anyone has had such a problem, and how you solved it. What
> I'm trying to achieve is parallelism on a single worker node, so I can
> call that external service 3000 times per second.
>
> Probably there is another solution more suitable for Spark, like having
> multiple worker nodes on a single host.
>
> It's interesting to know how you cope with such a challenge. Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------
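The executor-side Future approach discussed above can be sketched in plain Scala (no Spark dependency here; inside a job, this logic would live in an rdd.mapPartitions call so each task fans out its partition's requests). `callService`, `processPartition`, and the concurrency limit are hypothetical stand-ins for the real blocking service and its measured limit N:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingIoSketch {
  // Hypothetical stand-in for the external blocking service call.
  def callService(x: Int): Double = { Thread.sleep(5); x * 2.0 }

  // Fan out one partition's elements with at most `maxConcurrent`
  // in-flight requests, backed by a fixed-size thread pool.
  def processPartition(elements: Seq[Int], maxConcurrent: Int): Seq[Double] = {
    val pool = Executors.newFixedThreadPool(maxConcurrent)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try Await.result(Future.sequence(elements.map(x => Future(callService(x)))), 30.seconds)
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    // Inside rdd.mapPartitions, this list would be one partition's iterator.
    println(processPartition((1 to 100).toList, maxConcurrent = 8).sum) // prints 10100.0
  }
}
```

Note the thread pool is sized to the per-task share of N, not to N itself; if Spark already runs N tasks concurrently across the cluster, each task should keep its in-flight request count small so the total stays at the service's limit.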