You can do this much more simply, I think, with Scala's parallel
collections (try .par). And no, there's nothing wrong with triggering
actions from multiple threads in the driver.
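
For example, a minimal sketch using the names from your own snippet
below (events, saveAsParquetFile, sqlContext, EventTypes are taken
from your code, not verified here), with the per-event-type saves
handled the same way:

    // Runs the three branches on threads from Scala's default
    // parallel-collections pool; each save is still an ordinary
    // Spark action submitted from the driver.
    Seq(1, 2, 3).par.foreach { t =>
      val commons = events.filter(_._1 == t).map(_._2.common)
      saveAsParquetFile(sqlContext, commons, s"$t/common")
    }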

As for the exception: something that isn't serializable is getting
caught in your closure, maybe unintentionally. It's not directly
related to the parallelism.
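
A common fix, sketched: copy whatever the RDD operations need into a
local val first, so the closure captures just that value rather than
the enclosing (non-serializable) object. Here someField is a
hypothetical stand-in for whatever your closure is dragging in:

    // Hypothetical: someField stands in for a field of an outer,
    // non-serializable class. Copying it to a local val means Spark
    // serializes only the value, not the whole enclosing instance.
    val localValue = someField
    val filtered = events.filter(_._1 == localValue)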

On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
<aram.mkrtchyan...@gmail.com> wrote:
> Hi.
>
> I'm trying to trigger DataFrame's save method in parallel from my driver.
> For that purpose I use an ExecutorService and Futures; here's my code:
>
> val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
>   override def run(): Unit = {
>     val commons = events.filter(_._1 == t).map(_._2.common)
>     saveAsParquetFile(sqlContext, commons, s"$t/common")
>     EventTypes.all.foreach { et =>
>       val eventData =
>         events.filter(ev => ev._1 == t && ev._2.eventType == et).map(_._2.data)
>       saveAsParquetFile(sqlContext, eventData, s"$t/$et")
>     }
>   }
> }))
> futures.foreach(_.get)
>
> It throws a "Task not serializable" exception. Is it legal to use threads
> in the driver to trigger actions?
