You can do this much more simply, I think, with Scala's parallel
collections (try .par). There's nothing wrong with triggering actions
from multiple driver threads, no.
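A minimal, self-contained sketch of the .par approach — the `t * 10` is a stand-in for the per-`t` save action in the code below, and note that on Scala 2.13+ parallel collections live in the separate scala-parallel-collections module (on 2.10–2.12, .par is built in and the import is unnecessary):

```scala
// Scala 2.13+: .par comes from the scala-parallel-collections module.
import scala.collection.parallel.CollectionConverters._

// Each element is processed on a thread from the common ForkJoin pool,
// so a side-effecting action per element (like a save) runs concurrently.
val results = Seq(1, 2, 3).par
  .map(t => t * 10) // stand-in for the real per-`t` save
  .toList
// results == List(10, 20, 30); .toList preserves the original order
```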

Here, something non-serializable is getting caught in your closure,
maybe unintentionally. It's not directly related to the parallelism.
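To illustrate the usual cause and fix — this sketch uses plain Java serialization, which is essentially the check Spark performs on task closures. The `Helper` class and `roundTrip` helper are hypothetical, just for demonstration; the fix is to copy only the value you need into a local val so the closure stops dragging in the enclosing object:

```scala
import java.io._

class Helper { // deliberately NOT Serializable
  val factor = 10
}

val helper = new Helper
// val bad = (x: Int) => x * helper.factor  // captures `helper` -> would fail

// Fix: copy just the needed value into a local val first.
val factor = helper.factor
val good = (x: Int) => x * factor // closes over only an Int

// Round-trip through Java serialization to show the closure survives it.
def roundTrip[T](obj: T): T = {
  val bytes = new ByteArrayOutputStream()
  new ObjectOutputStream(bytes).writeObject(obj)
  new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    .readObject().asInstanceOf[T]
}
val restored = roundTrip(good)
```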

On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan
<[email protected]> wrote:
> Hi.
>
> I'm trying to trigger DataFrame's save method in parallel from my driver.
> For that purposes I use ExecutorService and Futures, here's my code:
>
>
> val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
>
>   override def run(): Unit = {
>     val commons = events.filter(_._1 == t).map(_._2.common)
>     saveAsParquetFile(sqlContext, commons, s"$t/common")
>     EventTypes.all.foreach { et =>
>       val eventData = events
>         .filter(ev => ev._1 == t && ev._2.eventType == et)
>         .map(_._2.data)
>       saveAsParquetFile(sqlContext, eventData, s"$t/$et")
>     }
>   }
>
> }))
> futures.foreach(_.get)
>
> It throws a "Task not serializable" exception. Is it legal to use threads
> in the driver to trigger actions?
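For reference, the ExecutorService-plus-Futures fan-out itself is fine; here is a runnable sketch of the same pattern using standard scala.concurrent Futures, with the string result as a stand-in for the real saveAsParquetFile call:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

val pool = Executors.newFixedThreadPool(3)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

// Each Future wraps one save action in the real code.
val futures = Seq(1, 2, 3).map { t =>
  Future { s"$t/common" } // stand-in for saveAsParquetFile(...)
}

// Future.sequence preserves the original order of results.
val paths = Await.result(Future.sequence(futures), 30.seconds)
pool.shutdown() // let the non-daemon pool threads exit
```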

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
