You can do this much more simply, I think, with Scala's parallel collections (try .par). There's nothing wrong with doing this, no.
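A minimal sketch of the `.par` idea, using plain Scala collections only (no Spark here; `saveTask` is an illustrative stand-in for the real per-key save work, not anything from the original code):

```scala
// Sketch: run independent tasks concurrently via Scala parallel collections.
// In Scala 2.12 `.par` is built in; in 2.13+ it needs the
// scala-parallel-collections module on the classpath.
object ParSketch {
  def main(args: Array[String]): Unit = {
    // stand-in for the per-key save work (hypothetical)
    def saveTask(t: Int): String = s"saved $t/common"

    // .par runs the map on a fork-join pool, one task per element;
    // .seq converts the result back to an ordinary sequential Seq
    val results = Seq(1, 2, 3).par.map(saveTask).seq
    println(results.mkString(", "))
  }
}
```

The elements are processed in parallel, but `ParSeq.map` still preserves element order in the result, so the output is deterministic.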
Here, something is getting caught in your closure, maybe unintentionally, that's not serializable. It's not directly related to the parallelism.

On Thu, Mar 26, 2015 at 3:54 PM, Aram Mkrtchyan <aram.mkrtchyan...@gmail.com> wrote:
> Hi,
>
> I'm trying to trigger DataFrame's save method in parallel from my driver.
> For that purpose I use an ExecutorService and Futures; here's my code:
>
> val futures = Seq(1, 2, 3).map(t => pool.submit(new Runnable {
>   override def run(): Unit = {
>     val commons = events.filter(_._1 == t).map(_._2.common)
>     saveAsParquetFile(sqlContext, commons, s"$t/common")
>     EventTypes.all.foreach { et =>
>       val eventData = events.filter(ev => ev._1 == t && ev._2.eventType == et).map(_._2.data)
>       saveAsParquetFile(sqlContext, eventData, s"$t/$et")
>     }
>   }
> }))
> futures.foreach(_.get)
>
> It throws a "Task not serializable" exception. Is it legal to use threads
> in the driver to trigger actions?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
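The capture problem described above can be sketched without Spark at all. A closure defined inside a method that reads a field of the enclosing object captures `this`, so serializing the closure drags the whole (possibly non-serializable) object along with it; copying the field into a local `val` first avoids that. All names here (`Driver`, `tag`, `serializable`) are illustrative, not from the original code:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A non-serializable enclosing class, standing in for the driver-side
// object whose method defines the closure.
class Driver(val tag: String) {
  // BAD: reads the field `tag`, so the closure captures `this` (a Driver)
  def badClosure: Int => String = t => s"$tag/$t"

  // GOOD: copy the field to a local val; the closure captures only a String
  def goodClosure: Int => String = {
    val localTag = tag
    t => s"$localTag/$t"
  }
}

object ClosureSketch {
  // Returns true if `obj` survives Java serialization
  def serializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val d = new Driver("events")
    println(serializable(d.badClosure))   // false: captured the Driver
    println(serializable(d.goodClosure))  // true: captured only a String
  }
}
```

Spark's closure cleaner reports exactly this situation as "Task not serializable", naming the offending captured object in the stack trace, which is usually the fastest way to find what leaked into the closure.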