The second one sounds reasonable, I think.

On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> Hi everyone,
> Let's assume I have a complex workflow of more than 10 datasources as input
> - 20 computations (some creating intermediary datasets and some merging
> everything for the final computation) - some taking on average 1 minute to
> complete and some taking more than 30 minutes.
>
> What would be for you the best strategy to port this to Apache Spark ?
>
> - Transform the whole flow into a Spark Job (PySpark or Scala)
> - Transform only part of the flow (the heavy lifting ~30 min parts) using the
>   same language (PySpark)
> - Transform only part of the flow and pipe the rest from Scala to Python
>
> Regards,
>
> Olivier.