Re: Best strategy for Pandas -> Spark

Davies Liu Mon, 01 Jun 2015 17:41:09 -0700

The second option (porting only the heavy ~30 minute parts, staying in
PySpark) sounds reasonable, I think.
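
As an illustration of the second option in the quoted list (porting only the
heavy steps to PySpark), here is a minimal sketch. It is not taken from the
original workflow: the column names `key` and `value`, the function names
`light_step`, `heavy_step` and `run_flow`, and the `local[*]` master are all
assumptions made for the example.

```python
import pandas as pd


def light_step(pdf: pd.DataFrame) -> pd.DataFrame:
    """A quick (~1 minute) step that stays in pandas: drop incomplete rows.

    Hypothetical step; the real workflow's computations are not known.
    """
    return pdf.dropna(subset=["key", "value"]).reset_index(drop=True)


def heavy_step(spark, pdf: pd.DataFrame) -> pd.DataFrame:
    """A slow (~30 minute) step moved to Spark: sum `value` per `key`.

    Hypothetical step; `spark` is an existing SparkSession.
    """
    # Imported lazily so the pandas-only steps do not require Spark.
    from pyspark.sql import functions as F

    sdf = spark.createDataFrame(pdf)  # pandas -> Spark DataFrame
    totals = sdf.groupBy("key").agg(F.sum("value").alias("total"))
    # toPandas() collects the result to the driver, so it should only be
    # used once the data has been reduced to a size that fits in memory.
    return totals.toPandas()


def run_flow(pdf: pd.DataFrame) -> pd.DataFrame:
    """Run the light step in pandas and the heavy step in Spark."""
    from pyspark.sql import SparkSession

    # Local master is an assumption for the sketch; a cluster would differ.
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("hybrid-flow")
        .getOrCreate()
    )
    try:
        cleaned = light_step(pdf)
        result = heavy_step(spark, cleaned)
        return result.sort_values("key").reset_index(drop=True)
    finally:
        spark.stop()
```

The point of the split is that the cheap steps keep their existing pandas
code, and only the expensive ones pay the cost of moving data into and out of
Spark.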

On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot
<o.girar...@lateral-thoughts.com> wrote:
> Hi everyone,
> Let's assume I have a complex workflow with more than 10 data sources as
> input and 20 computations (some creating intermediate datasets, some merging
> everything for the final computation), some taking about 1 minute on average
> to complete and some taking more than 30 minutes.
>
> What would you consider the best strategy for porting this to Apache Spark?
>
> 1. Transform the whole flow into a Spark job (PySpark or Scala)
> 2. Transform only part of the flow (the heavy-lifting ~30 min parts) using
>    the same language (PySpark)
> 3. Transform only part of the flow and pipe the rest from Scala to Python
>
> Regards,
>
> Olivier.


