You could try increasing the level of parallelism (spark.default.parallelism) when creating the SparkContext.
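For example, something along these lines - a minimal sketch, where the value 200 is purely illustrative (a common rule of thumb is 2-3x the total executor cores in your cluster):

    from pyspark import SparkConf, SparkContext

    # 200 is an illustrative value - tune it to roughly 2-3x the
    # total number of executor cores available on your cluster.
    conf = (SparkConf()
            .setAppName("cartesian-parallelism")
            .set("spark.default.parallelism", "200"))
    sc = SparkContext(conf=conf)

    rdd_a = sc.parallelize(range(1000))
    rdd_b = sc.parallelize(range(1000))
    pairs = rdd_a.cartesian(rdd_b)

    # cartesian produces numPartitions(a) * numPartitions(b) output
    # partitions, so if tasks still aren't spreading across executors,
    # an explicit repartition forces a shuffle into the desired count:
    pairs = pairs.repartition(200)
    print(pairs.getNumPartitions())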
Thanks
Best Regards

On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:

> Hi,
>
> using pyspark 1.1.0 on YARN 2.5.0. all operations run nicely in parallel -
> I can see multiple python processes spawned on each nodemanager, but for
> some reason when running cartesian there is only a single python process
> running on each node. the task is indicating thousands of partitions, so I
> don't understand why it is not running with higher parallelism. the
> performance is obviously poor although other operations rock.
>
> any idea how to improve this?
>
> thank you,
> Antony.