You could try increasing the level of parallelism
(spark.default.parallelism) when creating the SparkContext.
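
Something along these lines, for example (a rough sketch; the app name and
the value "200" are placeholders to tune for your cluster, e.g. 2-3x the
total number of executor cores):

    from pyspark import SparkConf, SparkContext

    # Set the default parallelism before the context is created so that
    # shuffles and operations without an explicit partition count use it.
    conf = (SparkConf()
            .setAppName("cartesian-job")          # placeholder app name
            .set("spark.default.parallelism", "200"))  # placeholder value
    sc = SparkContext(conf=conf)

    # Alternatively, the RDD returned by cartesian() can be repartition()ed
    # afterwards to spread the work across more tasks.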

Thanks
Best Regards

On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi <antonym...@yahoo.com.invalid>
wrote:

> Hi,
>
> using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel:
> I can see multiple python processes spawned on each nodemanager, but for
> some reason when running cartesian there is only a single python process
> running on each node. The task is indicating thousands of partitions, so I
> don't understand why it is not running with higher parallelism. The
> performance is obviously poor, although other operations run fine.
>
> any idea how to improve this?
>
> thank you,
> Antony.
>
