You could try increasing the level of parallelism (spark.default.parallelism) when creating the SparkContext.
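For example, something along these lines - a minimal sketch, where the value 200 is purely illustrative (a common rule of thumb is 2-3x the total executor cores in your cluster):

    from pyspark import SparkConf, SparkContext

    # 200 is an illustrative value - tune it to roughly 2-3x the
    # total number of executor cores available on your cluster.
    conf = (SparkConf()
            .setAppName("cartesian-parallelism")
            .set("spark.default.parallelism", "200"))
    sc = SparkContext(conf=conf)

    rdd_a = sc.parallelize(range(1000))
    rdd_b = sc.parallelize(range(1000))
    pairs = rdd_a.cartesian(rdd_b)

    # cartesian produces numPartitions(a) * numPartitions(b) output
    # partitions, so if tasks still aren't spreading across executors,
    # an explicit repartition forces a shuffle into the desired count:
    pairs = pairs.repartition(200)
    print(pairs.getNumPartitions())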
Thanks
Best Regards

On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:

> Hi,
>
> using pyspark 1.1.0 on YARN 2.5.0. all operations run nicely in parallel -
> I can see multiple python processes spawned on each nodemanager, but for
> some reason when running cartesian there is only a single python process
> running on each node. the task is indicating thousands of partitions, so I
> don't understand why it is not running with higher parallelism. the
> performance is obviously poor although other operations rock.
>
> any idea how to improve this?
>
> thank you,
> Antony.