I diagnosed this problem today and found that it's because the GraphX custom 
serializers make an assumption that is violated by sort-based shuffle. I filed 
SPARK-3649 [1] explaining the problem and submitted a PR to fix it [2].

The fix removes the custom serializers, which incurs a 10% performance penalty 
for PageRank, since those serializers were written specifically to optimize 
it. Other applications should see much less slowdown.

Ankur

[1] https://issues.apache.org/jira/browse/SPARK-3649
[2] https://github.com/apache/spark/pull/2503
