I diagnosed this problem today: the GraphX custom serializers make an assumption that is violated by sort-based shuffle. I filed SPARK-3649 [1] explaining the problem and submitted a PR to fix it [2].

The fix removes the custom serializers. This incurs a 10% performance penalty for PageRank, since the custom serializers were written specifically to optimize PageRank. Other applications should see a much smaller slowdown.

Ankur

[1] https://issues.apache.org/jira/browse/SPARK-3649
[2] https://github.com/apache/spark/pull/2503