I diagnosed this problem today and found that it's because the GraphX custom
serializers make an assumption that is violated by sort-based shuffle. I filed
SPARK-3649 explaining the problem and submitted a PR to fix it [2].
The fix removes the custom serializers, which has a 10% performance pena
@ankur - I have also seen this recently. Is there a patch available for this
issue?
(In my recent experience with non-GraphX apps, sort-based shuffle seems to
cope better with memory pressure.)
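
A possible stopgap until a fix is merged, assuming the failure only shows up
with sort-based shuffle as described above: switch the job back to hash-based
shuffle via the `spark.shuffle.manager` setting (Spark 1.x accepts `hash` and
`sort`). This is only a sketch. The object name and app name are placeholders
that do not come from this thread, and it has not been run against the failing
job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver; names are illustrative only.
object HashShuffleWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("connected-components-hash-shuffle") // placeholder app name
      // Assumption: the ClassCastException is specific to sort-based shuffle,
      // so falling back to the hash shuffle manager avoids it.
      .set("spark.shuffle.manager", "hash")

    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(conf)
    try {
      // Build the graph and run connected components here, unchanged.
    } finally {
      sc.stop()
    }
  }
}
```

The same setting can also be passed on the command line with
`--conf spark.shuffle.manager=hash`, which avoids a rebuild. Hash-based shuffle
may give up the memory-pressure behaviour mentioned above, so treat it as
temporary.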
On Wed, Sep 10, 2014 at 2:00 PM, Jeffrey Picard wrote:
> After rebuilding from the master branch this morning, I’ve started to see
> these errors that I’ve never gotten before while running connected
> components. Anyone seen this before?
> [...]
> at
> org.apache.spark.shuffle.sort.SortS
collection.ExternalSorter: Thread 60 spilling in-memory batch of 1020 MB to disk (1 spill so far)
14/09/10 20:39:15 ERROR executor.Executor: Exception in task 275.0 in stage 3.0 (TID 994)
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
at