Hi,

I'm running a simple connected components job using GraphX (version 0.9.1).

My input comes from an HDFS text file split into 400 parts. When I run the 
code on a single part or a small number of parts (around 20), it runs fine. 
As soon as I try to read more parts (more than 30), I get an error 
and the job fails.
Looking at the logs, I see the following exception:

java.util.NoSuchElementException: End of stream
       at org.apache.spark.util.NextIterator.next(NextIterator.scala:83)
       at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
       at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:52)
       at org.apache.spark.graphx.impl.RoutingTable$$anonfun$1.apply(RoutingTable.scala:51)
       at org.apache.spark.rdd.RDD$$anonfun$1.apply(RDD.scala:456)

From searching the web, this appears to be a known issue with GraphX:
Here: https://github.com/apache/spark/pull/367
And here: https://github.com/apache/spark/pull/497

Is there a stable release that includes this fix? Should I clone the git 
repo and build it myself? How would you advise me to deal with this issue?

Thanks,
Alex


