This is caused by https://issues.apache.org/jira/browse/SPARK-1188. I think the fix will be in the next release. But until then, do:
g.edges.map(_.copy()).distinct.count

On Wed, Apr 23, 2014 at 2:26 AM, Ryan Compton <compton.r...@gmail.com> wrote:
> Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/tttt.txt
>
> This code:
>
> println(g.numEdges)
> println(g.numVertices)
> println(g.edges.distinct().count())
>
> gave me
>
> 10000
> 9294
> 2
>
>
> On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave <ankurd...@gmail.com> wrote:
> > I wasn't able to reproduce this with a small test file, but I did change the
> > file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to take
> > the third column rather than the second?
> >
> > If so, would you mind posting a larger sample of the file, or even the whole
> > file if possible?
> >
> > Here's the test that succeeded:
> >
> > test("graph.edges.distinct.count") {
> >   withSpark { sc =>
> >     val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
> >       "394365859\t136153151", "589404147\t1361045425"))
> >     val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
> >       .map(x => (x(0).toLong, x(1).toLong))
> >     val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
> >       uniqueEdges = Option(CanonicalRandomVertexCut))
> >     assert(edgeTupRDD.distinct.count() === 2)
> >     assert(g.numEdges === 2)
> >     assert(g.edges.distinct.count() === 2)
> >   }
> > }
> >
> > Ankur
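For anyone curious why the .copy() matters: SPARK-1188 is about GraphX re-yielding a single mutable Edge object while iterating, so operators like distinct can end up holding many references to the same (last-written) value. Here is a minimal plain-Scala sketch of that aliasing effect, with no Spark dependency; the Edge class and materialize helper below are stand-ins for illustration, not the GraphX API.

```scala
// Stand-in for GraphX's Edge; the real one is not this class.
object ReuseDemo {
  case class Edge(var srcId: Long, var dstId: Long)

  // Materialize pairs through a single reused Edge, optionally
  // copying each row. copyEach = false mimics the buggy reuse.
  def materialize(pairs: Seq[(Long, Long)], copyEach: Boolean): Seq[Edge] = {
    val reused = Edge(0L, 0L)
    pairs.map { case (s, d) =>
      reused.srcId = s
      reused.dstId = d
      if (copyEach) reused.copy() else reused
    }
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq((394365859L, 136153151L), (589404147L, 1361045425L))
    // Without copies, every slot aliases one object holding the last
    // pair written, so distinct collapses to a single element.
    println(materialize(pairs, copyEach = false).distinct.size) // 1
    // With .copy(), each row owns its values and distinct sees both.
    println(materialize(pairs, copyEach = true).distinct.size)  // 2
  }
}
```

The same reasoning explains the workaround above: mapping _.copy() over g.edges gives each edge its own object before distinct runs.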