This is caused by https://issues.apache.org/jira/browse/SPARK-1188. I think
the fix will be in the next release. Until then, copy each edge before
calling distinct:

    g.edges.map(_.copy()).distinct.count
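
To illustrate why the copy helps, here is a minimal plain-Scala sketch with no Spark involved. It assumes the problem is that the edge iterator hands back one reused, mutable object for every element, which is my reading of the ticket. MutableEdge and reusingIterator are hypothetical stand-ins, not GraphX classes.

```scala
// Hypothetical stand-in for an edge: a case class with mutable fields,
// so a single instance can be overwritten and handed out repeatedly.
case class MutableEdge(var srcId: Long = 0L, var dstId: Long = 0L)

object ReuseDemo {
  // Assumption: mimics an iterator that reuses one object per partition
  // instead of allocating a new edge for every element.
  def reusingIterator(pairs: Seq[(Long, Long)]): Iterator[MutableEdge] = {
    val shared = MutableEdge()
    pairs.iterator.map { case (src, dst) =>
      shared.srcId = src
      shared.dstId = dst
      shared // the same reference is returned every time
    }
  }

  val pairs: Seq[(Long, Long)] = Seq((1L, 2L), (3L, 4L), (5L, 6L))

  // Buffering the raw references: every slot points at the same object,
  // so distinct sees only one value.
  val withoutCopy: Int = reusingIterator(pairs).toVector.distinct.size

  // Copying before buffering gives each element its own object.
  val withCopy: Int = reusingIterator(pairs).map(_.copy()).toVector.distinct.size
}
```

In this sketch withoutCopy collapses to a single element while withCopy keeps all three, which is the same kind of undercount as the distinct count reported below.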



On Wed, Apr 23, 2014 at 2:26 AM, Ryan Compton <compton.r...@gmail.com> wrote:

> Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/tttt.txt
>
> This code:
>
>     println(g.numEdges)
>     println(g.numVertices)
>     println(g.edges.distinct().count())
>
> gave me
>
> 10000
> 9294
> 2
>
>
>
> On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave <ankurd...@gmail.com> wrote:
> > I wasn't able to reproduce this with a small test file, but I did
> > change the file parsing to use x(1).toLong instead of x(2).toLong.
> > Did you mean to take the third column rather than the second?
> >
> > If so, would you mind posting a larger sample of the file, or even
> > the whole file if possible?
> >
> > Here's the test that succeeded:
> >
> >   test("graph.edges.distinct.count") {
> >     withSpark { sc =>
> >       val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
> >         "394365859\t136153151", "589404147\t1361045425"))
> >       val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
> >         .map(x => (x(0).toLong, x(1).toLong))
> >       val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
> >         uniqueEdges = Option(CanonicalRandomVertexCut))
> >       assert(edgeTupRDD.distinct.count() === 2)
> >       assert(g.numEdges === 2)
> >       assert(g.edges.distinct.count() === 2)
> >     }
> >   }
> >
> > Ankur
>
