At 2014-12-02 22:01:20 -0800, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> I have a graph which returns the following on doing graph.vertices
> (1, 1.0)
> (2, 1.0)
> (3, 2.0)
> (4, 2.0)
> (5, 0.0)
>
> I want to group all the vertices with the same attribute together, like into
> one RDD or something. I want all the vertices with same attribute to be
> together.

You can do this by flipping the tuples so the values become the keys, then
using one of the by-key functions in PairRDDFunctions:

    import org.apache.spark.rdd.RDD

    // Assumes sc is an existing SparkContext, e.g. the one spark-shell provides.
    val a: RDD[(Int, Double)] = sc.parallelize(List(
      (1, 1.0), (2, 1.0), (3, 2.0), (4, 2.0), (5, 0.0)))
    // Swap each (vertexId, attr) pair so the attribute becomes the key.
    val b: RDD[(Double, Int)] = a.map(kv => (kv._2, kv._1))
    // Collect all vertex ids that share the same attribute.
    val c: RDD[(Double, Iterable[Int])] = b.groupByKey(numPartitions = 5)
    c.collect.foreach(println)
    // (0.0,CompactBuffer(5))
    // (1.0,CompactBuffer(1, 2))
    // (2.0,CompactBuffer(3, 4))

Ankur
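
For readers without a Spark shell at hand, the same flip-then-group idea can be
sketched on plain Scala collections. This is only a local analogue, not the
Spark API: GroupVerticesByAttr and groupByAttr are hypothetical names introduced
for illustration, and on a real graph the input would come from graph.vertices
rather than a Seq.

```scala
// Local sketch using the Scala standard library only (no Spark dependency).
object GroupVerticesByAttr {
  // Flip each (id, attr) pair to (attr, id), then gather the ids per attr.
  def groupByAttr(vertices: Seq[(Int, Double)]): Map[Double, Seq[Int]] =
    vertices
      .map { case (id, attr) => (attr, id) }       // value becomes the key
      .groupBy { case (attr, _) => attr }          // bucket pairs by attribute
      .map { case (attr, pairs) => attr -> pairs.map(_._2) } // keep only the ids

  def main(args: Array[String]): Unit = {
    val vertices = Seq((1, 1.0), (2, 1.0), (3, 2.0), (4, 2.0), (5, 0.0))
    // Sort by attribute so the printed order is stable.
    groupByAttr(vertices).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Unlike groupByKey on an RDD, this runs entirely in one JVM, so it is only useful
for checking the logic on small inputs.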