I am attempting to write a map/reduce job on a graph object that takes an edge list and returns a new edge list. Unfortunately I find that the signature of reduce is

  def reduce(f: (T, T) => T): T

and not something like

  def reduce(f: (T1, T2) => T3): T3

I see this because the following two commands give different results for the final number, which should be the same (tempMappedRDD is a MappedRDD of (Edge, 1) pairs, and I found that the A and B here hold the edges (1,4) and (7,3)):

  tempMappedRDD.reduce( (A, B) => (Edge(A._1.srcId, A._1.dstId, A._1.dstId.toInt), 1) )   // (Edge(1,4,4),1)
  tempMappedRDD.reduce( (A, B) => (Edge(A._1.srcId, B._1.dstId, A._1.dstId.toInt), 1) )   // (Edge(1,3,3),1)

Why is the third number a '3' in the second result, and not a '4'? Does it have something to do with toInt? The really strange part is that it only happens for A, since the following commands work correctly:

  tempMappedRDD.reduce( (A, B) => (Edge(B._1.srcId, B._1.dstId, B._1.dstId.toInt), 1) )   // (Edge(7,3,3),1)
  tempMappedRDD.reduce( (A, B) => (Edge(B._1.srcId, A._1.dstId, B._1.dstId.toInt), 1) )   // (Edge(7,4,3),1)
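For reference, here is a minimal self-contained sketch of the setup I am describing. The object name, the local master, and the two concrete edges are assumptions for the sake of a runnable example; in my actual job tempMappedRDD comes out of a larger GraphX pipeline.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.graphx.Edge

  object ReduceEdgeRepro {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(
        new SparkConf().setAppName("reduce-edge-repro").setMaster("local[2]"))

      // An RDD of (Edge, 1) pairs, analogous to tempMappedRDD above.
      // The edge values here are hypothetical stand-ins for my data.
      val tempMappedRDD = sc.parallelize(Seq(
        (Edge(1L, 4L, 0), 1),
        (Edge(7L, 3L, 0), 1)
      ))

      // The same four reduce calls quoted in the message body.
      println(tempMappedRDD.reduce((A, B) => (Edge(A._1.srcId, A._1.dstId, A._1.dstId.toInt), 1)))
      println(tempMappedRDD.reduce((A, B) => (Edge(A._1.srcId, B._1.dstId, A._1.dstId.toInt), 1)))
      println(tempMappedRDD.reduce((A, B) => (Edge(B._1.srcId, B._1.dstId, B._1.dstId.toInt), 1)))
      println(tempMappedRDD.reduce((A, B) => (Edge(B._1.srcId, A._1.dstId, B._1.dstId.toInt), 1)))

      sc.stop()
    }
  }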