I would key by the fields that should be the same, then reduce by sum.
sc.parallelize(inputList)
  .map(x => (x._1, x._2.toLong, x._3.toLong)) // parse the String fields to numeric values
  .map(x => ((x._1, x._3), x._2))             // key by the name and the final number field
  .reduceByKey(_ + _)

Andrew

On Tue, Feb 11, 2014 at 7:07 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> Hi,
>
> Are there any examples of how to do any operation other than counting in
> Spark via map then reduceByKey?
>
> It's pretty straightforward to do counts, but how do I add in my own
> function (say, a conditional sum based on tuple fields, or a moving
> average)?
>
> Here's my count example so we have some code to work with:
>
> val inputList = List(
>   ("name","1","11134"), ("name","2","11134"),
>   ("name","1","11130"), ("name2","1","11133"))
>
> sc.parallelize(inputList)
>   .map(x => (x, 1))
>   .reduceByKey(sumTuples)
>   .foreach(x => println(x))
>
> How would I add up field 2 from the tuples whose "name" field and last
> field are the same?
>
> In my example the result I want is:
>
> ("name", "1+2", "11134")
> ("name", "1", "11130")
> ("name2", "1", "11133")
>
> Thanks,
> -A
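For anyone who wants to try the aggregation without a Spark cluster, here is a plain-Scala sketch of the same logic on the sample data, with `groupBy` plus a sum standing in for `reduceByKey(_ + _)`. The object and method names are just for illustration, not part of any Spark API:

```scala
object SumByKey {
  // Key each row by (name, last field) and sum the middle field,
  // mirroring the map + reduceByKey steps from the Spark snippet.
  def sumByKey(rows: List[(String, String, String)]): Map[(String, String), Long] =
    rows
      .map(x => ((x._1, x._3), x._2.toLong)) // ((name, lastField), middleField)
      .groupBy(_._1)                          // local stand-in for reduceByKey
      .map { case (k, vs) => k -> vs.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val inputList = List(
      ("name", "1", "11134"), ("name", "2", "11134"),
      ("name", "1", "11130"), ("name2", "1", "11133"))
    sumByKey(inputList).foreach(println)
    // ((name,11134),3), ((name,11130),1), ((name2,11133),1) in some order
  }
}
```

The same `Map` result you get here is what the RDD would hold after `reduceByKey`, just distributed across partitions.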