I would key by those things that should be the same and then reduce by sum.

sc.parallelize(inputList)
  .map(x => (x._1, x._2.toLong, x._3.toLong)) // parse the numeric fields from String
  .map(x => ((x._1, x._3), x._2))             // key by the name and final number field
  .reduceByKey(_ + _)
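The same aggregation can be sketched in plain Scala without a SparkContext, using `groupBy` as a stand-in for `reduceByKey` (the names `inputList` and `summed` below are just illustrative):

```scala
// Group by (name, last field) and sum the middle field,
// mirroring the map -> reduceByKey pipeline above.
val inputList = List(
  ("name", "1", "11134"), ("name", "2", "11134"),
  ("name", "1", "11130"), ("name2", "1", "11133"))

val summed = inputList
  .map { case (name, n, id) => ((name, id), n.toLong) } // key by name + last field
  .groupBy(_._1)                                        // local stand-in for reduceByKey
  .map { case (k, vs) => (k, vs.map(_._2).sum) }        // sum the values per key

summed.foreach(println)
// ((name,11134),3) appears among the results: 1 + 2 for the shared key
```

On an RDD the `groupBy`/`map` pair would be the single `reduceByKey(_ + _)`, which also avoids materialising the per-key groups.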

Andrew


On Tue, Feb 11, 2014 at 7:07 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

>  Hi
>
> Are there any examples of how to do operations other than
> counting in Spark via map then reduceByKey?
>
> It’s pretty straightforward to do counts, but how do I add in my own
> function (say a conditional sum based on tuple fields, or a moving average)?
>
>
>
> Here’s my count example so we have some code to work with:
>
>
>
> val inputList= List(
> ("name","1","11134"),("name","2","11134"),("name","1","11130"),
> ("name2","1","11133") )
>
> sc.parallelize( inputList )
>
> .map(x => (x,1) )
>
> .reduceByKey(sumTuples)
>
> .foreach(x=>println(x))
>
>
>
> How would I add up field 2 from tuples whose “name” field and
> last field are the same?
>
> In my example the result I want is:
>
> "name","1+2","11134"
>
> “name","1","11130”
>
> “name2","1","11133”
>
>
>
> Thanks
>
> -A
>
