I think your questions revolve around the reduce function here, which is a function of 2 arguments returning 1, whereas in a Hadoop Reducer you implement a many-to-many function.
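To make that concrete, here is a minimal, Spark-free sketch of what a 2-to-1 associative function accomplishes per key. The `reduceByKey` helper below is hypothetical (plain Java collections, not Spark's API), written just to mirror the idea: fold the operation over each key's values until one value remains.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

public class ReduceByKeySketch {
    // Conceptual model only: for each key, fold the associative 2-to-1
    // operation over all of that key's values until one value remains.
    static <K extends Comparable<K>, V> Map<K, V> reduceByKey(
            List<Map.Entry<K, V>> pairs, BinaryOperator<V> op) {
        Map<K, V> result = new TreeMap<>();  // TreeMap only for stable print order
        for (Map.Entry<K, V> e : pairs) {
            result.merge(e.getKey(), e.getValue(), op);
        }
        return result;
    }

    public static void main(String[] args) {
        // The (word, 1) pairs the map stage produces for "I am who I am".
        List<Map.Entry<String, Integer>> ones = List.of(
                Map.entry("I", 1), Map.entry("am", 1), Map.entry("who", 1),
                Map.entry("I", 1), Map.entry("am", 1));
        // The 2-to-1 function: two integers sum to one.
        Map<String, Integer> counts = reduceByKey(ones, (i1, i2) -> i1 + i2);
        System.out.println(counts);  // {I=2, am=2, who=1}
    }
}
```

The point is that no N-to-1 function is ever needed: repeatedly applying the 2-to-1 function collapses any number of values for a key down to one.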
This API is simpler, if less general. Here you provide an associative operation that can reduce any 2 values down to 1 (e.g. two integers sum to one). This is used to reduce all values for each key to 1. It's not necessary to provide an N-to-1 function, since that can be accomplished with a 2-to-1 function. Here, you can't emit multiple values for one key. The result is one (key, reduced value) pair for each (key, bunch of values).

The Mapper and Reducer in classic Hadoop MapReduce were actually both quite similar (just that one takes a collection of values rather than a single value per key) and let you implement a lot of patterns. In a way that's good; in a way it was wasteful and complex. You can still reproduce what Mappers and Reducers do, but the method in Spark is mapPartitions, possibly paired with groupByKey. These are the most general operations you might consider, and I'm not saying you *should* emulate MapReduce this way in Spark. In fact it's unlikely to be efficient. But it is possible.

On Sat, Aug 2, 2014 at 9:55 AM, Anil Karamchandani <[email protected]> wrote:

> Hi,
>
> I am a complete newbie to Spark and MapReduce frameworks and have a basic
> question on the reduce function. I was working on the word count example
> and was stuck at the reduce stage where the sum happens.
>
> I am trying to understand the working of reduceByKey in Spark, using Java
> as the programming language.
> Say I have a sentence "I am who I am". I break the sentence into words and
> store them as a list: [I, am, who, I, am]
>
> Now this function assigns 1 to each word:
>
>     JavaPairRDD<String, Integer> ones = words.mapToPair(
>         new PairFunction<String, String, Integer>() {
>             @Override
>             public Tuple2<String, Integer> call(String s) {
>                 return new Tuple2<String, Integer>(s, 1);
>             }
>         });
>
> so the output is something like this:
>
> (I,1) (am,1) (who,1) (I,1) (am,1)
>
> Now if I have 3 reducers running, each reducer will get a key and the
> values associated with that key:
>
> reducer 1 : (I,1) (I,1)
> reducer 2 : (am,1) (am,1)
> reducer 3 : (who,1)
>
> I wanted to know:
>
> a. What exactly happens in the code below?
>
> b. What are the parameters of the new Function2?
>
> c. Basically, how is the JavaPairRDD formed?
>
>     JavaPairRDD<String, Integer> counts = ones.reduceByKey(
>         new Function2<Integer, Integer, Integer>() {
>             @Override
>             public Integer call(Integer i1, Integer i2) {
>                 return i1 + i2;
>             }
>         });
>
> thanks!
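As for what happens in that last block: reduceByKey invokes the Function2 repeatedly per key, pairwise, e.g. call(1, 1) for the two (I,1) values. A plain-Java sketch of that call sequence (no Spark; `reduceValues` is a made-up helper for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class PairwiseReduce {
    // Fold a 2-to-1 function over a list of values the way reduceByKey
    // combines one key's values: call(v1, v2), then call(result, v3), ...
    static int reduceValues(List<Integer> values, BinaryOperator<Integer> f) {
        int acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = f.apply(acc, values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        // The same 2-to-1 function the question passes to reduceByKey.
        BinaryOperator<Integer> sum = (i1, i2) -> i1 + i2;

        // Values the map stage produced for key "I": (I,1) (I,1)
        System.out.println("I   -> " + reduceValues(Arrays.asList(1, 1), sum)); // I   -> 2
        // Values for key "am": (am,1) (am,1)
        System.out.println("am  -> " + reduceValues(Arrays.asList(1, 1), sum)); // am  -> 2
        // A lone value is returned as-is: (who,1)
        System.out.println("who -> " + reduceValues(Arrays.asList(1), sum));    // who -> 1
    }
}
```

The resulting JavaPairRDD just holds one (key, reduced value) pair per key: (I,2) (am,2) (who,1).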
