Below are my simple mapper and partitioner classes, the input file, and the console output (at the end of this message).

My question is about the keys printed in the console output (the "key = ..." lines). Here are the first few lines of the console output:

...
13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10 10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20 200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200

Q1: I am confused about how and where these values (key = 0, key = 6, and so on) are calculated or obtained.

Q2: After the first two lines of output for each record, it prints the output from the partitioner class:

Printing Result in Partitioner = 0

Is this because the mapper and the partitioner run in parallel?

I would really appreciate it if someone could take a quick look and shed some light on this.

**** Mapper class ****

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// IntPair is my own composite key class (not shown here).
public class SecondarySortMapper extends Mapper<LongWritable, Text, IntPair, IntWritable> {

    private String[] tokens = null;
    private IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Print the incoming key/value pair exactly as the framework supplies it.
        System.out.println("key = " + key.toString() + " value = " + value.toString());
        if (value != null) {
            // Each input line holds two whitespace-separated integers.
            tokens = value.toString().split("\\s+");
            System.out.println("token[0] = " + tokens[0] + " token[1] = " + tokens[1]);
            // Despite its name, ONE carries the second token as the output value.
            ONE.set(Integer.parseInt(tokens[1]));
            IntPair ip = new IntPair(Integer.parseInt(tokens[0]), Integer.parseInt(tokens[1]));
            context.write(ip, ONE);
            System.out.println("IntPair in Mapper = " + ip.toString());
        }
    }
}

**** Partitioner class ****

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class SecondarySortPartitioner extends Partitioner<IntPair, IntWritable> {

    @Override
    public int getPartition(IntPair key, IntWritable value, int numOfPartitions) {
        // Partition on the first element of the pair only.
        int result = (key.getFirst().hashCode()) % numOfPartitions;
        System.out.println("Printing Result in Partitioner = " + result);
        return result;
    }
}

**** Input file ****

10 10
20 200
30 2500
40 400
50 500
60 1
10 10
30 2500
50 500
10 100
20 2000
30 25000
40 4000
50 5000
60 10
10 100
30 25000
50 5000

**** Console output ****

...
13/03/27 02:20:57 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/27 02:20:57 INFO mapred.MapTask: record buffer = 262144/327680
key = 0 value = 10 10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 6 value = 20 200
token[0] = 20 token[1] = 200
Printing Result in Partitioner = 0
IntPair in Mapper = 20-200
key = 13 value = 30 2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 21 value = 40 400
token[0] = 40 token[1] = 400
Printing Result in Partitioner = 0
IntPair in Mapper = 40-400
key = 28 value = 50 500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 35 value = 60 1
token[0] = 60 token[1] = 1
Printing Result in Partitioner = 0
IntPair in Mapper = 60-1
key = 40 value = 10 10
token[0] = 10 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 10-10
key = 46 value = 30 2500
token[0] = 30 token[1] = 2500
Printing Result in Partitioner = 0
IntPair in Mapper = 30-2500
key = 54 value = 50 500
token[0] = 50 token[1] = 500
Printing Result in Partitioner = 0
IntPair in Mapper = 50-500
key = 61 value = 10 100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 68 value = 20 2000
token[0] = 20 token[1] = 2000
Printing Result in Partitioner = 0
IntPair in Mapper = 20-2000
key = 76 value = 30 25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 85 value = 40 4000
token[0] = 40 token[1] = 4000
Printing Result in Partitioner = 0
IntPair in Mapper = 40-4000
key = 93 value = 50 5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000
key = 101 value = 60 10
token[0] = 60 token[1] = 10
Printing Result in Partitioner = 0
IntPair in Mapper = 60-10
key = 107 value = 10 100
token[0] = 10 token[1] = 100
Printing Result in Partitioner = 0
IntPair in Mapper = 10-100
key = 114 value = 30 25000
token[0] = 30 token[1] = 25000
Printing Result in Partitioner = 0
IntPair in Mapper = 30-25000
key = 123 value = 50 5000
token[0] = 50 token[1] = 5000
Printing Result in Partitioner = 0
IntPair in Mapper = 50-5000

Thanks
Sai
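
A note on Q1, offered as a sketch rather than a confirmed diagnosis: the job setup is not shown, but if the job reads the file with Hadoop's default TextInputFormat, each map key would be the byte offset at which that line starts in the input file. The standalone snippet below involves no Hadoop; the class name LineOffsets and the method lineStartOffsets are made up for illustration. It computes those offsets for the first six input lines, assuming single-byte '\n' line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reproduces line-start byte offsets.
class LineOffsets {

    // Returns the byte offset at which each line starts, assuming every line
    // is terminated by exactly one '\n' byte (Unix line endings).
    static List<Long> lineStartOffsets(String[] lines) {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;
        for (String line : lines) {
            offsets.add(pos);
            // Advance past the line's bytes plus one byte for the newline.
            pos += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // First six lines of the input file from this message.
        String[] lines = {"10 10", "20 200", "30 2500", "40 400", "50 500", "60 1"};
        List<Long> offsets = lineStartOffsets(lines);
        for (int i = 0; i < lines.length; i++) {
            System.out.println("key = " + offsets.get(i) + " value = " + lines[i]);
        }
    }
}
```

Under that assumption the computed offsets are 0, 6, 13, 21, 28 and 35, which line up with the first six "key = ..." values in the console output above. If the file used Windows line endings ("\r\n"), each later offset would grow by one extra byte per preceding line.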