Hi, I am learning Hadoop and currently working through a Python MapReduce tutorial. I am trying to understand the point of having separate map and reduce files. I am assuming that when we launch the scripts, the mapper.py script goes to all the machines at the same time and they all start printing at the same time, and then the reducer goes to the reducer jobs and reads the lines coming from the mapper jobs in no particular order. Is that right?

1. Can I just write a script that gets the file, puts it in a temp file, and then works with it? (I guess this defeats the whole purpose of Hadoop, right?)

2. When working with a map script, do I always need to print key, value pairs, or can I print whatever I want? And in what order does that output arrive? If I read all the files of a folder like the tutorial says, are they read in sequential order by all the workers? Can I make the mapper just print the lines of the file and let the reducer do the logic of what I want to accomplish?
The tutorial I am following is "Writing An Hadoop MapReduce Program In Python" by Michael G. Noll (how to write an Hadoop MapReduce program in Python with the Hadoop Streaming API).

Following this tutorial, I found that the way to get the information was to set up a directory like this.

mapper.py:

    import sys

    # Emit one "word<TAB>1" line for every word read from stdin.
    for i in sys.stdin:
        line = i.strip()
        words = line.split()
        for word in words:
            print(word + "\t" + str(1))

reducer.py:

    import sys

    dic_words = {}

    # Each input line is "word<TAB>count"; add up the counts per word.
    for i in sys.stdin:
        line = i.strip()
        word, one_value = line.split("\t")
        word_value = dic_words.get(word, 0)
        # Add the emitted count instead of assuming it is always 1.
        dic_words[word] = word_value + int(one_value)

    for key, value in dic_words.items():
        print(key + " " + str(value))

When I test it against a file it works, and just testing it locally works too. Something easy:

    echo "bla ble bli bla" | python mapper.py | sort -k1,1 | python reducer.py

and I do get:

    bla 2
    ble 1
    bli 1

(I am not sure why we need the sort. I guess that emulates how Hadoop works? Maybe the Hadoop mappers run first and then return a dictionary that the reducer can read?)

Thanks guys, I know these are weird questions =(
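To make the sort question concrete, here is a minimal sketch of how I picture the map, sort, and reduce steps, all run in a single Python process. It is only my assumption of what happens, not how Hadoop is actually implemented: my understanding is that Hadoop Streaming sorts the mapper output by key before the reducer reads it, and the `sorted(...)` call stands in for that step (and for `sort -k1,1` in my local test). The function names `mapper`, `reducer`, and `run_local` are made up for this illustration. Unlike my reducer.py, this reducer keeps no dictionary, so it only works if equal keys arrive next to each other.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word, like mapper.py prints to stdout."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reducer(sorted_pairs):
    """Sum the counts per word; assumes equal keys are adjacent (sorted input)."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Hypothetical helper: map, sort by key, then reduce in one process."""
    # The sort is my stand-in for the `sort -k1,1` step in the shell pipeline.
    pairs = sorted(mapper(lines), key=itemgetter(0))
    return list(reducer(pairs))


if __name__ == "__main__":
    for word, count in run_local(["bla ble bli bla"]):
        print("%s\t%d" % (word, count))
```

If the sort is skipped, this reducer emits a separate count for each run of the same word, which is the case my dictionary-based reducer.py hides by holding every word in memory.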