Hello all,

I know that tab is the default separator for the following fields:
stream.map.output.field.separator
stream.reduce.input.field.separator
stream.reduce.output.field.separator
mapreduce.textoutputformat.separator

However, I tried setting it explicitly as a generic parser option:

stream.map.output.field.separator=\t
(or)
stream.map.output.field.separator="\t"

to test how Hadoop parses whitespace characters such as "\t" and "\n" when they are used as separators. I observed that Hadoop reads the value as the literal two-character sequence \t (a backslash followed by the letter t), not as the tab character itself.

I checked this by printing each line in my reducer (Python) as it is read, using:

sys.stdout.write(str(line))

My mapper emits key/value pairs (tab-separated) as:

key value1 value2

using the command:

print(key, value1, value2, sep='\t', end='\n')

So I expected my reducer to read each line as "key value1 value2" too, but instead sys.stdout.write(str(line)) printed:

key value1 value2 (with a trailing space)

From "Hadoop streaming - remove trailing tab from reducer output" <http://stackoverflow.com/questions/18133290/hadoop-streaming-remove-trailing-tab-from-reducer-output>, I understood that the trailing space appears because mapreduce.textoutputformat.separator is not set and is left at its default.

This confirmed my assumption that Hadoop treated my entire map output line, "key value1 value2", as the key, with an empty Text object as the value, because it read the separator given in stream.map.output.field.separator=\t as the literal characters "\t" instead of the tab character itself.

Please help me understand this behavior. How can I use \t as a separator if I want to?

Thanks & Regards,
Anvesh R
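
P.S. To make the suspicion above concrete, here is a minimal Python sketch. It does not use Hadoop at all: split_key_value is a hypothetical helper that only imitates "split the line at the first occurrence of the separator", and mapper_line just reproduces the formatting of the print call in the mapper. It is an illustration of the assumed behavior, not Hadoop's actual code.

```python
TAB = "\t"        # one character: the actual tab
LITERAL = "\\t"   # two characters: a backslash followed by 't'


def split_key_value(line, separator):
    """Hypothetical helper: split a line at the first separator.

    If the separator does not occur in the line, the whole line
    becomes the key and the value is the empty string.
    """
    key, _found, value = line.partition(separator)
    return key, value


def mapper_line(key, value1, value2):
    # Same formatting as print(key, value1, value2, sep='\t', end='\n')
    return TAB.join([key, value1, value2]) + "\n"


line = mapper_line("key", "value1", "value2").rstrip("\n")

# repr() makes the tabs visible as \t instead of blank space
print(repr(line))

# Splitting on a real tab: the key is the first field only
print(split_key_value(line, TAB))

# Splitting on the two-character literal: nothing matches,
# so the whole line ends up as the key and the value is empty
print(split_key_value(line, LITERAL))
```

In the same spirit, writing repr(line) rather than str(line) in the reducer would show exactly which whitespace characters each input line contains.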