Hello, I am having an issue with partitioning data between mappers and reducers when the key is numeric. When I switch the key to a one-character string it works fine, but I have more than 26 keys, so I am looking for an alternative. My data looks like this (keys go up to 250):

10 \t comment10 \t data
20 \t comment20 \t data
30 \t comment30 \t data
40 \t comment40 \t data

The data is around 50 million lines. This is my command:

hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.2+228-streaming.jar \
    -D mapred.task.timeout=3600000 \
    -D mapred.map.tasks=25 \
    -D stream.non.zero.exit.is.failure=true \
    -D mapred.reduce.tasks=25 \
    -D mapred.output.compress=true \
    -D mapred.text.key.partitioner.options=-k1,1n \
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
    -input "input" \
    -output "output" \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
    -jobconf stream.map.output.field.separator=. \
    -jobconf stream.num.map.output.key.fields=1 \
    -jobconf map.output.key.field.separator=\t \
    -jobconf num.key.fields.for.partition=1 \
    -mapper "cat" \
    -reducer "cat"

The other issue I have is with stream.map.output.field.separator: when I set it to a tab, it adds a space into my data whenever a key is greater than or equal to 100.

Any suggestions on how to fix this?
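In case it helps to illustrate what I am seeing: as I understand it, KeyFieldBasedPartitioner hashes the bytes of the selected key field (much like Java's String.hashCode) and takes the result modulo the number of reducers, so a numeric key is partitioned as a plain string, not by its numeric value. A rough Python sketch of that behaviour (my own approximation, not the actual Hadoop code):

```python
def java_string_hash(s):
    """Approximate Java's String.hashCode(): h = h*31 + char, as signed 32-bit."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    if h >= 0x80000000:          # reinterpret as signed 32-bit
        h -= 0x100000000
    return h

def partition(key, num_reducers=25):
    """Reducer index for a key: (hash & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

# Keys like mine ("10", "20", ... "250") are hashed as text, so the
# reducer they land on has nothing to do with their numeric order.
for key in ["10", "20", "100", "250"]:
    print(key, "->", partition(key))
```

So even though each distinct key consistently goes to one reducer, the reducers do not receive numeric ranges, which is what I was hoping `-k1,1n` in mapred.text.key.partitioner.options would give me.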