Re: no output written to HDFS

2012-08-31 Thread Håvard Wahl Kongsgård
For Python streaming go with dumbo https://github.com/klbostee/dumbo/wiki or pipes with pydoop http://pydoop.sourceforge.net/docs/pipes -Håvard On Thu, Aug 30, 2012 at 5:52 AM, Periya.Data periya.d...@gmail.com wrote: Hi All, My Hadoop streaming job (in Python) runs to completion (both map

streaming command [Re: no output written to HDFS]

2012-08-31 Thread Periya.Data
plain Linux machine, using the basic commands: cat $1 | python test2.py $2, it produces the expected output. *Observation*: If I do not specify the two files under the -file option, then I see no output written to HDFS, even though the output directory has empty part-files and SUCCESS directory. The 3

solved [Re: streaming command [Re: no output written to HDFS]]

2012-08-31 Thread Periya.Data
$1 | python test2.py $2, it produces the expected output. *Observation*: If I do not specify the two files under the -file option, then I see no output written to HDFS, even though the output directory has empty part-files and SUCCESS directory. The 3 part-files are reasonable - as 3 mappers

Re: no output written to HDFS

2012-08-30 Thread Periya.Data
Hi Bertrand, No, I do not observe the same when I run using cat | map. I can see the output in STDOUT when I run my program. I do not have any reducer. In my command, I provide -D mapred.reduce.tasks=0. So, I expect the output of the mapper to be written directly to HDFS. Your suspicion
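
A minimal sketch of what a map-only streaming mapper has to do, for readers following the -D mapred.reduce.tasks=0 setup described above: read records from standard input and write one record per line to standard output, since with zero reducers that output is what ends up in the part files. The function name run_mapper and the pass-through logic are hypothetical and are not taken from test2.py.

```python
import io

def run_mapper(instream, outstream):
    """Copy each non-empty input line to the output, one record per line.

    Hypothetical stand-in for a real mapper; returns the record count.
    """
    count = 0
    for line in instream:
        record = line.rstrip("\n")
        if not record:
            continue  # skip blank lines
        outstream.write(record + "\n")
        count += 1
    outstream.flush()  # make sure nothing stays buffered when the task ends
    return count

# Local check with in-memory streams. In an actual streaming script the
# call would be run_mapper(sys.stdin, sys.stdout).
source = io.StringIO("x\n\ny\n")
sink = io.StringIO()
written = run_mapper(source, sink)
```

If the record count is nonzero locally but the part files are empty on the cluster, the script or its side files may not be reaching the task nodes.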

Re: no output written to HDFS

2012-08-30 Thread Periya.Data
This is interesting. Changing my command to: -mapper cat $1 | $GHU_HOME/test2.py $2 \ does produce output in HDFS. But the output is not what I expected and is not the same as when I do cat | map on Linux. It is producing part-0, part-1 and part-2. I expected only one output file

Re: no output written to HDFS

2012-08-30 Thread Hemanth Yamijala
Hi, Do both input files contain data that needs to be processed by the mapper in the same fashion? If so, you could just put the input files under a directory in HDFS and provide that as input. The -input option does accept a directory as argument. Otherwise, can you please explain a

no output written to HDFS

2012-08-29 Thread Periya.Data
Hi All, My Hadoop streaming job (in Python) runs to completion (both map and reduce say 100% complete). But when I look at the output directory in HDFS, the part files are empty. I do not know what might be causing this behavior. I understand that the percentages represent the records that

Re: no output written to HDFS

2012-08-29 Thread Bertrand Dechoux
Do you observe the same thing when running without Hadoop? (cat, map, sort and then reduce) Could you provide the counters of your job? You should be able to get them using the job tracker interface. The most probable answer without more information would be that your reducer does not output any
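
The local check suggested above (cat, map, sort and then reduce) can be sketched in plain Python without Hadoop. This is only an illustration under assumptions: the word-count mapper and reducer below are hypothetical and are not the poster's scripts; the point is the order of the stages.

```python
from itertools import groupby

def mapper(lines):
    """Hypothetical mapper: emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Hypothetical reducer: sum the counts per key.

    Input must already be sorted by key, as the sort step guarantees.
    """
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"

def run_local(lines):
    """Rough equivalent of: cat input | mapper | sort | reducer."""
    mapped = list(mapper(lines))
    return list(reducer(sorted(mapped)))

result = run_local(["a b a", "b c"])
```

If a run like this yields records but the Hadoop job does not, the job counters (map output records, reduce output records) show which stage stopped emitting.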