For Python streaming, go with dumbo (https://github.com/klbostee/dumbo/wiki) or with pydoop pipes (http://pydoop.sourceforge.net/docs/pipes).
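To give a flavour of dumbo, here is a word-count sketch along the lines of the example on its wiki (the file name and the exact "dumbo start" invocation are from memory, so double-check against the wiki):

    # wordcount.py -- minimal dumbo sketch; typically launched with something like:
    #   dumbo start wordcount.py -input <hdfs input> -output <hdfs output>
    def mapper(key, value):
        # value is one line of input text; emit a (word, 1) pair per word
        for word in value.split():
            yield word, 1

    def reducer(key, values):
        # values iterates over all the counts emitted for this word
        yield key, sum(values)

    if __name__ == "__main__":
        import dumbo
        dumbo.run(mapper, reducer)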
-Håvard
On Thu, Aug 30, 2012 at 5:52 AM, Periya.Data periya.d...@gmail.com wrote:
Hi All,
My Hadoop streaming job (in Python) runs to completion (both map and reduce say 100% complete). But when I run the same script on a plain Linux machine, using the basic command:
cat $1 | python test2.py $2
it produces the expected output.
*Observation*: If I do not specify the two files under the -file option, then I see no output written to HDFS, even though the output directory has empty part-files and a _SUCCESS file. The 3 part files are reasonable, as 3 mappers ran.
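(For reference, test2.py's code is not shown in this thread; the sketch below is only the usual pattern for a streaming mapper that relies on files shipped with -file, and the lookup-file argument is purely illustrative. The point is that -file copies the files into the task's working directory, so they should be opened by basename, and only what is written to stdout ends up in the part files.)

    #!/usr/bin/env python
    # Hypothetical sketch of a streaming mapper in the style of test2.py
    # (not the actual script). Files shipped with -file sit in the task's
    # current working directory, so open them by basename.
    import sys

    def load_lookup(path):
        lookup = {}
        with open(path) as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition("\t")
                lookup[k] = v
        return lookup

    def main():
        lookup = load_lookup(sys.argv[1]) if len(sys.argv) > 1 else {}
        for line in sys.stdin:
            key = line.rstrip("\n").split("\t")[0]
            # only what is written to stdout reaches the part files in HDFS
            sys.stdout.write("%s\t%s\n" % (key, lookup.get(key, "NOT_FOUND")))

    if __name__ == "__main__":
        main()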
Hi Bertrand,
No, I do not observe the same behavior when I run using cat | map; I can see the output on STDOUT when I run my program.
I do not have any reducer. In my command, I provide
-D mapred.reduce.tasks=0. So, I expect the output of the mapper to be
written directly to HDFS.
Your suspicion
This is interesting. I changed my command to:
-mapper cat $1 | $GHU_HOME/test2.py $2 \
and it is producing output to HDFS. But the output is not what I expected, and it is not the same as when I do cat | map on Linux. It is producing part-0, part-1 and part-2; I expected only one output file.
Hi,
Do both input files contain data that needs to be processed by the mapper in the same fashion? If so, you could just put the input files under a directory in HDFS and provide that directory as the input; the -input option does accept a directory as an argument.
Otherwise, can you please explain a
Hi All,
My Hadoop streaming job (in Python) runs to completion (both map and reduce say 100% complete). But when I look at the output directory in HDFS, the part files are empty. I do not know what might be causing this behavior. I understand that the percentages represent the records that
Do you observe the same thing when running without Hadoop? (cat, map, sort
and then reduce)
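Something along these lines (untested; input.txt, lookup.txt and the script name are placeholders) would reproduce the map + sort part of the pipeline locally:

    # Local check: run the mapper over a sample input and sort its output,
    # mimicking what streaming does before the reduce phase.
    import subprocess

    with open("input.txt") as src, open("mapped_sorted.txt", "w") as dst:
        mapper = subprocess.Popen(["python", "test2.py", "lookup.txt"],
                                  stdin=src, stdout=subprocess.PIPE)
        sorter = subprocess.Popen(["sort"], stdin=mapper.stdout, stdout=dst)
        mapper.stdout.close()  # so the sorter sees EOF when the mapper exits
        sorter.communicate()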
Could you provide the counters of your job? You should be able to get them
using the job tracker interface.
The most probable answer, without more information, would be that your reducer does not output any records.
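As a sanity check: a streaming reducer only contributes to the part files through what it writes to stdout. A minimal one looks roughly like this (just a sketch, assuming tab-separated key/value pairs with integer counts):

    #!/usr/bin/env python
    # Minimal streaming reducer sketch: sums the counts of consecutive
    # identical keys from the sorted mapper output. A reducer that never
    # writes to stdout produces empty part files.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                sys.stdout.write("%s\t%d\n" % (current_key, total))
            current_key, total = key, 0
        try:
            total += int(value)
        except ValueError:
            pass  # skip malformed values instead of failing the task
    if current_key is not None:
        sys.stdout.write("%s\t%d\n" % (current_key, total))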