Re: Error while using the Hadoop Streaming

Pramod N Sat, 25 May 2013 10:33:11 -0700

One of the ideas behind Hadoop is to take the code to the data (the data is
already distributed by the DFS, so it is the code that gets distributed; I
hope I'm stating it right).
The same holds for streaming. You can make sure that the mapper script is
present on all the nodes at the same path.
The other option is the -file option, which ships the files (in this case
the mapper and reducer scripts) to the cluster as part of the job submission.
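
To make the -file route concrete, here is a rough Python sketch that only
assembles the streaming command line; it does not run Hadoop. The helper name
build_streaming_command is made up for illustration, and the paths are the
ones from this thread. It assumes that files shipped with -file land in each
task's working directory, so -mapper and -reducer can name the scripts by
basename, and that both scripts are executable.

```python
import os
import shlex


def build_streaming_command(streaming_jar, input_path, output_path, mapper, reducer):
    """Return the argv list for a streaming job that ships both scripts with -file.

    Hypothetical helper: it only builds the argument list, it does not submit the job.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_path,      # input lives in HDFS
        "-output", output_path,    # output directory in HDFS (must not exist yet)
        "-file", mapper,           # local path, shipped with the job
        "-file", reducer,          # local path, shipped with the job
        # Assumption: shipped files sit in the task's working directory,
        # so the scripts are referenced by file name only.
        "-mapper", os.path.basename(mapper),
        "-reducer", os.path.basename(reducer),
    ]


if __name__ == "__main__":
    base = "/home/yyy/Dropbox/Private/xxx/Projects/task_week_22"
    argv = build_streaming_command(
        base + "/hadoop-streaming-1.1.2.jar",
        "/user/yyy/20417-8.txt",
        "/user/yyy/output",
        base + "/mapper.py",
        base + "/reducer.py",
    )
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed command can be pasted into a terminal. If the scripts are not
executable, naming the interpreter inside a quoted -mapper value
("python mapper.py") is the usual alternative.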


Probably the better way to specify the execution environment is to put
#!/usr/bin/env python
at the beginning of your script rather than naming the interpreter in the
-mapper option (it's also probably a personal choice).
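
As an illustration of the shebang approach, here is a minimal word-count
mapper sketch. It is not the exact script from the tutorial mentioned in this
thread; the function name map_lines is my own, and the tab-separated "word 1"
output format is an assumption about what the reducer expects.

```python
#!/usr/bin/env python
"""Hypothetical minimal word-count mapper for Hadoop Streaming."""
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


if __name__ == "__main__":
    # Streaming feeds input records on stdin and collects records from stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

For the shebang line to take effect the script has to be executable
(chmod +x mapper.py). A quick local check before submitting the job is to
pipe a line of text into ./mapper.py from a terminal.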

http://hadoop.apache.org/docs/stable/streaming.pdf has complete
information on streaming.

Input files in HDFS - correct
Mapper and reducer local - OK, as you are using the -file option


Hope this helps.




Pramod N <http://atmachinelearner.blogspot.in>
Bruce Wayne of web
@machinelearner <https://twitter.com/machinelearner>

--


On Fri, May 24, 2013 at 11:48 PM, Adamantios Corais <
[email protected]> wrote:

> That's the point. I think I have chosen them right, but how could I
> double-check it? As you see, the files "mapper.py" and "reducer.py" are on my
> laptop whereas the input file is on HDFS. Does this sound ok to you?
>
>
>
>
> On Fri, May 24, 2013 at 8:10 PM, Jitendra Yadav <
> [email protected]> wrote:
>
>> Hi,
>>
>> In your first mail you were using the "/usr/bin/python" binary just
>> after "-mapper". I don't think we need the python executable to run this
>> example.
>>
>> Make sure that you are using the correct paths to your files "mapper.py"
>> and "reducer.py" when executing.
>>
>>
>> ~Thanks
>>
>>
>>
>> On Fri, May 24, 2013 at 11:31 PM, Adamantios Corais <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> Thanks a lot for your response.
>>>
>>> Unfortunately, I ran into the same problem though.
>>>
>>> What do you mean by "python binary"? This is what I have in the very
>>> first line of both scripts: #!/usr/bin/python
>>>
>>> Any ideas?
>>>
>>>
>>> On Fri, May 24, 2013 at 7:41 PM, Jitendra Yadav <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have run Michael's python map reduce example several times without
>>>> any issue.
>>>>
>>>> I think this issue is related to your file path 'mapper.py'. Are you
>>>> using the python binary?
>>>>
>>>> try this,
>>>>
>>>> hadoop jar /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/hadoop-streaming-1.1.2.jar \
>>>> -input /user/yyy/20417-8.txt \
>>>> -output /user/yyy/output \
>>>> -file /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>> -mapper /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>> -file /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py \
>>>> -reducer /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py
>>>>
>>>>
>>>> Thanks~
>>>>
>>>> On Fri, May 24, 2013 at 10:12 PM, Adamantios Corais <
>>>> [email protected]> wrote:
>>>>
>>>>> I tried this nice example:
>>>>> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>>>>>
>>>>> The python scripts work pretty fine from my laptop (through terminal),
>>>>> but they don't when I execute them on the CDH3 (Pseudo-Distributed Mode).
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> hadoop jar /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/hadoop-streaming-1.1.2.jar \
>>>>> -input /user/yyy/20417-8.txt \
>>>>> -output /user/yyy/output \
>>>>> -file /usr/bin/python /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>>> -mapper /usr/bin/python /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>>> -file /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py \
>>>>> -reducer /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py
>>>>>
>>>>> -----------------------
>>>>>
>>>>> 2013-05-24 18:21:12,232 INFO org.apache.hadoop.util.NativeCodeLoader:
>>>>> Loaded the native-hadoop library
>>>>> 2013-05-24 18:21:12,569 INFO
>>>>> org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating
>>>>> symlink:
>>>>> /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/jars/job.jar
>>>>> <-
>>>>> /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/attempt_201305160832_0020_m_000000_0/work/job.jar
>>>>> 2013-05-24 18:21:12,586 INFO
>>>>> org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating
>>>>> symlink:
>>>>> /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/jars/.job.jar.crc
>>>>> <-
>>>>> /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/attempt_201305160832_0020_m_000000_0/work/.job.jar.crc
>>>>> 2013-05-24 18:21:12,717 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>>>>> Initializing JVM Metrics with processName=MAP, sessionId=
>>>>> 2013-05-24 18:21:13,062 INFO org.apache.hadoop.util.ProcessTree:
>>>>> setsid exited with exit code 0
>>>>> 2013-05-24 18:21:13,087 INFO org.apache.hadoop.mapred.Task:  Using
>>>>> ResourceCalculatorPlugin :
>>>>> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1358f03
>>>>> 2013-05-24 18:21:13,452 WARN
>>>>> org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is
>>>>> available
>>>>> 2013-05-24 18:21:13,452 INFO
>>>>> org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library
>>>>> loaded
>>>>> 2013-05-24 18:21:13,464 INFO org.apache.hadoop.mapred.MapTask:
>>>>> numReduceTasks: 1
>>>>> 2013-05-24 18:21:13,477 INFO org.apache.hadoop.mapred.MapTask:
>>>>> io.sort.mb = 100
>>>>> 2013-05-24 18:21:13,635 INFO org.apache.hadoop.mapred.MapTask: data
>>>>> buffer = 79691776/99614720
>>>>> 2013-05-24 18:21:13,635 INFO org.apache.hadoop.mapred.MapTask: record
>>>>> buffer = 262144/327680
>>>>> 2013-05-24 18:21:13,724 INFO org.mortbay.log: Logging to
>>>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>>>> org.mortbay.log.Slf4jLog
>>>>> 2013-05-24 18:21:13,733 INFO org.apache.hadoop.streaming.PipeMapRed:
>>>>> PipeMapRed exec [mapper.py]
>>>>> 2013-05-24 18:21:13,783 ERROR org.apache.hadoop.streaming.PipeMapRed:
>>>>> configuration exception
>>>>> java.io.IOException: Cannot run program "mapper.py":
>>>>> java.io.IOException: error=2, No such file or directory
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>>     at
>>>>> org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>>     at
>>>>> org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
>>>>> Caused by: java.io.IOException: java.io.IOException: error=2, No such
>>>>> file or directory
>>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>>>>>     ... 24 more
>>>>> 2013-05-24 18:21:13,869 INFO
>>>>> org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater
>>>>> with mapRetainSize=-1 and reduceRetainSize=-1
>>>>> 2013-05-24 18:21:13,871 WARN org.apache.hadoop.mapred.Child: Error
>>>>> running child
>>>>> java.lang.RuntimeException: Error in configuring object
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at
>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     ... 9 more
>>>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>>     ... 14 more
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at
>>>>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     ... 17 more
>>>>> Caused by: java.lang.RuntimeException: configuration exception
>>>>>     at
>>>>> org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
>>>>>     at
>>>>> org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>>>>     ... 22 more
>>>>> Caused by: java.io.IOException: Cannot run program "mapper.py":
>>>>> java.io.IOException: error=2, No such file or directory
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>>     at
>>>>> org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>>     ... 23 more
>>>>> Caused by: java.io.IOException: java.io.IOException: error=2, No such
>>>>> file or directory
>>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>>>>>     ... 24 more
>>>>> 2013-05-24 18:21:13,879 INFO org.apache.hadoop.mapred.Task: Runnning
>>>>> cleanup for the task
>>>>>
>>>>
>>>>
>>>
>>
>
