One of the ideas behind Hadoop is to take the code to the data (the data is already distributed by the DFS, so what remains is to distribute the code; I hope I'm stating that right). The same holds for streaming. You can either make sure the mapper script is present at the same path on every node, or use the -file option, which ships the files (in this case the mapper and reducer scripts) with the job so they are available locally on the compute nodes.
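The -file route can be illustrated with a minimal Python sketch that assembles the streaming command as an argument list. The helper name build_streaming_args is hypothetical, and the paths are the ones from this thread, used only as placeholders. Each local script is passed to -file, and -mapper/-reducer then name the shipped file by its base name with no interpreter in front of it, following the -file example in the streaming documentation. The sketch assumes each script's own shebang line selects the interpreter.

```python
import os
import shlex
from typing import List


def build_streaming_args(jar: str, input_path: str, output_path: str,
                         mapper: str, reducer: str) -> List[str]:
    """Assemble a Hadoop streaming command line (hypothetical helper).

    `mapper` and `reducer` are paths to scripts on the local machine.
    Each one is shipped with -file, and -mapper / -reducer refer to the
    shipped copy by its base name, without /usr/bin/python in front.
    """
    args = ["hadoop", "jar", jar,
            "-input", input_path,     # path in HDFS
            "-output", output_path]   # path in HDFS
    for option, script in (("-mapper", mapper), ("-reducer", reducer)):
        args += ["-file", script, option, os.path.basename(script)]
    return args


if __name__ == "__main__":
    project = "/home/yyy/Dropbox/Private/xxx/Projects/task_week_22"
    cmd = build_streaming_args(
        jar=f"{project}/hadoop-streaming-1.1.2.jar",
        input_path="/user/yyy/20417-8.txt",
        output_path="/user/yyy/output",
        mapper=f"{project}/mapper.py",
        reducer=f"{project}/reducer.py",
    )
    # Print a shell-quoted line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than one string means it can also be handed straight to subprocess.run, which sidesteps the kind of line-wrapping and quoting slips that can creep into a long hand-typed command.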
A better way to specify the execution environment is probably to put #!/usr/bin/env python at the beginning of your script rather than in the -mapper option (it's also partly a matter of personal choice). http://hadoop.apache.org/docs/stable/streaming.pdf has complete information on streaming.

As for your setup:
- Input files in HDFS: correct.
- Mapper and reducer local: OK, since you are using the -file option.

Hope this helps.

Pramod N <http://atmachinelearner.blogspot.in>
Bruce Wayne of web
@machinelearner <https://twitter.com/machinelearner>

--

On Fri, May 24, 2013 at 11:48 PM, Adamantios Corais wrote:
> That's the point. I think I have chosen them right, but how could I
> double-check it? As you see, the files "mapper.py" and "reducer.py" are on my
> laptop whereas the input file is on HDFS. Does this sound OK to you?
>
> On Fri, May 24, 2013 at 8:10 PM, Jitendra Yadav wrote:
>> Hi,
>>
>> In your first mail you were using the "/usr/bin/python" binary just
>> after "-mapper"; I don't think we need the python executable to run this
>> example.
>>
>> Make sure that you are using the correct paths of your files "mapper.py" and
>> "reducer.py" while executing.
>>
>> ~Thanks
>>
>> On Fri, May 24, 2013 at 11:31 PM, Adamantios Corais wrote:
>>> Hi,
>>>
>>> Thanks a lot for your response.
>>>
>>> Unfortunately, I run into the same problem though.
>>>
>>> What do you mean by "python binary"? This is what I have in the very
>>> first line of both scripts: #!/usr/bin/python
>>>
>>> Any ideas?
>>>
>>> On Fri, May 24, 2013 at 7:41 PM, Jitendra Yadav wrote:
>>>> Hi,
>>>>
>>>> I have run Michael's python map reduce example several times without
>>>> any issue.
>>>>
>>>> I think this issue is related to your file path 'mapper.py'. Are you
>>>> using the python binary?
>>>>
>>>> try this,
>>>>
>>>> hadoop jar /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/hadoop-streaming-1.1.2.jar \
>>>>   -input /user/yyy/20417-8.txt \
>>>>   -output /user/yyy/output \
>>>>   -file /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>>   -mapper /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>>   -file /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py \
>>>>   -reducer /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py
>>>>
>>>> Thanks~
>>>>
>>>> On Fri, May 24, 2013 at 10:12 PM, Adamantios Corais wrote:
>>>>> I tried this nice example:
>>>>> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>>>>>
>>>>> The python scripts work pretty fine from my laptop (through terminal),
>>>>> but they don't when I execute them on the CDH3 (Pseudo-Distributed Mode).
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> hadoop jar /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/hadoop-streaming-1.1.2.jar \
>>>>>   -input /user/yyy/20417-8.txt \
>>>>>   -output /user/yyy/output \
>>>>>   -file /usr/bin/python /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>>>   -mapper /usr/bin/python /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/mapper.py \
>>>>>   -file /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py \
>>>>>   -reducer /home/yyy/Dropbox/Private/xxx/Projects/task_week_22/reducer.py
>>>>>
>>>>> -----------------------
>>>>>
>>>>> 2013-05-24 18:21:12,232 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 2013-05-24 18:21:12,569 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/jars/job.jar <- /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/attempt_201305160832_0020_m_000000_0/work/job.jar
>>>>> 2013-05-24 18:21:12,586 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/jars/.job.jar.crc <- /var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/yyy/jobcache/job_201305160832_0020/attempt_201305160832_0020_m_000000_0/work/.job.jar.crc
>>>>> 2013-05-24 18:21:12,717 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>>>> 2013-05-24 18:21:13,062 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
>>>>> 2013-05-24 18:21:13,087 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1358f03
>>>>> 2013-05-24 18:21:13,452 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
>>>>> 2013-05-24 18:21:13,452 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
>>>>> 2013-05-24 18:21:13,464 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
>>>>> 2013-05-24 18:21:13,477 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
>>>>> 2013-05-24 18:21:13,635 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
>>>>> 2013-05-24 18:21:13,635 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>>>>> 2013-05-24 18:21:13,724 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
>>>>> 2013-05-24 18:21:13,733 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [mapper.py]
>>>>> 2013-05-24 18:21:13,783 ERROR org.apache.hadoop.streaming.PipeMapRed: configuration exception
>>>>> java.io.IOException: Cannot run program "mapper.py": java.io.IOException: error=2, No such file or directory
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
>>>>> Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
>>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>>>>>     ... 24 more
>>>>> 2013-05-24 18:21:13,869 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>>>> 2013-05-24 18:21:13,871 WARN org.apache.hadoop.mapred.Child: Error running child
>>>>> java.lang.RuntimeException: Error in configuring object
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:260)
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     ... 9 more
>>>>> Caused by: java.lang.RuntimeException: Error in configuring object
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>>>>     ... 14 more
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>>>     ... 17 more
>>>>> Caused by: java.lang.RuntimeException: configuration exception
>>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
>>>>>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>>>>>     ... 22 more
>>>>> Caused by: java.io.IOException: Cannot run program "mapper.py": java.io.IOException: error=2, No such file or directory
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>>>>>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
>>>>>     ... 23 more
>>>>> Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
>>>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>>>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>>>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
>>>>>     ... 24 more
>>>>> 2013-05-24 18:21:13,879 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>>>
>>>>
>>>
>>
>
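To double-check the local side before submitting, as asked in the thread, a small pre-flight check can help. The function below, check_streaming_script, is a hypothetical sketch: it only inspects your own machine, confirming that a script exists, is executable, and starts with a shebang whose interpreter also exists locally. The error=2 (ENOENT) behind the "Cannot run program" message in the log above can come either from the program file being missing or from the interpreter named on its shebang line being missing, so a clean local result does not prove the same holds on every task node.

```python
import os
from typing import List


def check_streaming_script(path: str) -> List[str]:
    """Hypothetical pre-flight check for a local streaming script.

    Returns human-readable problems; an empty list means the file exists
    on this machine, is executable, and starts with a shebang whose
    interpreter also exists on this machine. Task nodes are NOT checked.
    """
    if not os.path.isfile(path):
        return [f"{path}: no such local file"]

    problems = []
    if not os.access(path, os.X_OK):
        problems.append(f"{path}: not executable (try chmod +x)")

    with open(path, "rb") as f:
        first_line = f.readline()

    if not first_line.startswith(b"#!"):
        problems.append(f"{path}: first line is not a shebang such as #!/usr/bin/env python")
    else:
        # b"#!/usr/bin/env python\n" -> [b"/usr/bin/env", b"python"]
        interpreter = first_line[2:].split()
        if not interpreter or not os.path.isfile(interpreter[0]):
            problems.append(f"{path}: shebang interpreter not found on this machine")
    return problems


if __name__ == "__main__":
    # Run from the directory that holds the scripts; silence means no problems found.
    for script in ("mapper.py", "reducer.py"):
        for problem in check_streaming_script(script):
            print(problem)
```

Checking the interpreter path is the part most relevant to this thread: #!/usr/bin/env python resolves python through PATH at run time, whereas a hard-coded #!/usr/bin/python only works where that exact file exists.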
