> It seems Streaming could not find the Python files, since it searched for
> them in the local file system.

It works if I specify references to the local files. However, if I set
hdfs://localhost/ as the file system, I keep getting the connection error.
Could the port number matter?

Roman

On Fri, Sep 27, 2013 at 6:55 AM, Roman Shapovalov <[email protected]> wrote:
> Martin,
>
>> then you don't have started hdfs?
>
> I have not started it manually, but it has been active:
>
> NameNode '0.0.0.0:8020' (active)
> Started:Wed Sep 25 18:54:42 EDT 2013
>
>> Your hdfs should contain the following files:
>
> It does.
>
>> Without the default file system in hama-site.xml, it will not work.
>
> Well, at least Hama (without streaming) worked, using the local file system.
> It seems Streaming could not find the Python files, since it searched for
> them in the local file system.
>
> Roman
>
> On Fri, Sep 27, 2013 at 6:30 AM, Martin Illecker <[email protected]> wrote:
>> Hi Roman,
>>
>> then you don't have started hdfs? (start-dfs.sh)
>>
>> Are you able to access the hdfs namenode?
>> http://localhost:50070/dfshealth.jsp
>>
>> Your hdfs should contain the following files:
>>
>> $hadoop fs -ls /tmp/PyStreaming/
>> Found 8 items
>> -rw-r--r--  279 2013-09-27 12:19 /tmp/PyStreaming/BSP.py
>> -rw-r--r-- 5159 2013-09-27 12:19 /tmp/PyStreaming/BSPPeer.py
>> -rw-r--r--  379 2013-09-27 12:19 /tmp/PyStreaming/BSPRunner.py
>> -rw-r--r--  970 2013-09-27 12:19 /tmp/PyStreaming/BinaryProtocol.py
>> -rw-r--r--  299 2013-09-27 12:19 /tmp/PyStreaming/BspJobConfiguration.py
>> -rw-r--r--  557 2013-09-27 12:19 /tmp/PyStreaming/HelloWorldBSP.py
>> -rw-r--r-- 5570 2013-09-27 12:19 /tmp/PyStreaming/KMeansBSP.py
>> -rw-r--r--  326 2013-09-27 12:19 /tmp/PyStreaming/README
>>
>> Without the default file system in hama-site.xml, it will not work.
>>
>> Martin
>>
>>
>> 2013/9/27 Roman Shapovalov <[email protected]>
>>
>>> Martin,
>>>
>>> if I set default file system to hdfs://localhost/, I get the connection
>>> error:
>>>
>>> 13/09/27 14:04:11 INFO ipc.Client: Retrying connect to server:
>>> localhost/127.0.0.1:40000.
>>> Already tried 0 time(s); retry policy is
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1
>>> SECONDS)
>>>
>>> (and 10 times like that, then get a java.net.ConnectException).
>>>
>>> I attach the hama-site.xml (as it was before adding the default fs
>>> property). I had only added the bsp.master.address property to switch
>>> to the PDM.
>>>
>>> Roman
>>>
>>> On Fri, Sep 27, 2013 at 4:20 AM, Martin Illecker <[email protected]> wrote:
>>> > Hi Roman!
>>> >
>>> > Did you setup the default filesystem in hama-site.xml?
>>> >
>>> > Please submit your hama-site.xml configuration.
>>> >
>>> > Martin
>>> >
>>> >
>>> > hama-site.xml - pseudo-distributed mode
>>> >
>>> > <configuration>
>>> >
>>> >   <property>
>>> >     <name>bsp.master.address</name>
>>> >     <value>localhost:40000</value>
>>> >     <description>The address of the bsp master server. Either the
>>> >     literal string "local" or a host:port for distributed mode
>>> >     </description>
>>> >   </property>
>>> >
>>> >   <property>
>>> >     <name>fs.default.name</name>
>>> >     <value>hdfs://localhost/</value>
>>> >     <description>
>>> >     The name of the default file system. Either the literal string
>>> >     "local" or a host:port for HDFS.
>>> >     </description>
>>> >   </property>
>>> >
>>> >   <property>
>>> >     <name>hama.zookeeper.quorum</name>
>>> >     <value>localhost</value>
>>> >     <description>Comma separated list of servers in the ZooKeeper Quorum.
>>> >     For example, "host1.mydomain.com,host2.mydomain.com, host3.mydomain.com".
>>> >     By default this is set to localhost for local and pseudo-distributed modes
>>> >     of operation. For a fully-distributed setup, this should be set to a full
>>> >     list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
>>> >     this is the list of servers which we will start/stop zookeeper on.
>>> >     </description>
>>> >   </property>
>>> >
>>> > </configuration>
>>> >
>>> >
>>> > On 27.09.2013 at 09:32, Roman Shapovalov <[email protected]> wrote:
>>> >
>>> >> Edward,
>>> >>
>>> >> Yes, I did. See the logs in my previous message.
>>> >>
>>> >> Roman
>>> >>
>>> >> On Fri, Sep 27, 2013 at 7:15 AM, Edward J. Yoon <[email protected]> wrote:
>>> >>> Have you tried to run in pseudo-distributed mode?
>>> >>>
>>> >>> On Fri, Sep 27, 2013 at 5:47 AM, Roman Shapovalov
>>> >>> <[email protected]> wrote:
>>> >>>> Martin,
>>> >>>>
>>> >>>> Thanks for such verbose instructions.
>>> >>>>
>>> >>>>> You can find all Hama configuration files in the *conf* folder.
>>> >>>>
>>> >>>> OK, I thought Edward meant Hadoop configs specifically.
>>> >>>> I have only added JAVA_HOME variable there, otherwise they are default.
>>> >>>>
>>> >>>>> You should also find task logs in your *temp* folder.
>>> >>>>
>>> >>>> I found the folder, but there were no .log files in the attempt*
>>> >>>> folders (in both modes).
>>> >>>>
>>> >>>>> Normally you should find it in *hama/logs/tasklogs*.
>>> >>>>
>>> >>>> They appear in the pseudo-distributed mode only (which also fails).
>>> >>>> See the attached file.
>>> >>>>
>>> >>>>> By the way do you have python3.2 installed? :-)
>>> >>>>
>>> >>>> Yes. "python" links to Python 2.6, but I pass "python3.2" as an
>>> >>>> interpreter, which links to the correct version.
>>> >>>>
>>> >>>>
>>> >>>> Roman
>>> >>>>
>>> >>>> On Thu, Sep 26, 2013 at 4:03 PM, Martin Illecker <[email protected]> wrote:
>>> >>>>> Hi Roman,
>>> >>>>>
>>> >>>>> if you are running Hama in local mode, it will not use HDFS anyway.
>>> >>>>>
>>> >>>>> You can find all Hama configuration files in the *conf* folder.
>>> >>>>>
>>> >>>>> $ll hama/conf/
>>> >>>>> total 56
>>> >>>>> -rwxr-xr-x groomservers*
>>> >>>>> -rwxr-xr-x hama-default.xml*
>>> >>>>> -rwxr-xr-x hama-env.sh*
>>> >>>>> -rwxr-xr-x hama-site.xml*
>>> >>>>> -rwxr-xr-x log4j.properties*
>>> >>>>>
>>> >>>>> Probably you should setup the Pseudo Distributed Mode [1] in hama-site.xml.
>>> >>>>>
>>> >>>>> But the task log would be very interesting.
>>> >>>>>
>>> >>>>> Normally you should find it in *hama/logs/tasklogs*.
>>> >>>>> e.g., hama/logs/tasklogs/job_201309262134_0001/attempt_201309262134_0001_000000_0.log
>>> >>>>>
>>> >>>>> You should also find task logs in your *temp* folder.
>>> >>>>> But this location will depend on your operating system.
>>> >>>>> e.g., in OSX
>>> >>>>> /private/tmp/hadoop-YOURUSER/bsp/local/groomServer/attempt_201309262134_0001_000000_0/work/tasklogs/
>>> >>>>>
>>> >>>>> By the way do you have python3.2 installed? :-)
>>> >>>>> $ python --version
>>> >>>>> Python 3.2.5
>>> >>>>> $ python3.2 --version
>>> >>>>> Python 3.2.5
>>> >>>>>
>>> >>>>> May I ask which operating system you use?
>>> >>>>>
>>> >>>>> Martin
>>> >>>>>
>>> >>>>> [1] http://wiki.apache.org/hama/GettingStarted#Pseudo_Distributed_Mode
>>> >>>>>
>>> >>>>>
>>> >>>>> 2013/9/26 Roman Shapovalov <[email protected]>
>>> >>>>>
>>> >>>>>> Hi Edward,
>>> >>>>>>
>>> >>>>>> Could you please be more specific? (Sorry, I am new to this stuff)
>>> >>>>>>
>>> >>>>>> I run Hama in local mode. The logs/ directory is empty, and I did not
>>> >>>>>> find any logs in HDFS either.
>>> >>>>>>
>>> >>>>>> And where can I find the Hadoop configuration?
>>> >>>>>>
>>> >>>>>> Thank you,
>>> >>>>>> Roman
>>> >>>>>>
>>> >>>>>> On Thu, Sep 26, 2013 at 12:05 PM, Edward J. Yoon <[email protected]> wrote:
>>> >>>>>>> Hi,
>>> >>>>>>>
>>> >>>>>>> That's strange. Can you attach your namenode logs and hadoop configurations?
>>> >>>>>>>
>>> >>>>>>> On Thu, Sep 26, 2013 at 11:03 PM, Roman Shapovalov
>>> >>>>>>> <[email protected]> wrote:
>>> >>>>>>>> Hi again,
>>> >>>>>>>>
>>> >>>>>>>> I have updated both Hama (from the trunk) and Streaming (from Martin's
>>> >>>>>>>> github), and checked that patches have been applied, but I keep
>>> >>>>>>>> getting the same error (full log for local configuration is attached).
>>> >>>>>>>>
>>> >>>>>>>> Another thing may be relevant: I keep the default Hadoop libraries in
>>> >>>>>>>> lib/. If I replace them as the tutorial says, some classes cannot be
>>> >>>>>>>> found even if I run pure Hama (which works perfectly with default
>>> >>>>>>>> libs). I don't know if it is important.
>>> >>>>>>>>
>>> >>>>>>>> Thanks,
>>> >>>>>>>> Roman
>>> >>>>>>>>
>>> >>>>>>>> On Tue, Sep 24, 2013 at 9:22 AM, Martin Illecker <[email protected]> wrote:
>>> >>>>>>>>> Hi Roman,
>>> >>>>>>>>>
>>> >>>>>>>>> sorry for inconvenience!
>>> >>>>>>>>> The problem has been reported [1] and will be fixed shortly to the trunk.
>>> >>>>>>>>>
>>> >>>>>>>>> [1] https://issues.apache.org/jira/browse/HAMA-805
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> 2013/9/23 Edward J. Yoon <[email protected]>
>>> >>>>>>>>>
>>> >>>>>>>>>> This looks like a bug of DistCacheUtils.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thanks for your report. I'll look at it tomorrow.
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Mon, Sep 23, 2013 at 11:52 PM, Roman Shapovalov
>>> >>>>>>>>>> <[email protected]> wrote:
>>> >>>>>>>>>>> Hello all,
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I try to use Hama Streaming.
>>> >>>>>>>>>>> I have successfully installed Hama (the Pi example works).
>>> >>>>>>>>>>> I follow this tutorial:
>>> >>>>>>>>>>> http://wiki.apache.org/hama/HamaStreaming
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> When I try to run the distributed HelloWorld in the local
>>> >>>>>>>>>>> configuration, I get the following error:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> $ bin/hama pipes -streaming true -bspTasks 3 -interpreter python3.2
>>> >>>>>>>>>>> -cachefiles /tmp/PyStreaming/*.py -output /tmp/pystream-out/ -program
>>> >>>>>>>>>>> /tmp/PyStreaming/BSPRunner.py -programArgs HelloWorldBSP
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 13/09/23 18:03:50 INFO pipes.Submitter: Streaming enabled!
>>> >>>>>>>>>>> 13/09/23 18:03:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>>>>> 13/09/23 18:03:50 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
>>> >>>>>>>>>>> 13/09/23 18:03:50 INFO bsp.BSPJobClient: Running job: job_localrunner_0001
>>> >>>>>>>>>>> 13/09/23 18:03:50 INFO bsp.LocalBSPRunner: Setting up a new barrier for 3 tasks!
>>> >>>>>>>>>>> 13/09/23 18:03:50 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>>> >>>>>>>>>>> java.lang.NullPointerException
>>> >>>>>>>>>>>     at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:44)
>>> >>>>>>>>>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:255)
>>> >>>>>>>>>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>> >>>>>>>>>>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>>> >>>>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >>>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >>>>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>> >>>>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>> >>>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>> >>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>> >>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>> >>>>>>>>>>>     at java.lang.Thread.run(Thread.java:662)
>>> >>>>>>>>>>> [output cropped]
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> When I turn to the pseudo-distributed mode, job fails too (after a
>>> >>>>>>>>>>> minute of execution):
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 13/09/23 18:46:34 INFO pipes.Submitter: Streaming enabled!
>>> >>>>>>>>>>> 13/09/23 18:46:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>>>>> 13/09/23 18:46:34 WARN bsp.BSPJobClient: No job jar file set. User classes may not be found. See BSPJob#setJar(String) or check Your jar file.
>>> >>>>>>>>>>> 13/09/23 18:46:34 INFO bsp.BSPJobClient: Running job: job_201309231846_0001
>>> >>>>>>>>>>> 13/09/23 18:47:40 INFO bsp.BSPJobClient: Job failed.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Task log contains errors:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: Starting Socket Reader #1 for port 43475
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: IPC Server Responder: starting
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: IPC Server listener on 43475: starting
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO message.HadoopMessageManagerImpl: BSPPeer address:localhost.localdomain port:43475
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO ipc.Server: IPC Server handler 0 on 43475: starting
>>> >>>>>>>>>>> 13/09/23 18:46:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> >>>>>>>>>>> 13/09/23 18:46:37 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At localhost.localdomain/127.0.0.1:43475
>>> >>>>>>>>>>> 13/09/23 18:46:37 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
>>> >>>>>>>>>>> java.lang.NullPointerException
>>> >>>>>>>>>>>     at java.io.File.<init>(File.java:222)
>>> >>>>>>>>>>>     at org.apache.hama.pipes.PipesApplication.setupCommand(PipesApplication.java:130)
>>> >>>>>>>>>>>     at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:257)
>>> >>>>>>>>>>>     at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:44)
>>> >>>>>>>>>>>     at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:176)
>>> >>>>>>>>>>>     at org.apache.hama.bsp.BSPTask.run(BSPTask.java:146)
>>> >>>>>>>>>>>     at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1246)
>>> >>>>>>>>>>> [output cropped]
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I use the latest trunk version of Hama, Python 3.2.5 and Hadoop 2.0.0-cdh4.1.1.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Please help me to figure out the problem.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks in advance,
>>> >>>>>>>>>>> Roman
>>> >>>>>>>>>>
>>> >>>>>>>>>> --
>>> >>>>>>>>>> Best Regards, Edward J. Yoon
>>> >>>>>>>>>> @eddieyoon
>>> >>>>>>>>>>
>>> >>>>>>>
>>> >>>>>>> --
>>> >>>>>>> Best Regards, Edward J. Yoon
>>> >>>>>>> @eddieyoon
>>> >>>>>>
>>> >>>
>>> >>> --
>>> >>> Best Regards, Edward J. Yoon
>>> >>> @eddieyoon
>>> >
>>>
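
On the port question raised at the top of this thread, one quick check is to see whether anything is listening on the ports involved. The sketch below is only an illustration and is not part of Hama or Hadoop: `endpoint` and `is_listening` are hypothetical helper names, and 8020 is assumed as the NameNode port only because the status line quoted in this thread reports '0.0.0.0:8020'. The second address checked, localhost:40000, is the bsp.master.address value from the quoted hama-site.xml.

```python
import socket
from urllib.parse import urlparse

# Assumption: taken from the NameNode status line quoted in this thread.
ASSUMED_NAMENODE_PORT = 8020


def endpoint(uri, default_port=ASSUMED_NAMENODE_PORT):
    """Return (host, port) for a URI such as hdfs://localhost/ or hdfs://localhost:8020/.

    If the URI names no port, fall back to default_port.
    """
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port or default_port


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    checks = [
        ("fs.default.name", endpoint("hdfs://localhost/")),
        ("bsp.master.address", ("localhost", 40000)),
    ]
    for label, (host, port) in checks:
        state = "open" if is_listening(host, port) else "closed"
        print("{}: {}:{} is {}".format(label, host, port, state))
```

Note that the retry message quoted above names localhost:40000, which matches the bsp.master.address value rather than the NameNode port, so it may be worth checking that port as well. If the NameNode listened on a non-default port, the URI would have to name it explicitly, for example hdfs://localhost:8020/.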
