I tried bin/stop-bspd.sh, but the script reports that there is no groom/bspmaster process, so I have to kill them manually. I am working on Hama 0.7.0.
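A guess at what is happening: stop scripts of this style usually locate daemons via pid files, so daemons started by hand in a terminal (e.g. % bin/hama bspmaster) may be invisible to them. As a gentler alternative to kill -9, the daemon JVMs can be found with jps and sent a plain SIGTERM first. This is an illustrative sketch, not part of Hama; the runner class names (BSPMasterRunner, GroomServer, ZooKeeperRunner) are assumptions based on the stack traces quoted later in this thread:

```shell
# Hypothetical helper: pick the PIDs of Hama daemon JVMs out of `jps -l`
# output, so they can be stopped with SIGTERM instead of kill -9.
hama_pids() {
    grep -E 'BSPMasterRunner|GroomServer|ZooKeeperRunner' | awk '{print $1}'
}

# Usage: jps -l | hama_pids | xargs -r kill
```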
On Mon, Aug 3, 2015 at 1:07 AM, Edward J. Yoon <[email protected]> wrote:

Hi,

Congratz! You can shut down the cluster with the following command:

$ bin/stop-bspd.sh

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:[email protected]]
Sent: Sunday, August 02, 2015 11:27 PM
To: [email protected]
Subject: Re: Groomserer BSPPeerChild limit

Hi,
The other day I got the fix for the /etc/hosts file, and now I can modify it. I tried to run the cluster with 3 machines and everything went fine.

Thanks :)

By the way, if I start a daemon with the following command, how can I stop it? Right now I am using kill -9 <process_id>.

% ./bin/hama bspmaster

On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander <[email protected]> wrote:

Ok, perfect. I do not have rights on /etc/hosts, which is why I was using the IP addresses. I will talk to the administrator.

Btw, I am wondering how the PI example was able to communicate with the other servers. The PI example runs fine even if I have more than 3 tasks (it works on both machines).

On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <[email protected]> wrote:

Okay, almost done. I guess you need to add the host names to your /etc/hosts file. :-) Please see also
http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster

On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <[email protected]> wrote:

Server 2 was showing the exception that I posted in the previous email. Server 1 is showing the following exception:

15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: starting
15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer groomd_8d4b512cf448_50000
java.net.UnknownHostException: unknown host: 8d4b512cf448
    at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
    at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
    at org.apache.hama.ipc.Client.call(Client.java:888)
    at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
    at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)

I am looking into this issue.

On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <[email protected]> wrote:

Ok, great. I was able to run the zk, groom and bspmaster on server 1. But when I ran the groom on server 2, I got the following exception:

15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in establishing communication link with BSPMaster
15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while reinitializing GroomServer: java.io.IOException: There is a problem in establishing communication link with BSPMaster.
    at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
    at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
    at java.lang.Thread.run(Thread.java:745)

On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <[email protected]> wrote:

Here are my configurations.

hama-site.xml:

<property>
  <name>bsp.master.address</name>
  <value>cluster-0:40000</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://cluster-0:9000/</value>
</property>

<property>
  <name>hama.zookeeper.quorum</name>
  <value>cluster-0</value>
</property>

% bin/hama zookeeper
15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid configuration, only one server specified (ignoring)

Then open a new terminal and run the master with the following command:

% bin/hama bspmaster
...
15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING

On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <[email protected]> wrote:

Hi,

If you run the zk server too, BSPMaster will be connected to zk and won't throw exceptions.

On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <[email protected]> wrote:

Hi,
Thank you for the information. I moved to Hama 0.7.0 and I still have the same problem.
When I run % bin/hama bspmaster, I am getting the following exception:

INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 40013
INFO http.HttpServer: listener.getLocalPort() returned 40013 webServer.getConnectors()[0].getLocalPort() returned 40013
INFO http.HttpServer: Jetty bound to port 40013
INFO mortbay.log: jetty-6.1.14
INFO mortbay.log: Extract jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/ to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
INFO bsp.BSPMaster: Cleaning up the system directory
INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
INFO sync.ZKSyncClient: Initializing ZK Sync Client
ERROR sync.ZKSyncBSPMasterClient:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
    at org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
    at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
    at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
    at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
    at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
ERROR sync.ZKSyncBSPMasterClient:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp

My zookeeper settings in hama-site.xml are (right now, I am using just two servers, 172.17.0.3 and 172.17.0.7):

<property>
  <name>hama.zookeeper.quorum</name>
  <value>172.17.0.3,172.17.0.7</value>
  <description>Comma separated list of servers in the ZooKeeper quorum.
    For example, "host1.mydomain.com, host2.mydomain.com, host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>
......
<property>
  <name>hama.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

Is something wrong with my settings?

Regards,
Behroz Sikander

On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <[email protected]> wrote:

> (0.7.0) because I do not understand YARN yet.
> It adds extra configurations

Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. The Yarn configuration is only needed when you want to submit a BSP job to a Yarn cluster without a Hama cluster, so you don't need to worry about it. :-)

> distributed mode? and is there any way to manage the server? I mean right
> now, I have 3 machines with a lot of configuration files and log files. It

You can use the web UI at http://masterserver_address:40013/bspmaster.jsp

To debug your program, please try like below:

1) Run a BSPMaster and Zookeeper at server1.

% bin/hama bspmaster
% bin/hama zookeeper

2) Run a Groom at server1 and server2.

% bin/hama groom

3) Check whether the daemons are running well. Then, run your program using the jar command at server1.

% bin/hama jar .....

> In the hama_[user]_bspmaster_.....log file I get the following exception. But
> this occurs in both cases, when I run my job with 3 tasks or with 4 tasks

In fact, you should not see the above initZK error log.

-----Original Message-----
From: Behroz Sikander [mailto:[email protected]]
Sent: Monday, June 29, 2015 8:18 AM
To: [email protected]
Subject: Re: Groomserer BSPPeerChild limit

I will try the things that you mentioned. I am not using the latest version (0.7.0) because I do not understand YARN yet. It adds extra configurations, which makes it harder for me to understand when things go wrong. Any suggestions?

Further, are there any tools that you use for debugging while in distributed mode? And is there any way to manage the servers? I mean, right now I have 3 machines with a lot of configuration files and log files. It takes a lot of time. This makes me wonder how people who have 100s of machines debug and manage their clusters.

Regards,
Behroz

On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <[email protected]> wrote:

Hi,

It looks like a zookeeper connection problem. Please check whether zookeeper is running and every task can connect to zookeeper.

I would recommend you to stop the firewall during debugging, and please use the latest 0.7.0 release.

-----Original Message-----
From: Behroz Sikander [mailto:[email protected]]
Sent: Monday, June 29, 2015 7:34 AM
To: [email protected]
Subject: Re: Groomserer BSPPeerChild limit

To figure out the issue, I was trying something else and found another weird issue. It might be a bug in Hama, but I am not sure. Both of the following lines give an exception.
System.out.println(peer.getPeerName(0));  // Exception
System.out.println(peer.getNumPeers());   // Exception

[time] ERROR bsp.BSPTask: Error running bsp setup and bsp function.
[time] java.lang.RuntimeException: All peer names could not be retrieved!
    at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
    at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
    at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
    at testHDFS.EVADMMBsp.setup(EVADMMBsp.java:58)
    at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
    at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
    at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)

On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <[email protected]> wrote:

I think I have more information on the issue. I did some debugging and found something quite strange.

If I run my job with 6 tasks (3 tasks run on MACHINE1 and 3 tasks are opened on MACHINE2):

- The 3 tasks on MACHINE1 are frozen, and the strange thing is that the processes do not even enter the SETUP function of the BSP class. I have print statements in the setup function of the BSP class and it doesn't print anything. I get empty files with zero size.

drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000000_0.err
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000000_0.log
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000001_0.err
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000001_0.log
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000002_0.err
-rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000002_0.log

- On MACHINE2, the code enters the SETUP function of the BSP class and prints stuff. See the size of the files generated on output. How is it possible that in 3 tasks the code can enter BSP and in the others it cannot?

drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
-rw-rw-r--  1 behroz behroz  659 Jun 28 16:39 attempt_201506281639_0001_000003_0.err
-rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39 attempt_201506281639_0001_000003_0.log
-rw-rw-r--  1 behroz behroz  659 Jun 28 16:39 attempt_201506281639_0001_000004_0.err
-rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39 attempt_201506281639_0001_000004_0.log
-rw-rw-r--  1 behroz behroz  659 Jun 28 16:39 attempt_201506281639_0001_000005_0.err
-rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39 attempt_201506281639_0001_000005_0.log

- The Hama Groom log file on MACHINE1 (the frozen one; the original mail said MACHINE2 here, but the task IDs are the frozen ones) shows:

[time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000001_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000002_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000000_0' has started.

- The Hama Groom log file on MACHINE2 shows:

[time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000003_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000004_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
[time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000005_0' has started.
[time] INFO org.apache.hama.bsp.GroomServer: Task attempt_201506281639_0001_000004_0 is done.
[time] INFO org.apache.hama.bsp.GroomServer: Task attempt_201506281639_0001_000003_0 is done.
[time] INFO org.apache.hama.bsp.GroomServer: Task attempt_201506281639_0001_000005_0 is done.

Any clue what might be going wrong?

Regards,
Behroz

On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <[email protected]> wrote:

Here is the log file from that folder:

15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port 61001
15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer address:b178b33b16cc port:61001
15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At b178b33b16cc/172.17.0.7:61001
15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting

And my console shows the following output. Hama is frozen right now.

15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: job_201506262331_0003
15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2

On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <[email protected]> wrote:

Please check the task logs in the $HAMA_HOME/logs/tasklogs folder.

On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <[email protected]> wrote:

Yea, I also thought that. I ran the program through eclipse with 20 tasks and it works fine.
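Edward's recurring advice in this thread is to check that every machine can actually reach ZooKeeper; both the ConnectionLoss errors and the frozen tasks point in that direction. A minimal reachability probe, sketched here for illustration (plain JDK code, not a Hama API; the hosts and port are taken from the hama-site.xml quoted above):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkReachability {
    // Attempts a plain TCP connect to a ZooKeeper server. If this already
    // fails from a task host, a KeeperException ConnectionLoss is expected.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Quorum hosts from the configuration earlier in the thread.
        for (String host : new String[] {"172.17.0.3", "172.17.0.7"}) {
            System.out.println(host + ":2181 reachable: "
                + canConnect(host, 2181, 2000));
        }
    }
}
```

Running this once on each groom host quickly shows whether a firewall or /etc/hosts problem is in the way before digging into Hama logs.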
On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <[email protected]> wrote:

> When I run the PI example, it uses 9 tasks and runs fine. When I run my
> program with 3 tasks, everything runs fine. But when I increase the tasks
> (to 4) by using "setNumBspTask", Hama freezes. I do not understand what
> can go wrong.

It looks like a program bug. Have you run your program in local mode?

On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <[email protected]> wrote:

Hi,
In the current thread, I mentioned 3 issues. Issues 1 and 3 are resolved, but issue number 2 is still giving me headaches.

My problem:
My cluster now consists of 3 machines, each of them (apparently) properly configured. From my master machine, when I start Hadoop and Hama, I can see the processes started on the other 2 machines. If I check the maximum tasks that my cluster can support, I get 9 (3 tasks on each machine).

When I run the PI example, it uses 9 tasks and runs fine. When I run my program with 3 tasks, everything runs fine. But when I increase the tasks (to 4) by using "setNumBspTask", Hama freezes. I do not understand what can go wrong.

I checked the log files and things look fine. I just sometimes get an exception that Hama was not able to delete the system directory (bsp.system.dir) defined in hama-site.xml.

Any help or clue would be great.

Regards,
Behroz Sikander

On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <[email protected]> wrote:

Thank you :)

On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <[email protected]> wrote:

Hi,

You can get the maximum number of available tasks with code like the following:

BSPJobClient jobClient = new BSPJobClient(conf);
ClusterStatus cluster = jobClient.getClusterStatus(true);

// Set to maximum
bsp.setNumBspTask(cluster.getMaxTasks());

On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <[email protected]> wrote:

Hi,
1) Thank you for this.
2) Here are the images. I will look into the log files of the PI example.

Result of JPS command on slave:
http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png

Result of JPS command on Master:
http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png

3) In my current case, I do not have any input submitted to the job. During run time, I directly fetch data from HDFS. So, I am looking for something like BSPJob.setMaxNumBspTask().

Regards,
Behroz

On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <[email protected]> wrote:

Hello,

1) You can get the filesystem URI from a configuration using "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS property should be in hama-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://host1.mydomain.com:9000/</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for HDFS.
  </description>
</property>

2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks like a cluster configuration issue. Please run the Pi example and look at the logs for more details. NOTE: you can not attach images to the mailing list, so I can't see them.
3) You can use the BSPJob.setNumBspTask(int) method. If input is provided, the number of BSP tasks is basically driven by the number of DFS blocks. I'll fix it to be more flexible in HAMA-956.

Thanks!

On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <[email protected]> wrote:

Hi,
Recently, I moved from a single-machine setup to a 2-machine setup. I was successfully able to run my job that uses HDFS to get data. I have 3 trivial questions.

1- To access HDFS, I have to manually give the IP address of the server running HDFS. I thought that Hama would automatically pick it up from the configuration, but it does not. I am probably doing something wrong. Right now my code works by using the following:

FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);

2- On my master server, when I start Hama, it automatically starts Hama on the slave machine (all good). Both master and slave are set as groomservers. This means that I have 2 servers to run my job, which means that I can open more BSPPeerChild processes. If I submit my jar with 3 bsp tasks, then everything works fine. But when I move to 4 tasks, Hama freezes. Here is the result of the JPS command on the slave.

[image not included]

Result of JPS command on Master:

[image not included]

You can see that it is only opening tasks on the slave but not on the master.

Note: I tried to change the bsp.tasks.maximum property in hama-default.xml to 4, but still the same result.

3- I want my cluster to open as many BSPPeerChild processes as possible. Is there any setting I can use to achieve that? Or does Hama pick up the values from hama-default.xml to open tasks?

Regards,
Behroz Sikander
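On question 1 at the bottom of this thread: once the default filesystem (fs.default.name / fs.defaultFS) is set in the site configuration, FileSystem.get(conf) resolves bare paths against it, so the HDFS address does not need to be hard-coded. The effect can be illustrated with plain java.net.URI resolution (a sketch only; the host name and path are taken from the examples quoted above, not from a real cluster):

```java
import java.net.URI;

public class DefaultFsResolve {
    // Mimics how a configured default filesystem lets code use bare
    // paths instead of hard-coding "hdfs://server_ip:port/" in a URI.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        System.out.println(qualify("hdfs://host1.mydomain.com:9000/",
                                   "/tmp/hama-behroz/bsp/system"));
    }
}
```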
