Hi Claudio,

The patch worked!! :-)

Just to be clear, I am running Giraph (1.0.0), not git cloned, and hadoop 2.0.0-cdh4.1.1.

I applied your patch and rebuilt the giraph source code with this command,

mvn -Phadoop_2.0.0 clean compile package test install verify

This built correctly, with no exceptions and no tests failed.

I then ran the giraph example, which ran successfully with this command

[root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output/shortestpaths -w 1

I then deleted the output

hadoop fs -rm -R /user/root/output/shortestpaths

I then restarted my HBase daemons, and ran the giraph example again, and it worked successfully again, no errors, no exceptions, no tasks failed, and output produced correctly.

Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on port 22181.

Thank you very much for your help :-)

Ken
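
For readers who want to repeat the netstat check above from a script, here is a minimal sketch in Python. It assumes both ZooKeeper instances run on the local machine; the function name port_is_listening is made up for this illustration, and the two port numbers are the ones quoted in this thread (2181 for the HBase-managed ZooKeeper, 22181 for the one Giraph starts). It only tells you whether something accepts TCP connections on a port, not which process owns it (fuser or ps are still needed for that).

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it is not part of Giraph or ZooKeeper.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 2181: clientPort from /etc/zookeeper/conf/zoo.cfg in this thread.
    # 22181: default of giraph.zkServerPort, as stated in Claudio's reply.
    for port in (2181, 22181):
        state = "listening" if port_is_listening(port) else "not listening"
        print("port %d: %s" % (port, state))
```

If both ports report as listening while a job is running, the two ZooKeeper instances are not competing for the same port, which matches what Ken observed after applying the patch.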

From: claudio.marte...@gmail.com
Date: Wed, 4 Sep 2013 19:21:37 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

Giraph is shipped with Zookeeper 3.3.3, and it is run, if an existing zookeeper is not used through the giraph.zkServerList parameter, with its own configuration listening on port 22181.

On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams <zoo9...@hotmail.com> wrote:

Hmmmmmmmm. Interesting. Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper?

The only version of ZooKeeper I have installed is the one that came with HBase, and the config file it uses, /etc/zookeeper/conf/zoo.cfg, specifies clientPort=2181. This is the only zoo.cfg file on my machine.

[root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
....
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
[root@localhost Downloads]#

From: claudio.marte...@gmail.com
Date: Wed, 4 Sep 2013 12:13:50 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

That should in principle not be the case, as the zookeeper started by Giraph listens on a different port than the default. See parameter giraph.zkServerPort, which defaults to 22181.

On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams <zoo9...@hotmail.com> wrote:

Hi Claudio,

I think I have fixed the problem. HBase runs with its own copy of ZooKeeper which listens on port 2181.
So, when I tried to start ZooKeeper for Giraph it also tried to listen on port 2181 and found it was already in use, and then it terminated - which is why Giraph failed. If I stop the HBase daemons (including its copy of ZooKeeper) then Giraph runs fine. Essentially there is a conflict between running ZooKeeper for Giraph, if there is already ZooKeeper running for HBase.

I will try the patch and get back to you.

Thanks for all your help,

Ken

From: claudio.marte...@gmail.com
Date: Tue, 3 Sep 2013 17:01:01 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

try with the attached patch applied to trunk, without the mentioned -D giraph.zkManagerDirectory.

On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams <zoo9...@hotmail.com> wrote:

Hi Claudio,

I tried this but it made no difference. The map tasks still fail, still no output, and still an exception in the log files - FileNotFoundException: File /tmp/giraph/_zkServer does not exist.

[root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.zkManagerDirectory='/tmp/giraph/' org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output/shortestpaths -w 1
13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
13/09/03 14:20:02 INFO mapred.JobClient: map 0% reduce 0%
13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
13/09/03 14:20:12 INFO mapred.JobClient: Job Counters
13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=16327
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
[root@localhost giraph]#

When I try to run Zookeeper it still gives me an 'Address already in use' exception.

[root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,882 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,889 [myid:] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2013-09-03 14:23:37,889 [myid:] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2013-09-03 14:23:37,890 [myid:] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2013-09-03 14:23:37,890 [myid:] - WARN [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running in standalone mode
2013-09-03 14:23:37,904 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,905 [myid:] - INFO [main:ZooKeeperServerMain@100] - Starting server
2013-09-03 14:23:37,920 [myid:] - INFO [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:host.name=localhost.localdomain
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.version=1.6.0_31
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/i386/client:/usr/java/jdk1.6.0_31/jre/lib/i386:/usr/java/jdk1.6.0_31/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=i386
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:os.version=2.6.32-279.14.1.el6.i686
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.name=root
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/root
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/usr/local/giraph-1.0.0
2013-09-03 14:23:37,934 [myid:] - INFO [main:ZooKeeperServer@726] - tickTime set to 2000
2013-09-03 14:23:37,934 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 14:23:37,935 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 14:23:37,970 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 14:23:37,972 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
[root@localhost giraph]#

Thank you for any help,

Ken

From: claudio.marte...@gmail.com
Date: Tue, 3 Sep 2013 12:43:59 +0200
Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: user@giraph.apache.org

can you try defining the zookeeper manager directory from the command line? like this

-D giraph.zkManagerDirectory=/path/in/hdfs/foobar

you'll have to delete this directory by hand before each job. Just to see if it solves the problem. Then I could know how to fix it.

On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams <zoo9...@hotmail.com> wrote:

Hi Pradeep,

Yes, the zookeeper server is definitely running, I can connect to it with the command-line client

[root@localhost giraph]# zkCli.sh -server 127.0.0.1:2181
Connecting to 127.0.0.1:2181
2013-09-03 11:15:45,987 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 11:15:45,990 [myid:] - INFO [main:Environment@100] - Client environment:host.name=localhost.localdomain
2013-09-03 11:15:45,990 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.6.0_31
......
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[hbase, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1]

However, I am a bit confused. If I look in the zookeeper log-file I see this port 2181 'Address already in use' error,

2013-09-03 10:52:24,412 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 10:52:24,413 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 10:52:24,436 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)

The process listening on port 2181 is 2892, which turns out to be HBase.

[root@localhost giraph]# fuser 2181/tcp
2181/tcp: 2892
[root@localhost giraph]# ps aux | grep 2892
hbase 2892 0.1 3.2 719592 119624 ? Sl Aug29 7:35 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/.. ......

So I am not sure what my zookeeper client is connecting to. It seems to be connecting to a zookeeper server but when I do 'ps' I cannot see a zookeeper server running.

Here is my zoo.cfg file,

maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888

Thanks for any help,

Ken

--
Claudio Martella
claudio.marte...@gmail.com