Great, I need to get a review soon to get the patch into the codebase.

On Thu, Sep 5, 2013 at 2:10 PM, Ken Williams <zoo9...@hotmail.com> wrote:

> Hi Claudio,
>
> The patch worked !! :-)
>
> Just to be clear,
> I am running Giraph (1.0.0), not git cloned.
> and hadoop 2.0.0-cdh4.1.1
>
> I applied your patch and rebuilt the giraph source code with this command,
> mvn -Phadoop_2.0.0 clean compile package test install verify
>
> This built correctly, with no exceptions and no tests failed.
>
> I then ran the giraph example, which ran successfully with this command
>
> [root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output/shortestpaths -w 1
>
> I then deleted the output
> hadoop fs -rm -R /user/root/output/shortestpaths
>
> I then restarted my HBase daemons, and ran the giraph example again, and it worked successfully again,
> no errors, no exceptions, no tasks failed, and output produced correctly.
>
> Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on port 22181.
>
> Thank you very much for your help :-)
>
> Ken
>
> ------------------------------
> From: claudio.marte...@gmail.com
> Date: Wed, 4 Sep 2013 19:21:37 +0200
> Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> Giraph is shipped with Zookeeper 3.3.3, and it is run, if an existing zookeeper is not used through the giraph.zkServerList parameter, with its own configuration listening on port 22181.
>
> On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams <zoo9...@hotmail.com> wrote:
>
> Hmmmmmmmm. Interesting.
>
> Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
>
> The only version of ZooKeeper I have installed is the one that came with HBase,
> and the config file it uses /etc/zookeeper/conf/zoo.cfg specifies clientPort=2181
> This is the only zoo.cfg file on my machine.
>
> [root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
> ....
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
> [root@localhost Downloads]#
>
> ------------------------------
> From: claudio.marte...@gmail.com
> Date: Wed, 4 Sep 2013 12:13:50 +0200
> Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> That should in principle not be the case, as the zookeeper started by Giraph listens on a different port than the default. See parameter giraph.zkServerPort, which defaults to 22181.
>
> On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams <zoo9...@hotmail.com> wrote:
>
> Hi Claudio,
>
> I think I have fixed the problem.
>
> HBase runs with its own copy of ZooKeeper which listens on port 2181.
> So, when I tried to start ZooKeeper for Giraph it also tried to listen on port 2181
> and found it was already in use, and then it terminated - which is why Giraph failed.
> If I stop the HBase daemons (including its copy of ZooKeeper) then Giraph runs fine.
>
> Essentially there is a conflict between running ZooKeeper for Giraph, if there is
> already ZooKeeper running for HBase.
>
> I will try the patch and get back to you.
>
> Thanks for all your help,
>
> Ken
>
> ------------------------------
> From: claudio.marte...@gmail.com
> Date: Tue, 3 Sep 2013 17:01:01 +0200
> Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> try with the attached patch applied to trunk, without the mentioned -D giraph.zkManagerDirectory.
>
> On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams <zoo9...@hotmail.com> wrote:
>
> Hi Claudio,
>
> I tried this but it made no difference. The map tasks still fail, still no output, and still an
> exception in the log files - FileNotFoundException: File /tmp/giraph/_zkServer does not exist.
>
> [root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar org.apache.giraph.GiraphRunner -Dgiraph.zkManagerDirectory='/tmp/giraph/' org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/root/input/tiny_graph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/root/output/shortestpaths -w 1
> 13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
> 13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
> 13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
> 13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
> 13/09/03 14:20:02 INFO mapred.JobClient: map 0% reduce 0%
> 13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
> 13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
> 13/09/03 14:20:12 INFO mapred.JobClient: Job Counters
> 13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
> 13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=16327
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> [root@localhost giraph]#
>
> When I try to run Zookeeper it still gives me an 'Address already in use' exception.
>
> [root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
> JMX enabled by default
> Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,882 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,889 [myid:] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> 2013-09-03 14:23:37,889 [myid:] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
> 2013-09-03 14:23:37,890 [myid:] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
> 2013-09-03 14:23:37,890 [myid:] - WARN [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running in standalone mode
> 2013-09-03 14:23:37,904 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
> 2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
> 2013-09-03 14:23:37,905 [myid:] - INFO [main:ZooKeeperServerMain@100] - Starting server
> 2013-09-03 14:23:37,920 [myid:] - INFO [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
> 2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:host.name=localhost.localdomain
> 2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.version=1.6.0_31
> 2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
> 2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
> 2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
> 2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/i386/client:/usr/java/jdk1.6.0_31/jre/lib/i386:/usr/java/jdk1.6.0_31/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
> 2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
> 2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
> 2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
> 2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=i386
> 2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:os.version=2.6.32-279.14.1.el6.i686
> 2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.name=root
> 2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/root
> 2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/usr/local/giraph-1.0.0
> 2013-09-03 14:23:37,934 [myid:] - INFO [main:ZooKeeperServer@726] - tickTime set to 2000
> 2013-09-03 14:23:37,934 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
> 2013-09-03 14:23:37,935 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
> 2013-09-03 14:23:37,970 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 14:23:37,972 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
> [root@localhost giraph]#
>
> Thank you for any help,
>
> Ken
>
> ------------------------------
> From: claudio.marte...@gmail.com
> Date: Tue, 3 Sep 2013 12:43:59 +0200
> Subject: Re: FileNotFoundException: File _bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
> To: user@giraph.apache.org
>
> can you try defining the zookeeper manager directory from the command line? like this -D giraph.zkManagerDirectory=/path/in/hdfs/foobar
>
> you'll have to delete this directory by hand before each job. Just to see if it solves the problem. Then I could know how to fix it.
>
> On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams <zoo9...@hotmail.com> wrote:
>
> Hi Pradeep,
>
> Yes, the zookeeper server is definitely running, I can connect to it with the
> command-line client
>
> [root@localhost giraph]# zkCli.sh -server 127.0.0.1:2181
> Connecting to 127.0.0.1:2181
> 2013-09-03 11:15:45,987 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
> 2013-09-03 11:15:45,990 [myid:] - INFO [main:Environment@100] - Client environment:host.name=localhost.localdomain
> 2013-09-03 11:15:45,990 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.6.0_31
> ......
> WatchedEvent state:SyncConnected type:None path:null
> [zk: 127.0.0.1:2181(CONNECTED) 0] ls /
> [hbase, zookeeper]
> [zk: 127.0.0.1:2181(CONNECTED) 1]
>
> However, I am a bit confused.
> If I look in the zookeeper log-file I see this port 2181 'Address already in use' error,
>
> 2013-09-03 10:52:24,412 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
> 2013-09-03 10:52:24,413 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
> 2013-09-03 10:52:24,436 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
> 2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>     at sun.nio.ch.Net.bind(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
>
> The process listening on port 2181 is 2892, which turns out to be HBase.
>
> [root@localhost giraph]# fuser 2181/tcp
> 2181/tcp: 2892
> [root@localhost giraph]# ps aux | grep 2892
> hbase 2892 0.1 3.2 719592 119624 ? Sl Aug29 7:35 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/..
> ......
>
> So I am not sure what my zookeeper client is connecting to.
> It seems to be connecting to a zookeeper server but when I do 'ps' I cannot see
> a zookeeper server running.
> Here is my zoo.cfg file,
>
> maxClientCnxns=50
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> dataDir=/var/lib/zookeeper
> # the port at which the clients will connect
> clientPort=2181
> server.1=localhost:2888:3888
>
> Thanks for any help,
>
> Ken
>
> --
> Claudio Martella
> claudio.marte...@gmail.com

--
Claudio Martella
claudio.marte...@gmail.com
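
For anyone hitting the same conflict, the port check done in this thread with netstat and fuser can be roughly sketched in Python. This is only an illustration: the helper name is_port_in_use and the one-second timeout are assumptions made up for the example and are not part of Giraph or ZooKeeper. The two ports are the ones discussed above (2181 for the ZooKeeper started with HBase, 22181 for the giraph.zkServerPort default).

```python
import socket


def is_port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of Giraph or ZooKeeper.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from the thread: 2181 (HBase-managed ZooKeeper) and
    # 22181 (default of giraph.zkServerPort).
    for name, port in (("HBase ZooKeeper", 2181), ("Giraph ZooKeeper", 22181)):
        state = "listening" if is_port_in_use("127.0.0.1", port) else "no listener"
        print(f"{name} port {port}: {state}")
```

Unlike fuser, this only tells you whether something is listening on the port, not which process owns it.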