Hi,

We are still experiencing 40-60 minutes of task failures before our
HBaseStorage jobs run, but we think we've narrowed the problem down to a
specific ZooKeeper issue.

The HBaseStorage map task only works when it lands on a machine that is
actually running a ZooKeeper server as part of the quorum. It typically
attempts to run on several different nodes in the cluster, failing
repeatedly, before it hits a ZooKeeper node.

Logs show the failing task attempts are trying to connect to localhost on
port 2181 to make a ZooKeeper connection (as part of the Load/HBaseStorage
map task):

...
> 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
...
> java.net.ConnectException: Connection refused
...

This explains why the job eventually succeeds: we have a ZooKeeper quorum
server running on one of our worker nodes, but not on the other three.
The job therefore fails repeatedly until a task attempt is scheduled on
the node with the ZK server, at which point it succeeds immediately.
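For what it's worth, here is the quick check we used from a failing worker
node to confirm which hosts actually have a ZooKeeper server listening
(assumes nc is installed; the hostnames are from our quorum setting):

```shell
# Send ZooKeeper's four-letter "ruok" command to each configured quorum
# host on port 2181; a live server answers "imok", otherwise we print
# "unreachable".
for host in namenode jobtracker slave0; do
  printf '%s: ' "$host"
  echo ruok | nc -w 2 "$host" 2181 2>/dev/null || echo unreachable
done
```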

We therefore suspect the issue is in our ZK configuration. Our
hbase-site.xml defines the zookeeper quorum as follows:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>namenode,jobtracker,slave0</value>
    </property>

Therefore, we would expect the tasks to connect to one of those hosts when
attempting a ZooKeeper connection; however, they appear to be connecting
to "localhost" (which is the default). It is as if the HBase configuration
settings here are not being used.
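In case it helps anyone reproduce or rule this out: a sketch of the
workaround we plan to try, on the assumption that hbase-site.xml simply
isn't reaching the tasks' configuration (the paths and script name below
are from our install, so adjust for yours):

```shell
# Put the HBase conf directory (containing hbase-site.xml) on Pig's
# classpath so the quorum setting is picked up when the job is built,
# instead of falling back to the jar-embedded default of localhost.
export PIG_CLASSPATH=/opt/hbase/hbase-trunk/conf:$PIG_CLASSPATH

# Belt and braces: also pass the quorum explicitly as a property on the
# command line so it lands in the job configuration.
pig -Dhbase.zookeeper.quorum=namenode,jobtracker,slave0 \
    -x mapreduce ../pig-scripts/hbaseuploadtest.pig
```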

Does anyone have any suggestions as to what might be the cause of this
behaviour?

Sending this to both lists, although it is only Pig HBaseStorage jobs that
suffer this problem on our cluster; HBase Java client jobs work normally.

Thanks,
Royston

-----Original Message-----
From: Subir S [mailto:subir.sasiku...@gmail.com] 
Sent: 24 April 2012 13:29
To: u...@pig.apache.org; user@hbase.apache.org
Subject: Re: HBaseStorage not working

Looping HBase group.

On Tue, Apr 24, 2012 at 5:18 PM, Royston Sellman <
royston.sell...@googlemail.com> wrote:

> We still haven't cracked this, but a bit more info (HBase 0.95; Pig 0.11):
>
> The script below runs fine in a few seconds using Pig in local mode, but
> with Pig in MR mode it sometimes works rapidly and usually takes 40
> minutes to an hour.
>
> --hbaseuploadtest.pig
> register /opt/hbase/hbase-trunk/lib/protobuf-java-2.4.0a.jar
> register /opt/hbase/hbase-trunk/lib/guava-r09.jar
> register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage( ',' )
>     AS (mid : chararray, hid : chararray, mf : chararray, mt : chararray,
>         mind : chararray, mimd : chararray, mst : chararray );
> dump raw_data;
> STORE raw_data INTO 'hbase://hbaseuploadtest'
>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>         'info:hid info:mf info:mt info:mind info:mimd info:mst');
>
> i.e.
> [hadoop1@namenode hadoop-1.0.2]$ pig -x local 
> ../pig-scripts/hbaseuploadtest.pig
> WORKS EVERY TIME!!
> But
> [hadoop1@namenode hadoop-1.0.2]$ pig -x mapreduce 
> ../pig-scripts/hbaseuploadtest.pig
> Sometimes (but rarely) it runs in under a minute; more often it takes
> over 40 minutes to reach 50%, then completes to 100% in seconds. The
> dataset is very small.
>
> Note that the dump of raw_data works in both cases. However, the STORE
> command causes the MR job to stall, and the job setup task shows the
> following errors:
> Task attempt_201204240854_0006_m_000002_0 failed to report status for 
> 602 seconds. Killing!
> Task attempt_201204240854_0006_m_000002_1 failed to report status for 
> 601 seconds. Killing!
>
> And task log shows the following stream of errors:
>
> 2012-04-24 11:57:27,427 INFO org.apache.zookeeper.ZooKeeper: 
> Initiating client connection, connectString=localhost:2181 
> sessionTimeout=180000 watcher=hconnection 0x5567d7fb
> 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
> 2012-04-24 11:57:27,443 WARN
> org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
> java.lang.SecurityException: Unable to locate a login configuration 
> occurred when trying to find JAAS configuration.
> 2012-04-24 11:57:27,443 INFO
> org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not 
> SASL-authenticate because the default JAAS configuration section 'Client'
> could not be found. If you are not using SASL, you may ignore this. On 
> the other hand, if you expected SASL to work, please fix your JAAS 
> configuration.
> 2012-04-24 11:57:27,444 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
> 2012-04-24 11:57:27,445 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier 
> of this process is 6846@slave2
> 2012-04-24 11:57:27,551 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
> 2012-04-24 11:57:27,552 WARN
> org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
> java.lang.SecurityException: Unable to locate a login configuration 
> occurred when trying to find JAAS configuration.
> 2012-04-24 11:57:27,552 INFO
> org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not 
> SASL-authenticate because the default JAAS configuration section 'Client'
> could not be found. If you are not using SASL, you may ignore this. On 
> the other hand, if you expected SASL to work, please fix your JAAS 
> configuration.
> 2012-04-24 11:57:27,552 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
> 2012-04-24 11:57:27,553 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly 
> transient ZooKeeper exception:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> 2012-04-24 11:57:27,553 INFO org.apache.hadoop.hbase.util.RetryCounter:
> Sleeping 2000ms before retry #1...
> 2012-04-24 11:57:28,652 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server localhost/127.0.0.1:2181
> 2012-04-24 11:57:28,653 WARN
> org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
> java.lang.SecurityException: Unable to locate a login configuration 
> occurred when trying to find JAAS configuration.
> 2012-04-24 11:57:28,653 INFO
> org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not 
> SASL-authenticate because the default JAAS configuration section 'Client'
> could not be found. If you are not using SASL, you may ignore this. On 
> the other hand, if you expected SASL to work, please fix your JAAS 
> configuration.
> 2012-04-24 11:57:28,653 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused etc etc
>
> Any ideas? Anyone else out there successfully running Pig 0.11
> HBaseStorage() against HBase 0.95?
>
> Thanks,
> Royston
>
>
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: 20 April 2012 00:03
> To: u...@pig.apache.org
> Subject: Re: HBaseStorage not working
>
> Nothing significant changed in Pig trunk, so I am guessing HBase 
> changed something; you are more likely to get help from them (they 
> should at least be able to point at APIs that changed and are likely 
> to cause this sort of thing).
>
> You might also want to check if any of the started MR jobs have 
> anything interesting in their task logs.
>
> D
>
> On Thu, Apr 19, 2012 at 1:41 PM, Royston Sellman 
> <royston.sell...@googlemail.com> wrote:
> > Does HBaseStorage work with HBase 0.95?
> >
> >
> >
> > This code was working with HBase 0.92 and Pig 0.9 but fails on HBase
> > 0.95 and Pig 0.11 (built from source):
> >
> >
> >
> > register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> > register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> >
> > tbl1 = LOAD 'input/sse.tbl1.HEADERLESS.csv' USING PigStorage( ',' ) AS (
> >      ID:chararray,
> >      hp:chararray,
> >      pf:chararray,
> >      gz:chararray,
> >      hid:chararray,
> >      hst:chararray,
> >      mgz:chararray,
> >      gg:chararray,
> >      epc:chararray );
> >
> > STORE tbl1 INTO 'hbase://sse.tbl1'
> > USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('edrp:hp
> > edrp:pf edrp:gz edrp:hid edrp:hst edrp:mgz edrp:gg edrp:epc');
> >
> >
> >
> > The job output (using either Grunt or PigServer makes no difference)
> > shows the family:descriptors being added by HBaseStorage, then starts
> > the MR job, which (after a long pause) reports:
> >
> > ------------
> > Input(s):
> > Failed to read data from
> > "hdfs://namenode:8020/user/hadoop1/input/sse.tbl1.HEADERLESS.csv"
> >
> > Output(s):
> > Failed to produce result in "hbase://sse.tbl1"
> >
> > INFO mapReduceLayer.MapReduceLauncher: Failed!
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hp
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:pf
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gz
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hid
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hst
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:mgz
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gg
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:epc
> > ------------
> >
> >
> >
> > The "Failed to read" message is misleading, I think, because running
> > dump tbl1; in place of the STORE works fine.
> >
> > I get nothing in the HBase logs and nothing in the Pig log.
> >
> > HBase works fine from the shell and can read and write to the table.
> > Pig works fine in and out of HDFS on CSVs.
> >
> > Any ideas?
> >
> > Royston
> >
> >
> >
>
>
