Re: HBase High Availability

2009-11-25 Thread Jean-Daniel Cryans
Your question implies a lot of other considerations, so I'll answer it strictly. By default we retry 10 times to get access to data, with pauses that grow in multiples of 2 seconds in the same fashion as the BSD TCP SYN backoff table. The configs are: hbase.client.pause 2000 General client
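To make the backoff idea concrete, here is a minimal sketch of how a client could turn a base pause (hbase.client.pause, 2000 ms above) and a retry count (the real config key is hbase.client.retries.number) into a schedule of sleeps. The multiplier table is an assumption for illustration only, not necessarily the table shipped in any given HBase release, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: retry pauses = base pause x multiplier from a backoff table. */
final class RetryBackoffSketch {
    // ASSUMED multipliers, in the spirit of the BSD TCP SYN backoff table.
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    /** Pause in milliseconds before retry number `tries` (0-based). */
    static long pauseMillis(long basePauseMs, int tries) {
        // Past the end of the table, keep using the last (largest) multiplier.
        int idx = Math.min(tries, BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }

    /** Every pause the client would sleep through for a given number of retries. */
    static List<Long> schedule(long basePauseMs, int retries) {
        List<Long> pauses = new ArrayList<>();
        for (int i = 0; i < retries; i++) {
            pauses.add(pauseMillis(basePauseMs, i));
        }
        return pauses;
    }

    public static void main(String[] args) {
        List<Long> pauses = schedule(2000L, 10); // 2000 ms base, 10 retries
        long total = 0;
        for (long p : pauses) {
            total += p;
        }
        System.out.println(pauses + " total=" + total + " ms");
    }
}
```

With these assumed multipliers, ten retries at a 2000 ms base add up to 142 seconds of pausing before the client gives up, which is why lowering the retry count matters if you want failures to surface quickly.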

Re: HBase High Availability

2009-11-25 Thread Murali Krishna. P
Thanks, JD, for the detailed reply. Does the underlying Java API currently block if a region is not available? I would like the Java call to return an immediate retry indication in such cases, so that I can redirect the request to the duplicate table in the other data center. Can this be s
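One way to get the redirect behavior asked about here is to wrap each data center's copy of the table behind a small interface and fall back on failure. This is a sketch under assumptions: `RowSource` and `FailoverReader` are hypothetical names, and in a real client the primary would wrap an HBase table handle configured with a low hbase.client.retries.number so that errors surface sooner. Whether that gives a truly immediate indication depends on the HBase version.

```java
import java.io.IOException;

/** Hypothetical abstraction over one data center's copy of a table. */
interface RowSource {
    String get(String row) throws IOException;
}

/** Reads from the primary copy and redirects to the duplicate on failure. */
final class FailoverReader {
    private final RowSource primary;
    private final RowSource secondary;

    FailoverReader(RowSource primary, RowSource secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    String get(String row) throws IOException {
        try {
            return primary.get(row);
        } catch (IOException e) {
            // The primary gave up (e.g. the region was unavailable after its
            // retries), so send the request to the other data center instead.
            return secondary.get(row);
        }
    }

    public static void main(String[] args) throws IOException {
        RowSource dc1 = row -> { throw new IOException("region not online"); };
        RowSource dc2 = row -> "dc2:" + row;
        System.out.println(new FailoverReader(dc1, dc2).get("row1")); // prints dc2:row1
    }
}
```

Keeping the fallback outside the HBase client means the retry policy (how long to wait before declaring the primary down) stays a configuration decision rather than something baked into application code.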

Re: Way to Specify HBase master?

2009-11-25 Thread Jean-Daniel Cryans
This would be a good FAQ entry, Andrew; your explanations are way better than mine. The master publishing itself is new in 0.20, and I guess others will hit this issue with Ubuntu. J-D On Tue, Nov 24, 2009 at 2:03 PM, Andrew Purtell wrote: > Good to hear Mark. > > HBase is sensitive to this in a w

Re: HBase High Availability

2009-11-25 Thread Jean-Daniel Cryans
> I have an HBase table that was created via a mapred tool; it took almost 1 hour to > load with the loadtable.rb script and become available for serving. The scale > was 8k regions per server. I am on 0.20.1, r822817 though. I have yet to test > the failure case, but it will take around 1 hour / no. of RS to

Re: hbase-test

2009-11-25 Thread Jean-Daniel Cryans
That error means the Namenode isn't giving the locations for a file in the namespace; I don't think this is the error you are searching for. Which tests failed? Which version? Is it because of errors or because of failed assertions? Look at the end of the TEST-* files to figure it out. J-D On Wed, N
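To tell errors (unexpected exceptions) apart from failures (failed assertions) across many report files, a small scanner can help. This sketch assumes the reports contain the summary line written by Ant's plain JUnit formatter ("Tests run: N, Failures: F, Errors: E"); the class name is hypothetical, and other formatters (e.g. XML) would need a different pattern.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: summarize TEST-* reports as failures vs errors. */
final class TestReportScan {
    // ASSUMES Ant's plain formatter summary line.
    private static final Pattern SUMMARY =
        Pattern.compile("Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+)");

    /** Returns {run, failures, errors}, or null if no summary line is found. */
    static int[] parse(String report) {
        Matcher m = SUMMARY.matcher(report);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** Errors are reported first because they usually point at the environment. */
    static String verdict(int[] counts) {
        if (counts == null) return "no summary";
        if (counts[2] > 0) return "errors (unexpected exceptions)";
        if (counts[1] > 0) return "failures (failed assertions)";
        return "passed";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "TEST-*")) {
            for (Path f : files) {
                String text = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                System.out.println(f.getFileName() + ": " + verdict(parse(text)));
            }
        }
    }
}
```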

Re: HBase High Availability

2009-11-25 Thread Andrew Purtell
First, there is work under way for 0.21 that will shorten the time needed for region redeployment. Part of the delay in 0.20 is due to the master's less-than-ideal performance in that regard. Beyond that, just as a general operational principle, I recommend that you host no more than 200-250 reg
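The per-server guideline turns into simple ceiling-division arithmetic. This sketch takes the 200-250 regions per server figure from this message and the 8k regions per server mentioned earlier in the thread; the data volume and region size in the estimate are hypothetical inputs (in practice the split threshold comes from hbase.hregion.max.filesize, and real regions are often smaller than it).

```java
/** Hypothetical capacity arithmetic for a regions-per-server cap. */
final class RegionCapacitySketch {
    /** Ceiling division for positive longs. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Rough region count if every region grows to the split threshold. */
    static long estimateRegions(long dataBytes, long maxRegionBytes) {
        return ceilDiv(dataBytes, maxRegionBytes);
    }

    /** Minimum region servers needed to stay at or below the per-server cap. */
    static long minServers(long totalRegions, long maxRegionsPerServer) {
        return ceilDiv(totalRegions, maxRegionsPerServer);
    }

    public static void main(String[] args) {
        long regions = 8000; // the 8k regions that one server was carrying
        System.out.println("cap 250 -> " + minServers(regions, 250) + " servers");
        System.out.println("cap 200 -> " + minServers(regions, 200) + " servers");
        // Hypothetical: 1 TiB of data at a 256 MiB split threshold.
        System.out.println("regions: " + estimateRegions(1L << 40, 256L << 20));
    }
}
```

Spreading those 8k regions to stay within the recommended cap would take 32 servers at 250 regions each, or 40 at 200.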

Re: How to flush memstore to filestore ?

2009-11-25 Thread Andrew Purtell
Forgive what might be a dumb question, but did you change your HBase rootdir to something other than the default (which is in /tmp)? The HBase shell has a 'flush' command, which will force persistence of anything in memstore, but this should not be needed in normal operation unless you
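Since a rootdir under /tmp is the first thing to rule out, here is a tiny check that could be run against whatever value hbase.rootdir is set to. It is a sketch: the class name is hypothetical, it assumes the value is either a bare path or a scheme://authority/path URI, and the sample default string is an assumed form of the 0.20 default rather than a quote from the configuration file.

```java
/** Hypothetical check: does an hbase.rootdir value point into /tmp? */
final class RootDirCheck {
    /** Extracts the path part of "scheme://authority/path", or returns a bare path. */
    static String pathOf(String rootDir) {
        int schemeEnd = rootDir.indexOf("://");
        if (schemeEnd < 0) {
            return rootDir; // plain local path
        }
        int pathStart = rootDir.indexOf('/', schemeEnd + 3);
        return pathStart < 0 ? "/" : rootDir.substring(pathStart);
    }

    static boolean isUnderTmp(String rootDir) {
        String path = pathOf(rootDir);
        return path.equals("/tmp") || path.startsWith("/tmp/");
    }

    public static void main(String[] args) {
        // ASSUMED default-style value; pass the real setting as args[0].
        String rootDir = args.length > 0 ? args[0] : "file:///tmp/hbase-${user.name}/hbase";
        if (isUnderTmp(rootDir)) {
            System.out.println("WARNING: " + rootDir + " is under /tmp and may be wiped");
        } else {
            System.out.println("OK: " + rootDir);
        }
    }
}
```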

Re: is there any problem with our environment?

2009-11-25 Thread Andrew Purtell
Try this: Do not cache on the crawlers, just write through. Run each region server with plenty of heap (4 GB to start). So it seems you need more RAM on your systems, or you should move your crawlers off to separate servers to free up RAM and CPU. Adjust your HBase site config as follows:

FYI: ZooKeeper performance on (smaller) hardware classes

2009-11-25 Thread Patrick Hunt
Hi, I'm Patrick from the ZooKeeper team. Recently we (the ZK dev community) have been working more closely with the HBase dev team, in particular to ensure that the ZK-HBase interaction is the best it can be and to improve things where it's not. We are also looking at how HBase might take more advantage

Re: question about compound keys with two/multiple strings

2009-11-25 Thread Dave Latham
I'm not sure I understand your requirements entirely, but there are order-insensitive functions you can use to generate a key for any pair of elements. Any commutative operation on the two keys would work, but if you use a hash you need to worry about collisions. If you take your two (or more) ke
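To illustrate the two options, here is a sketch contrasting a lossless order-insensitive key (sort the components, then join them) with a commutative hash, which is also order-insensitive but can collide. The class and method names are hypothetical, and sorting is one possible technique, not necessarily what the rest of this message goes on to describe. The join assumes the separator character never occurs inside a component.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helpers for order-insensitive compound row keys. */
final class UnorderedKey {
    // ASSUMED not to appear in any component; otherwise keys become ambiguous.
    private static final String SEP = "\u0000";

    /** Lossless: the same set of parts in any order yields the same key. */
    static String sortedKey(String... parts) {
        String[] copy = parts.clone();
        Arrays.sort(copy); // canonical order removes the dependence on input order
        return String.join(SEP, copy);
    }

    /** Lossy: addition commutes, so order does not matter, but collisions can occur. */
    static int commutativeHash(String... parts) {
        int h = 0;
        for (String p : parts) {
            h += p.hashCode();
        }
        return h;
    }

    /** Row keys are byte arrays, so encode the canonical string. */
    static byte[] toRowKey(String... parts) {
        return sortedKey(parts).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(sortedKey("bob", "alice").equals(sortedKey("alice", "bob"))); // prints true
        // "Aa" and "BB" share a String.hashCode, so these different pairs collide.
        System.out.println(commutativeHash("Aa", "x") == commutativeHash("BB", "x")); // prints true
    }
}
```

The hash variant gives fixed-width keys, but, as noted above, you then have to detect or tolerate collisions; the sorted join never collides as long as the separator assumption holds.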

RE: hbase-test

2009-11-25 Thread Mark Vigeant
I just ran "ant test" and a bunch of the tests failed. Upon looking at the logs, this one error came up: Could not get block locations. Source file "/user/hadoop/hbase.version" - Aborting... java.io.IOException: Could not get block locations. Source file "/user/hadoop/hbase.version" - Abort

Re: HBase High Availability

2009-11-25 Thread Imran M Yousuf
Hi, I was just wondering whether Linux HA (Heartbeat + DRBD) could be an option in this case. What do you guys think? - Imran On Wed, Nov 25, 2009 at 6:29 PM, Murali Krishna. P wrote: > Hi Ryan, > Thanks for the quick response. > > We are planning to have this in 2 or 3 data centers for BCP and la

Re: HBase High Availability

2009-11-25 Thread Murali Krishna. P
Hi Ryan, Thanks for the quick response. We are planning to have this in 2 or 3 data centers for BCP and latency reasons. Currently the application runs in a non-scalable cluster; essentially, we have the data partitioned across multiple fixed columns. The entire cluster of machines can be conside

Re: HBase High Availability

2009-11-25 Thread Ryan Rawson
With multiple masters, the election is mediated by ZooKeeper, and the idle masters wait for the re-election cycle. The problem with bringing regions up after a failure isn't the actual speed of loading them, but bugs in the master. This is being fixed in 0.21. It will allow us to much more rapi