Re: Question regarding data location in hdfs after hbase restarts

2010-10-11 Thread Ryan Rawson
We don't attempt to optimize region placement with HDFS locations yet. One reason is that on a long-lived cluster, compactions create the locality you are looking for. Furthermore, in the old master such an optimization was really hard to do. The new master should make it easier to write such 1

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Ryan Rawson
Yes, this is spot on. When HBase scans, it reads a block, iterates through the keys in the block, then goes to the next block. We try to be as efficient as possible, but the inescapable fact remains that we must read all the intervening data. We can do tricks (in 0.90) to use the block index to skip some blo

Increase region server throughput

2010-10-11 Thread Venkatesh
I would like to tune the region server to increase throughput. On a 10-node cluster, I'm getting 5 sec per put (this is unbatched/unbuffered). Other than the region server handler count property, is there anything else I can tune to increase throughput? (this operation I can't use buffered write wit

Question regarding data location in hdfs after hbase restarts

2010-10-11 Thread Tao Xie
Hi all, I set HDFS replica=1 when running HBase, and a DN and an RS co-exist on each slave node. So the data in the regions managed by an RS will be stored on its local data node, right? But when I restart HBase and the HBase client does gets on the RS, the datanode will read data from remote data nodes. Does that mea

Re: Bulk import tools for HBase

2010-10-11 Thread Sean Bigdatafun
Another potential "problem" of the incremental bulk loader is that the number of reducers (for the bulk loading process) needs to be equal to the number of existing regions -- this seems to be infeasible for a very large table, say with 2000 regions. Any comment on this? Thanks. Sean On Fri, Oct 8, 2010 at 9:03

Re: HBase cluster with heterogeneous resources

2010-10-11 Thread Sean Bigdatafun
On Sun, Oct 10, 2010 at 12:28 PM, Abhijit Pol wrote: > Thanks Stack. > > I think we have GC under control. We have CMS tuned to start early and > don't see slept x longer y in logs anymore. We also have higher zk timeout > (150 seconds), guess can bump that up a bit. > > I was able to point to s

HLog and durability question --0.90 and 0.20

2010-10-11 Thread Sean Bigdatafun
Can someone give me a detailed look at the HLog mechanism for 0.90 durability? I recall that HBase committers claim that data will be truly durable in 0.90 after the client gets an 'ok' acknowledgement from the server, while that was not true in 0.20 (i.e., HBase may have the chance to lose the data even it

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Sean Bigdatafun
I think this is a good suggestion too. HBase linearly scans through the 64KB that is brought into memory. If a big data payload (unused in a query/scan) is mixed with a small data payload, the scan will be rather inefficient, I think? On Mon, Oct 11, 2010 at 9:43 AM, Ryan Rawson wrote: > The reason I tal

Re: Hbase rollback..

2010-10-11 Thread Ryan Rawson
That is correct. But we are confident that, with the new durability changes and other improvements, 0.90 will be safer and faster than 0.20.6. On Oct 11, 2010 4:51 PM, "Sean Bigdatafun" wrote: > Thanks for clarifying this. > > But on the other hand, wow... that means that even if I like the consistency > enhance

Re: Hbase rollback..

2010-10-11 Thread Sean Bigdatafun
Thanks for clarifying this. But on the other hand, wow... that means that even if I like the consistency enhancement in 0.90, I cannot enjoy it if I have started using HBase 0.20 in production? On Thu, Sep 16, 2010 at 10:49 PM, Stack wrote: > On Thu, Sep 16, 2010 at 10:22 PM, Todd Lipcon w

HBase 0.89.20100726 with unmanaged zookeeper fails to start

2010-10-11 Thread Charles Thayer
We're using a pre-existing zookeeper cluster (HBASE_MANAGES_ZK=false), and trying to port some code from 0.20 to 0.89, but hbase fails to start with Couldnt start ZK at requested address of 2181 [..blah..] 2182 (from ./src/main/java/org/apache/hadoop/hbase/master/HMaster.java) Because port

Re: hbase.client.retries.number

2010-10-11 Thread Venkatesh
BTW, I get this exception while trying a new put, and I also get it on gets on some region servers: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1

hbase.client.retries.number

2010-10-11 Thread Venkatesh
HBase was seamless for the first couple of weeks; now there are all kinds of issues in production :) fun fun. Curious: does this property have to match up on the "hbase client side" and the region server side? I have this number set to 0 on the region server side and the default on the client side. I can't do any put (new) t

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Ryan Rawson
The reason I talk about value size is that one area where multiple families are good is when you have really large values in one column and smaller values in different columns. So if you want to just read the small values without scanning through the big values, you can use separate column families. -ry

RE: question about region files

2010-10-11 Thread Gibbon, Robert, VF-Group
This smells of garbage and low memory. For reference, see a similar problem report here - http://kr.forums.oracle.com/forums/thread.jspa?messageID=2146733 How many REST servers do you have loading all of that data? AFAIK they're stateless and load-balanceable # "Gang worker#0 (Parallel GC Threads)" pr

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Jean-Daniel Cryans
> Yes. I agree. OOME unlikely. I misinterpreted my current problem. > I found, that this (gc timeout) on my 0.89-stumbleupon hbase occurs > only if writeToWAL=false. My RS eats all available memory (5GB), but > don't get OOME. I try to figure out what is going on. Long GC pauses happen for many

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Andrey Stepachev
2010/10/11 Jean-Daniel Cryans : > On Mon, Oct 11, 2010 at 4:20 AM, Andrey Stepachev wrote: >> Hi. >> Yes. I agree. OOME unlikely. I misinterpreted my current problem. I found, that this (gc timeout) on my 0.89-stumbleupon hbase occurs only if writeToWAL=false. My RS eats all available memory (5G

Re: Region servers suddenly disappearing

2010-10-11 Thread Jean-Daniel Cryans
No idea; the reason it died is higher in the log. Look for a message like "Dumping metrics" and the reason should be just a few lines higher than that. J-D On Sun, Oct 10, 2010 at 5:13 PM, Venkatesh wrote: > > Some of the region servers are suddenly dying. I've pasted relevant log lines. I > don't

Re: Hbase internally row location mechanism

2010-10-11 Thread Jean-Daniel Cryans
Section 5.1 of the Bigtable paper gives a pretty good explanation: http://labs.google.com/papers/bigtable.html In HBase, Chubby is replaced by ZooKeeper, root tablet by the -ROOT- table, and METADATA tablets by the .META. table. J-D On Sun, Oct 10, 2010 at 10:54 PM, William Kang wrote: > Hi, >

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Jean-Daniel Cryans
On Mon, Oct 11, 2010 at 4:20 AM, Andrey Stepachev wrote: > Hi. > > One additional issue with column families: number of memstores. Each > family on insert utilizes > one memstore. If you'll write in several memstores at once you get > more memstores and more > memory will be used by your region s

Re: StarGate HTTP ERROR: 404

2010-10-11 Thread Andrew Purtell
Hi Fleming, First, Sanel is correct, whatever you are attempting to use is not Stargate. Kindly follow the rest of the advice. > HBase 20.2 You should be using HBase 0.20.6. We can't help much with problems with 0.20.2 any more -- in just about all cases the first advice will be to upgrade to 0.

Re: StarGate HTTP ERROR: 404

2010-10-11 Thread Sanel Zukan
Actually, this is not Stargate, but the older REST service that was deprecated. To activate Stargate, copy $HBASE_HOME/contrib/stargate/* and $HBASE_HOME/contrib/stargate/lib/* to the HBase lib directory ($HBASE_HOME/lib) and start it with: $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.stargate.Main Now

Re: Number of column families vs Number of column family qualifiers

2010-10-11 Thread Andrey Stepachev
Hi. One additional issue with column families: the number of memstores. Each family utilizes one memstore on insert. If you write to several memstores at once, you get more memstores and more memory will be used by your region server. Especially with random inserts you can easily get gc timeouts or O