Re: Question about HBase

2009-07-26 Thread Ryan Rawson
It should be possible... the bottlenecks will become things like log splitting, region management, and the master/regionserver comm channel issues. These are all up for fixinating in 0.21. How big are your datums? If they are fairly large, it might make more sense to store the raw data on HDFS, a
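A minimal sketch of the pattern Ryan is hinting at: keep large datums as plain HDFS files and store only a small pointer row in HBase. The path layout, table name, and column names are made up for illustration, and the client calls follow the 0.20-era API.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BlobPointerExample {
  // Write the large datum to HDFS and keep only its path in HBase.
  public static void storeBlob(String rowKey, byte[] blob) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    FileSystem fs = FileSystem.get(conf);

    // The raw data lives as an ordinary HDFS file (hypothetical layout).
    Path blobPath = new Path("/blobs/" + rowKey);
    FSDataOutputStream out = fs.create(blobPath);
    out.write(blob);
    out.close();

    // HBase stores only the small pointer, keeping regions and indexes lean.
    HTable table = new HTable(conf, "blob_index");
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("hdfs_path"),
        Bytes.toBytes(blobPath.toString()));
    table.put(put);
  }
}
```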

Re: Question about HBase

2009-07-26 Thread Schubert Zhang
Thanks J-G and Ryan, we are trying out 0.20.0 now. It is time-consuming, since there is no documentation yet, but we can continue the work. :-) And another general question: do you think it is possible to store and serve 200TB of data (uncompressed, maybe 50TB after compression) in a 20-node cluste

Re: Question about HBase

2009-07-20 Thread Jonathan Gray
Schubert, Sounds like you know what you're doing. There are two different LRU implementations in current trunk, but they do more than you need them to. Both act on objects that implement HeapSize, so they are heap-size-bound LRUs. That may not be a bad idea. The algorithm is universal
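For readers unfamiliar with the idea, here is a minimal sketch of a heap-size-bound LRU along the lines described above: eviction is driven by the estimated heap bytes of the cached objects rather than by entry count. This is not HBase's actual implementation; the HeapSized interface and class names are illustrative.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Anything cached must report its estimated heap footprint (HeapSize-style). */
interface HeapSized {
  long heapSize();
}

public class HeapBoundLruCache<K, V extends HeapSized> {
  private final long maxHeapBytes;          // total heap budget for the cache
  private long currentHeapBytes = 0;
  // accessOrder=true makes iteration order least-recently-used first.
  private final LinkedHashMap<K, V> map = new LinkedHashMap<K, V>(16, 0.75f, true);

  public HeapBoundLruCache(long maxHeapBytes) {
    this.maxHeapBytes = maxHeapBytes;
  }

  public synchronized void put(K key, V value) {
    V old = map.put(key, value);
    if (old != null) {
      currentHeapBytes -= old.heapSize();
    }
    currentHeapBytes += value.heapSize();
    evictIfNeeded();
  }

  public synchronized V get(K key) {
    return map.get(key);                    // also marks the entry most-recently-used
  }

  // Drop least-recently-used entries until the cache fits its heap budget.
  private void evictIfNeeded() {
    Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
    while (currentHeapBytes > maxHeapBytes && it.hasNext()) {
      Map.Entry<K, V> eldest = it.next();
      currentHeapBytes -= eldest.getValue().heapSize();
      it.remove();
    }
  }
}
```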

Re: Question about HBase

2009-07-20 Thread zsongbo
J-G, Thanks for your reply. You really understand my question. I am planning to experiment with partitioned tables (partitioned day by day). Yes, it will make our query application more complex. But the reward is that it becomes easy to delete the old data, and easy to run mapreduce jobs which only
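A sketch of the cleanup side of the day-by-day partitioning Schubert describes: old data is dropped by deleting that day's table rather than deleting individual rows. The table-naming convention is hypothetical; the HBaseAdmin calls are from the 0.20-era client API.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DailyPartitionCleanup {
  // Drop the table holding one day's partition, e.g. dropDay("20090720").
  public static void dropDay(String day) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(conf);

    String tableName = "events_" + day;     // hypothetical per-day naming scheme
    if (admin.tableExists(tableName)) {
      admin.disableTable(tableName);        // a table must be disabled before deletion
      admin.deleteTable(tableName);
    }
  }
}
```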

Re: Question about HBase

2009-07-20 Thread Jonathan Gray
You're right about the conflict between randomized keys and time-ordered keys: getting the best load distribution vs. isolating the regions being written to. There are a number of different ways you could deal with this, some being fairly complex (you could partition tables by time). You could add a
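One of the simpler options alluded to here is salting: prefix time-ordered keys with a small hash-derived bucket so writes spread over several regions instead of always hitting the newest one. A hedged sketch, with the bucket count and key layout chosen arbitrarily:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
  private static final int BUCKETS = 16;    // number of write "stripes"; an arbitrary choice

  // Build a row key of the form <salt><timestamp><id>.
  public static byte[] makeKey(long timestamp, byte[] id) {
    byte salt = (byte) ((Bytes.hashCode(id) & 0x7fffffff) % BUCKETS);
    return Bytes.add(new byte[] { salt }, Bytes.toBytes(timestamp), id);
  }
}
```

The trade-off is that a time-range query then needs one scan per bucket instead of a single contiguous scan.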

Re: Question about HBase

2009-07-20 Thread zsongbo
Ryan, Thanks, maybe my previous email was not clear. We do know that the 'memcache'/'memstore' is a write buffer; read operations will not need such a cache. :-) So according to your answer "strict limiting factor is the index size.", I am considering using 'SoftReference' t
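For reference, a minimal sketch of the SoftReference approach Schubert is considering: hold indexes behind SoftReferences so the JVM can reclaim them under memory pressure, and reload from disk when get() returns null. The class and method names are hypothetical, not HBase internals.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

public class SoftIndexCache<K, V> {
  private final ConcurrentHashMap<K, SoftReference<V>> cache =
      new ConcurrentHashMap<K, SoftReference<V>>();

  public void put(K key, V index) {
    cache.put(key, new SoftReference<V>(index));
  }

  // Returns null if the entry was never cached or the GC has cleared it,
  // in which case the caller must reload the index from disk.
  public V get(K key) {
    SoftReference<V> ref = cache.get(key);
    return ref == null ? null : ref.get();
  }
}
```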

Re: Question about HBase

2009-07-20 Thread Ryan Rawson
I think you might be misunderstanding what the 'memcache' is; we are calling it 'memstore' now. It is a write buffer, not a cache. It is also memory sensitive, so as you insert more data, HBase will flush the 'memcache' to HDFS. By default the memcache is limited to 64MB per store, 40% of Xmx, and we a
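The knobs Ryan mentions (the per-store flush size and the fraction-of-heap cap) can be set in configuration. The property names below are assumed to match the 0.20-era configuration and should be checked against your release; this is a sketch, not authoritative.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MemstoreTuning {
  public static HBaseConfiguration tunedConf() {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Flush a region's memstore to HDFS once it reaches this many bytes (default ~64MB).
    conf.setLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);
    // Block updates when all memstores together exceed this fraction of the heap (default ~0.4).
    conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
    return conf;
  }
}
```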

Re: Question about HBase

2009-07-20 Thread zsongbo
Ryan, I know we can store more than 250GB on one region server. But how about 3TB, or even 10TB? Besides the memory used by indexes, there may be other factors, such as the memcache. If there are 5000 regions open, the total memcache heap will be very large. So, I am thinking about two things: 1. What is

Re: Question about HBase

2009-07-10 Thread Ryan Rawson
By dedicating more RAM to the situation you can achieve more regions under a single regionserver. I have noticed that in my own region servers, a 200-600MB region works out to 1-2MB of index. This value, however, is dependent on the size of your keys and values. I have very small keys and values. You can also tune
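Spelling out the back-of-envelope math implied here: if each region costs roughly 1-2MB of resident index, a given heap budget bounds the region count. The figures below are taken from this thread, not measured constants.

```java
public class RegionBudget {
  public static long regionsForHeap(long heapBytes, long indexBytesPerRegion) {
    return heapBytes / indexBytesPerRegion;
  }

  public static void main(String[] args) {
    long heap = 4L * 1024 * 1024 * 1024;    // 4GB regionserver heap
    long perRegion = 2L * 1024 * 1024;      // pessimistic 2MB of index per region
    // 4GB / 2MB = ~2048 regions' worth of index, before memstore and cache overhead.
    System.out.println(regionsForHeap(heap, perRegion) + " regions");
  }
}
```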

Re: Question about HBase

2009-07-10 Thread zsongbo
Ryan, Yes, you are right. But my question is that even with 1000 regions (250MB each) per regionserver, each regionserver can only support 250GB of storage. Please also check the thread "Help needed - Adding HBase to architecture"; Stack and Andrew have discussed this there. Schubert

Re: Question about HBase

2009-07-09 Thread Ryan Rawson
That size is not memory-resident, so the total data size is not an issue. The index size is what limits you with RAM, and it's about 1 MB per region (256MB region). -ryan

Re: Question about HBase

2009-07-09 Thread zsongbo
Hi Ryan, Thanks. If your region size is about 250MB, then 400 regions can store 100GB of data on each regionserver. Now, if you have 100TB of data, then you need 1000 regionservers. We are not Google or Yahoo, who have that many nodes. Schubert
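The arithmetic behind Schubert's estimate, written out: ~250MB regions at ~400 regions per regionserver gives ~100GB served per node, so 100TB needs on the order of 1000 nodes before compression or larger regions change the picture.

```java
public class ClusterSizing {
  public static void main(String[] args) {
    double regionSizeGB = 0.25;                             // ~250MB per region
    int regionsPerServer = 400;
    double perServerGB = regionSizeGB * regionsPerServer;   // ~100GB per regionserver
    double totalTB = 100.0;
    double serversNeeded = (totalTB * 1024) / perServerGB;  // ~1024 regionservers
    System.out.printf("%.0f GB/server, ~%.0f regionservers for %.0f TB%n",
        perServerGB, serversNeeded, totalTB);
  }
}
```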

Re: Question about HBase

2009-07-09 Thread Ryan Rawson
re: #2: in fact we don't know that... I know that I can run 200-400 regions on a regionserver with a heap size of 4-5GB. More, even. I bet I could have 1000 regions open on 4GB of RAM. Each region is ~1MB of always-resident data, so there we go. As for compactions, they are fairly fast, 0-30s or so d

Question about HBase

2009-07-09 Thread zsongbo
Hi all, 1. Regarding the configuration property hbase.hstore.compactionThreshold (default 3): "If more than this number of HStoreFiles in any one HStore (one HStoreFile is written per flush of memcache) then a compaction is run to rewrite all HStoreFiles as one. Larger numbers
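For reference, the quoted property can also be set programmatically; a sketch only. Raising the threshold means fewer compaction rewrites but more store files to read per query.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionThresholdConfig {
  public static HBaseConfiguration withThreshold(int threshold) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Compact a store once more than this many HStoreFiles have accumulated (default 3).
    conf.setInt("hbase.hstore.compactionThreshold", threshold);
    return conf;
  }
}
```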

Re: Some question about HBase

2008-07-25 Thread Jean-Daniel Cryans
Xin, Comments inline. Regards, J-D

Some question about HBase

2008-07-24 Thread Xin Jing
Hi, I am a new user of HBase, and I am curious about the insert process of HBase. Could you please explain it in detail? The question is: when I create a table (only one column, to make it easy to describe) and insert a huge amount of data into the table. I know it is a B-Tree like storage struc
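A client-side view of an insert, sketched with the 0.20-style API (HTable/Put) rather than the BatchUpdate API of the 2008-era releases this thread predates; table and column names are made up. As later replies explain, writes are buffered in the memcache/memstore and flushed to store files rather than updated in place.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SimpleInsert {
  public static void main(String[] args) throws IOException {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "test_table");

    // The Put lands in the region's write buffer and is later flushed to a store file.
    Put put = new Put(Bytes.toBytes("row-0001"));
    put.add(Bytes.toBytes("data"), Bytes.toBytes("value"), Bytes.toBytes("hello hbase"));
    table.put(put);
  }
}
```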