Re: Limits on HBase

2010-09-06 Thread Himanshu Vashishtha
Assuming you will be using HDFS as the file system: wouldn't saving those large objects in the file system and keeping a pointer to them in an HBase table serve the purpose? [I haven't done it myself, but I can't see it not working. In fact, I remember reading about it somewhere on the list.] ~Himanshu On Mon, S

Re: question about RegionManager

2010-09-06 Thread Tao Xie
But when I directly load data into HDFS using HDFS API, the disks are balanced. I use hadoop-0.20.2. 2010/9/7 Todd Lipcon > On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray wrote: > > > You're looking at sizes on disk? Then this has nothing to do with HBase > > load balancing. > > > > HBase does

Re: Limits on HBase

2010-09-06 Thread William Kang
Hi JG, Thanks for your reply. As far as I have read in HBase's documentation and wiki, the cell size is not supposed to be larger than 10 MB. For the row, I am not quite sure, but it looks like 256 MB is the upper limit. I am considering storing some binary data that used to be stored in an RDBMS blob field.

Re: question about RegionManager

2010-09-06 Thread Todd Lipcon
On Mon, Sep 6, 2010 at 9:08 PM, Jonathan Gray wrote: > You're looking at sizes on disk? Then this has nothing to do with HBase > load balancing. > > HBase does not move blocks around on the HDFS layer or deal with which > physical disks are used, that is completely the responsibility of HDFS. >

RE: question about RegionManager

2010-09-06 Thread Jonathan Gray
You're looking at sizes on disk? Then this has nothing to do with HBase load balancing. HBase does not move blocks around on the HDFS layer or deal with which physical disks are used, that is completely the responsibility of HDFS. Periodically HBase will perform major compactions on regions wh

Re: question about RegionManager

2010-09-06 Thread Stack
Well spotted! This issue was fixed in 0.20.5. It was "HBASE-2167 Load balancer falls into pathological state if one server under average - slop; endless churn" St.Ack On Mon, Sep 6, 2010 at 7:06 PM, Tao Xie wrote: > hi, all > > I'm reading the code of RegionManager, I find in the following m

Re: question about RegionManager

2010-09-06 Thread Tao Xie
Actually, I'm new to HBase. I started reading the region assignment code because I ran into a load imbalance problem in my HBase cluster. I run a 1+6 node HBase cluster: 1 node as master and client, the other nodes as region servers and data nodes. I run YCSB to insert records. During the inserts, I

Re: question about RegionManager

2010-09-06 Thread Tao Xie
I had a look at the following method in 0.89. Is the following line correct? nRegions *= e.getValue().size(); private int regionsToGiveOtherServers(final int numUnassignedRegions, final HServerLoad thisServersLoad) { SortedMap> lightServers = new TreeMap>(); this.master.g

RE: question about RegionManager

2010-09-06 Thread Jonathan Gray
That code does actually exist in the latest 0.89 release. It was a protection put in place to guard against a weird behavior that we had seen during load balancing. As Ryan suggests, this code was in need of a rewrite and was just committed last week to trunk/0.90. If you're interested in the

Re: question about RegionManager

2010-09-06 Thread Ryan Rawson
That code was completely rewritten in 0.89/0.90... it's pretty dodgy, so I'd strongly consider upgrading to 0.89 asap. > hi, all > > I'm reading the code of RegionManager, I find in the following method there > is a situation when nRegionsToAssign <= nregions, the code only assigns 1 > region. > Is th

question about RegionManager

2010-09-06 Thread Tao Xie
hi, all I'm reading the code of RegionManager. In the following method there is a situation where, when nRegionsToAssign <= nregions, the code only assigns 1 region. Is this correct? HBase version 0.20.4. private void assignRegionsToMultipleServers(final HServerLoad thisServersLoad, final S

RE: Limits on HBase

2010-09-06 Thread Jonathan Gray
I'm not sure what you mean by "optimized cell size" or whether you're just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that leve

Limits on HBase

2010-09-06 Thread William Kang
Hi folks, I know this question may have been asked many times, but I am wondering if there is any update on the optimized cell size (in megabytes) and row size (in megabytes)? Many thanks. William

Re: HBase table lost on upgrade

2010-09-06 Thread Stack
On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani wrote: > > I also have the Java Api code (for testing purposes) and that gave similar > performance results (520 seconds on dev and 250 on production cluster). Is > there a way to flush the cache before we run the next experiment? I doubt > that th

Re: regionserver skew

2010-09-06 Thread Stack
On Fri, Sep 3, 2010 at 6:22 PM, Sharma, Avani wrote: > I read on the mailing list that the region server that has .META table > handles more requests. That sounds okay, but in my case the 3rd regionserver > has 0 requests! And I feel that's what's slowing down the read performance. > Also the hit

Re: Update 2 versions of a cell : same issue as HBASE-1485 ?

2010-09-06 Thread Stack
Thanks for the test, Evert. I'd suggest you add it to HBASE-1485 as an attachment so it gets included in the final patch. St.Ack On Mon, Sep 6, 2010 at 3:05 AM, Evert Arckens wrote: > Here's a unittest demonstrating the use case : > > public class TwoCellUpdatesTest { > @Test > public void te

Re: HBase secondary index performance

2010-09-06 Thread Andrey Stepachev
2010/9/6 Murali Krishna. P : > Hi, >   My row size is around 300 bytes with total 20 columns. I tried the custom > indexing without the write to WAL. Currently having only 2 tables, one for the > main table and another for all 20 indexes. My key to the index table is > columnValue+columnName+rowKey

Re: HBase secondary index performance

2010-09-06 Thread Murali Krishna. P
> Please clarify how this index table serves 20 columns - in the above schema, > columnValue would be different for the 20 columns indexed, I assume. My query to the index table will be columnValue + columnName. This is for exact match, if you need scan on partial value, we have to reverse the ke

Re: HBase secondary index performance

2010-09-06 Thread Ted Yu
> My key to the index table is columnValue+columnName+rowKey. You need to consider the distribution of the above key so that writes to the index table don't become a bottleneck in the write path. Please clarify how this index table serves 20 columns - in the above schema, columnValue would be different

Re: Update 2 versions of a cell : same issue as HBASE-1485 ?

2010-09-06 Thread Evert Arckens
Here's a unittest demonstrating the use case : public class TwoCellUpdatesTest { @Test public void testCellUpdates() throws Exception { Configuration configuration = HBaseConfiguration.create(); HBaseTestingUtility hBaseTestingUtility = new HBaseTestingUtility(configuration