unstable secure zookeeper

2012-02-17 Thread Francis Liu
Didn't see my message pop up in the list, so resending. Hi, I have 0.92-security installed. I'm hitting intermittent problems starting the regionservers because of intermittent ZooKeeper connection failures. Because of this, not all my region servers start up after "start regionservers". This also som
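
A minimal sketch of the ZooKeeper-related settings commonly checked when regionservers fail to connect at startup; these normally live in conf/hbase-site.xml, and the quorum hosts and timeout value below are illustrative placeholders, not taken from the thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ZkSettingsSketch {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Quorum and client port must match the running ZooKeeper ensemble.
            conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            // A longer session timeout gives slow or heavily loaded regionservers more
            // slack before ZooKeeper declares the session dead.
            conf.setInt("zookeeper.session.timeout", 60000);
        }
    }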

HBase 0.90.4: YCSB gets NPE with large loads when timeseries is set

2012-02-17 Thread Jean T. Anderson
Hi, everyone, I often get an NPE during the YCSB load process if timeseries is set and I'm loading lots of records. Here's the syntax I'm using: YCSB Client 0.1 Command line: -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=1000 -th

Re: Max region file size

2012-02-17 Thread Bryan Keller
Actually, I had pre-created too many regions, so most were only half full; I had a couple of regions that were 4 GB. On Feb 17, 2012, at 3:48 PM, Bryan Keller wrote: > Is the max region file size the size of the data uncompressed or the max size > of the store file? I noticed my store files are ~

Max region file size

2012-02-17 Thread Bryan Keller
Is the max region file size the size of the data uncompressed or the max size of the store file? I noticed my store files are ~2.1 GB though I have the max region size set to 4 GB. This is after a major compaction. Also, is the max region size 4 GB in HBase 0.90.4 or can it be larger? The docs s
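
For reference, a small sketch of where the maximum region size is configured; the 4 GB figure mirrors the value mentioned in the thread, and no claim is made here about whether the check applies before or after compression, which is the open question above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class RegionSizeSketch {
        public static void main(String[] args) {
            // Cluster-wide default, normally set in hbase-site.xml.
            Configuration conf = HBaseConfiguration.create();
            conf.setLong("hbase.hregion.max.filesize", 4L * 1024 * 1024 * 1024);

            // Or per table, via the table descriptor ("mytable" is a placeholder).
            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.setMaxFileSize(4L * 1024 * 1024 * 1024);
        }
    }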

Re: Slow Get performance, is there a way to profile a Get?

2012-02-17 Thread Stack
On Fri, Feb 17, 2012 at 2:38 PM, Jeff Whiting wrote: > Is there a way to profile a specific get request to see where the time is > spent (e.g. checking memstore, reading from HDFS, etc.)? > > We are running into a problem where a get after a delete goes really slow. >  We have a row that has between

Slow Get performance, is there a way to profile a Get?

2012-02-17 Thread Jeff Whiting
Is there a way to profile a specific get request to see where the time is spent (e.g. checking memstore, reading from HDFS, etc.)? We are running into a problem where a get after a delete goes really slow. We have a row that has between 100 and 256 MB of data in it across a couple hundred columns.
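
A minimal client-side sketch that only times a single Get end to end; it does not break the latency down into memstore vs. HDFS, which is what the question is really after. The table name and row key are placeholders.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetTimingSketch {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable"); // placeholder table
            Get get = new Get(Bytes.toBytes("slow-row"));                      // placeholder row key
            long start = System.nanoTime();
            Result result = table.get(get);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println("Get returned " + result.size() + " KVs in " + elapsedMs + " ms");
            table.close();
        }
    }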

Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?

2012-02-17 Thread Jean-Daniel Cryans
The gist of the answer is that, unlike random reads, the blocks we read sequentially from the fs are wholly consumed, so you end up doing fewer fs calls; thus the total proportion of the time spent talking to datanodes is lessened (which is what local reads help with). Also the dfs client keeps a block re

Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?

2012-02-17 Thread Bryan Keller
I was thinking (wrongly it seems) that having the region server read directly from the local file system would be faster than going through the data node, even with sequential access. On Feb 17, 2012, at 1:28 PM, Jean-Daniel Cryans wrote: > On Fri, Feb 17, 2012 at 1:21 PM, Bryan Keller wrote:

Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?

2012-02-17 Thread Jean-Daniel Cryans
On Fri, Feb 17, 2012 at 1:21 PM, Bryan Keller wrote: > I have been experimenting with local reads. For me, enabling them did not help > improve read performance at all; I get the same performance either way. I can > see in the data node logs it is passing back the local path, so it is enabled > prop

Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?

2012-02-17 Thread Bryan Keller
I have been experimenting with local reads. For me, enabling them did not help improve read performance at all; I get the same performance either way. I can see in the data node logs that it is passing back the local path, so it is enabled properly. Perhaps the benefits of local reads are dependent on th
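
For context, a sketch of the two properties usually involved in enabling local (short-circuit) reads around the Hadoop 1.0.0 era; they normally go in hdfs-site.xml on the regionserver and datanode rather than being set in code, and the "hbase" user name is an assumption.

    import org.apache.hadoop.conf.Configuration;

    public class LocalReadSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Client side: let the DFS client read block files straight off local disk.
            conf.setBoolean("dfs.client.read.shortcircuit", true);
            // Datanode side: the user allowed to ask for local block paths.
            conf.set("dfs.block.local-path-access.user", "hbase");
        }
    }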

Re: How to define a custom filter to skip some amount of rows?

2012-02-17 Thread Dmitriy Lyubimov
Filters support re-seek functionality. A hint can be given for the point of re-seek. The re-seek can be applied to either columns or rows (basically, the hint is the new target of the re-seek, expressed as a key-value). You can find details in the HBase book (and perhaps somewhere online, too). -d On Mon, Feb 13, 2012 at 5:
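
A rough sketch of the re-seek hint mechanism being described: a custom filter returns SEEK_NEXT_USING_HINT for key-values it wants to skip past and supplies the next target as a key-value hint. The "wanted row" logic and the Writable serialization needed to ship the filter to regionservers are placeholders, not from the thread.

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.filter.FilterBase;

    public class SkipAheadFilter extends FilterBase {
        private byte[] nextRowHint;

        @Override
        public ReturnCode filterKeyValue(KeyValue kv) {
            if (isWantedRow(kv.getRow())) {
                return ReturnCode.INCLUDE;
            }
            // Remember where we want the scanner to jump to next.
            nextRowHint = computeNextWantedRow(kv.getRow());
            return ReturnCode.SEEK_NEXT_USING_HINT;
        }

        @Override
        public KeyValue getNextKeyHint(KeyValue currentKV) {
            // The hint is expressed as a key-value: the first possible KV on the target row.
            return KeyValue.createFirstOnRow(nextRowHint);
        }

        // Application-specific placeholders.
        private boolean isWantedRow(byte[] row) { return true; }
        private byte[] computeNextWantedRow(byte[] row) { return row; }

        // Writable write()/readFields() omitted for brevity; they are needed so the
        // filter can be serialized to the regionservers.
    }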

Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-17 Thread Bing Li
Yes, I noticed that. But it missed something I mentioned in my previous email. Thanks, Bing On Sat, Feb 18, 2012 at 12:11 AM, Stack wrote: > The next page is on pseudo-distributed: > http://hbase.apache.org/book/standalone_dist.html#distributed > > St.Ack > > On Fri, Feb 17, 2012 at 7:18 AM, Bi

Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-17 Thread Bing Li
Stack, The link just describes standalone mode for HBase. If possible, I think pseudo-distributed mode should also be covered. Thanks, Bing On Fri, Feb 17, 2012 at 11:10 PM, Stack wrote: > On Thu, Feb 16, 2012 at 11:03 PM, Bing Li wrote: > > I just made a summary about the experiences to set up
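
For reference, a small sketch of the settings that distinguish pseudo-distributed mode from standalone; they normally go in conf/hbase-site.xml rather than being set in code, and the hdfs://localhost:9000 NameNode address is an assumption.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class PseudoDistributedSketch {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase"); // store data in HDFS, not the local filesystem
            conf.setBoolean("hbase.cluster.distributed", true);       // run the daemons in separate JVMs
            conf.set("hbase.zookeeper.quorum", "localhost");          // single-node ZooKeeper
        }
    }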

Re: ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries

2012-02-17 Thread Stack
On Thu, Feb 16, 2012 at 11:03 PM, Bing Li wrote: > I just made a summary about the experiences to set up a pseudo-distributed > mode HBase. > Thank you for the writeup. What would you have us change in here: http://hbase.apache.org/book/quickstart.html? Thanks, St.Ack