Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Ted Dunning
Try iostat or if you are running it, try ganglia. On Mon, May 30, 2011 at 10:07 PM, Harold Lim wrote: > How do I know how much data is moving from the disk? >
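For readers who want the raw numbers behind iostat, here is a minimal sketch of the same calculation. It assumes a Linux host, where /proc/diskstats exposes a cumulative "sectors read" counter per device in 512-byte units. The function names, the device name "sda", and the two sample lines are made up for illustration and are not measurements from this thread.

```python
# Sketch: estimate disk read throughput from two /proc/diskstats samples (Linux).
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_read(diskstats_text, device):
    """Return the cumulative 'sectors read' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name reads_completed reads_merged sectors_read ...
        if len(fields) >= 6 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)


def read_mb_per_sec(before, after, device, interval_sec):
    """Average read rate in MB/s between two samples taken interval_sec apart."""
    delta = sectors_read(after, device) - sectors_read(before, device)
    return delta * SECTOR_BYTES / interval_sec / 1e6


# Hypothetical samples, 10 seconds apart, for illustration only.
SAMPLE_T0 = "   8       0 sda 1000 0 200000 500 300 0 40000 200 0 600 700"
SAMPLE_T1 = "   8       0 sda 1600 0 400000 900 320 0 42000 210 0 950 1110"

if __name__ == "__main__":
    print(read_mb_per_sec(SAMPLE_T0, SAMPLE_T1, "sda", 10.0))
```

On a live region server the two samples would come from reading /proc/diskstats twice with a sleep in between. Running `iostat -x 5` reports comparable per-device read rates without any scripting.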

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Ted Dunning
It may make it better. We should have an update shortly that will allow multiple machines to participate in generating load. A single YCSB is sufficient to stress a few nodes but once you get to 10 or more (especially with MapR underneath) you really need a cluster to generate the load. The sync

Harvesting empty regions

2011-05-30 Thread Arvind Jayaprakash
My setup seems to have a lot of regions with no data that just keep accumulating over time. Here are some details: I have time-series data (created by opentsdb) being inserted into hbase every minute. Since the data has little value after say 15 days, I go ahead and delete all old data. When I lo

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Harold Lim
Hi Ted, I haven't tried with bigger instances yet. Those are my next steps. I also see that you have a forked version of YCSB, will that make my performance better? Thanks, Harold --- On Tue, 5/31/11, Ted Dunning wrote: > From: Ted Dunning > Subject: Re: How to improve HBase throughput

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Harold Lim
Hi Ted, I read all fields in the record. I was trying to get similar performance from the YCSB paper. How do I know how much data is moving from the disk? -Harold --- On Tue, 5/31/11, Ted Dunning wrote: > From: Ted Dunning > Subject: Re: How to improve HBase throughput with YCSB? > To: user@

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Ted Dunning
What happens if you increase heap space to 8GB on an m1.xlarge or m2.2xlarge? On Mon, May 30, 2011 at 8:50 PM, Harold Lim wrote: > Hi Lohit, > > I'm running HBase 0.90.2. 10 x ec2 m1.large instances. I set the heap size > to 4GB and handler count for hbase, and dfs to 100. I also set the dfs ma

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Ted Dunning
How large are the reads? Have you tried this on a better instance type such as was suggested a bit ago? How much data is moving from the disks? On Mon, May 30, 2011 at 8:46 PM, Harold Lim wrote: > Hi Ted, > > It's a pure random read operation. > > > -Harold > --- On Mon, 5/30/11, Ted Dunning

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Harold Lim
Hi Andrew, Is this a normal behavior in m1.large instances? Does m1.xlarge work? I am using the local storage of the instances (ephemeral disk in EC2 terminology). I picked m1.large because that was the "smallest" type of instance that has a high I/O performance listed. Thanks, Harold ---

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Harold Lim
Hi Lohit, I'm running HBase 0.90.2 on 10 x ec2 m1.large instances. I set the heap size to 4GB and the handler count for both hbase and dfs to 100. I also set the dfs max xcievers to 4096. I'm running a pure random read YCSB workload. I also tried running multiple clients from multiple ec2 instances, but

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Harold Lim
Hi Ted, It's a pure random read operation. -Harold --- On Mon, 5/30/11, Ted Dunning wrote: > From: Ted Dunning > Subject: Re: How to improve HBase throughput with YCSB? > To: user@hbase.apache.org > Date: Monday, May 30, 2011, 3:07 PM > What kind of operations? > > On Mon, May 30, 2011 at 9:

Re: HRegion.openHRegion IOException caused an endless loop of opening—opening failed

2011-05-30 Thread bijieshan
I have filed an issue, and I'll commit a patch soon (I still need to do some testing on the patch). Issue Address: https://issues.apache.org/jira/browse/HBASE-3937 It is indeed related to HBASE-3789. I'm still looking into this issue. I'll add any further discussion to the comments. Th

Re: bulkloader zookeeper connectString

2011-05-30 Thread Stack
Sounds like -c is a little flaky. Glad you figured it out, Geoff (eventually). St.Ack On Sat, May 28, 2011 at 12:01 PM, Geoff Hendrey wrote: > Never got the "-c" argument to work, but when I setup the following > environment vars, it was happy: > > export HBASE_HOME > added hbase conf dir to CLASSP

Re: HRegion.openHRegion IOException caused an endless loop of opening—opening failed

2011-05-30 Thread Stack
Thanks for digging in Jean. Your diagnosis below looks right to me -- the bit about master trying to reset OFFSET before reassigning. It will help if a regionserver has set it OPENING in the meantime. How do you propose to handle the case where we fail setting it to OFFLINE because RS1 has alre

Re: data loss after killing RS

2011-05-30 Thread Stack
Have you looked at deferred flushing? It's an attribute you set on your table. You then say how often to run sync using 'hbase.regionserver.optionallogflushinterval'. Default is sync every second. St.Ack On Sat, May 28, 2011 at 6:47 AM, Qing Yan wrote: > Well, I realized myself RS flush to HDF

Re: How to improve random read latency?

2011-05-30 Thread Stack
See http://hbase.apache.org/book.html#performance and the notes over in the other thread, "How to improve HBase throughput with YCSB?" St.Ack On Sun, May 29, 2011 at 2:28 PM, Sean Bigdatafun wrote: > For pure random read, I do not think there exists a good way to improve > latency. Essentially, e

Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

2011-05-30 Thread Stack
Odd. I don't see the regionserver checking into the master (maybe that's the way it is in pseudo-distributed and I just forgot). Can you paste more master log? I don't see the regionserver coming in in the snippet you've pasted so not sure how it's registering itself (I see the timeout when we tr

Re: Is there any way to disable WAL while keeping data safety

2011-05-30 Thread Ted Yu
Xiyun: Take a look at https://issues.apache.org/jira/browse/HBASE-3871 for parallel HFile splitting. On Mon, May 30, 2011 at 6:31 PM, Gan, Xiyun wrote: > I used BulkLoad to import data. The step of writing HFiles using m/r is > fast, but the step of loading HFiles to hbase takes lots of time. It

Re: Is there any way to disable WAL while keeping data safety

2011-05-30 Thread Gan, Xiyun
Thanks a lot. Is there any suggestion on the Region is not online Exception? On Tue, May 31, 2011 at 9:36 AM, Joey Echeverria wrote: > If you have a well defined key space, you'll get better performance if > you pre-split your table and use the TotalOrderPartitioner with your > MapReduce job. >

Re: Is there any way to disable WAL while keeping data safety

2011-05-30 Thread Joey Echeverria
If you have a well defined key space, you'll get better performance if you pre-split your table and use the TotalOrderPartitioner with your MapReduce job. You can see an example of pre-splitting here: http://hbase.apache.org/book.html#precreate.regions. -Joey On Mon, May 30, 2011 at 9:31 PM, Gan
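To illustrate what pre-splitting involves, here is a minimal sketch that computes evenly spaced split points. It assumes row keys are fixed-width lowercase hex strings spread uniformly over the key space, which may not match a real schema. The function name `hex_split_keys` and its parameters are hypothetical.

```python
def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split points over a hex key space.

    Assumes keys are fixed-width lowercase hex strings, uniformly distributed.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 16 ** key_width  # number of distinct keys of this width
    return [
        format(space * i // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions over a two-character key space.
    print(hex_split_keys(4, key_width=2))  # -> ['40', '80', 'c0']
```

The resulting boundaries would be handed to the table at creation time (for example as the split-key array accepted by `HBaseAdmin.createTable`). The MapReduce job's partitioner would use the same boundaries, so that each HFile falls inside one region and does not need splitting at load time.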

Re: Is there any way to disable WAL while keeping data safety

2011-05-30 Thread Gan, Xiyun
I used BulkLoad to import data. The step of writing HFiles using m/r is fast, but the step of loading HFiles to hbase takes lots of time. It says HFile at ** no longer fits inside a single region. Splitting Even worse, sometimes it throws Region is not online Exception. Thanks On Fri, Ma

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Andrew Purtell
The hypervisor steals a lot of CPU time from m1.large instances. You should be using c1.xlarge instances. Are you using local storage or EBS? Be aware that I/O performance on EC2 for any system is lower than if you are using real hardware, significantly so if not using one of the instance type

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread lohit
Hello Harold, Can you share with us what kind of throughput you are seeing? What number of ops/sec and read latency are you seeing? Also, what version of hbase are you running? Thanks, Lohit 2011/5/30 Harold Lim > Hi All, > > I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm

Re: Problem starting up HBase in pseudo distributed mode

2011-05-30 Thread Sean Bigdatafun
Hi Hari, I think I am experiencing the same problem as you. (My system is also Ubuntu 11.04). Please take a look at my thread and see if it is the same problem you are experiencing. The topic is "0.90.1 HMaster malfunction in pseudo-distributed mode". Hopefully this question gets answered after the

Re: How to improve HBase throughput with YCSB?

2011-05-30 Thread Ted Dunning
What kind of operations? On Mon, May 30, 2011 at 9:43 AM, Harold Lim wrote: > Hi All, > > I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm > trying to run a read-only YCSB workload. It seems that I can't get a good > throughput. It saturates to around 600+ operations per

How to improve HBase throughput with YCSB?

2011-05-30 Thread Harold Lim
Hi All, I have an HBase cluster on ec2 m1.large instance (10 region servers). I'm trying to run a read-only YCSB workload. It seems that I can't get a good throughput. It saturates to around 600+ operations per second. My dataset is around 200GB (~1k+ regions). Running major compaction and als

Problem starting up HBase in pseudo distributed mode

2011-05-30 Thread Hari Sreekumar
Hi, I am trying to set up hbase in pseudo-distributed mode on one of the machines and I am getting this error when I try to use hbase shell. If I try the list command, it just hangs. hbase(main):002:0> create 't1', 'f1' ERROR: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: org.apache.