Re: Slow Inserts on EC2 Cluster

2010-09-03 Thread Matthew LeMieux
Thank you for the pointer. I'm not sure if this is the bug I was encountering. This particular bug points to a problem with how load was calculated. The problem I was experiencing seemed to be a real issue that affected performance, not just reporting. They published a fix on 20100827, but

Re: Slow Inserts on EC2 Cluster

2010-09-02 Thread Bradford Stephens
Ah, that explains a lot. Thanks for the tips JGray! I shall do that ASAP.
On Thu, Sep 2, 2010 at 12:10 PM, Andrew Purtell wrote:
>> From: Bradford Stephens
>> A small improvement, but nowhere near what I'm used to,
>> even from vague memories of old clusters on EC2.
>
> Those days are gone.

Re: Slow Inserts on EC2 Cluster

2010-09-02 Thread Andrew Purtell
> From: Bradford Stephens
> A small improvement, but nowhere near what I'm used to,
> even from vague memories of old clusters on EC2.
Those days are gone. Used to be m1.small provided reasonable performance for some apps. Now comment to the effect that the platform is simply too oversubscribed

RE: Slow Inserts on EC2 Cluster

2010-09-01 Thread Jonathan Gray
to:bradfordsteph...@gmail.com]
> Sent: Wednesday, September 01, 2010 6:58 PM
> To: user@hbase.apache.org
> Subject: Re: Slow Inserts on EC2 Cluster
>
> On the full data set (10 reducers), speeds are about 100k/minute (WAL
> Disabled). Still much slower than I'd like, but I'll tak

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
On the full data set (10 reducers), speeds are about 100k/minute (WAL disabled). Still much slower than I'd like, but I'll take it over the former :)
On Wed, Sep 1, 2010 at 5:59 PM, Ryan Rawson wrote:
> Yes exactly, column families have the same performance profile as
> tables. 12 CF = 12 tables

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Ryan Rawson
Yes exactly, column families have the same performance profile as tables. 12 CF = 12 tables.
-ryan
On Wed, Sep 1, 2010 at 5:56 PM, Bradford Stephens wrote:
> Good call JD! We've gone from 20k inserts/minute to 200k. Much
> better! I still think it's slower than I'd want by about one OOM, but

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
Good call JD! We've gone from 20k inserts/minute to 200k. Much better! I still think it's slower than I'd want by about one OOM, but it's progress. Since we're populating 12 families, I guess we're seeking for 12 files on each write. Not pretty. I'll look at the customer and see if they really ha

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Ryan Rawson
There are a couple of things happening here, and some solutions:
- don't flush based on region size, only on family/store size.
- do what the bigtable paper says and merge the smallest file with memstore while flushing thus keeping the net number of files low.
The latter would probably benefit fro
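To put rough numbers on the first point, here is a back-of-the-envelope sketch. It assumes a flush is triggered when a region's combined memstore reaches a threshold and that all families fill at the same rate. The 64 MB threshold is an assumed figure for illustration, not something stated in this thread, and the function name is hypothetical.

```python
def avg_flush_file_mb(region_flush_threshold_mb: float, num_families: int) -> float:
    """Approximate memstore data per family when a region-wide flush fires.

    Assumes every family receives writes at the same rate, so the region-wide
    threshold is split evenly across the families.
    """
    if num_families <= 0:
        raise ValueError("num_families must be positive")
    return region_flush_threshold_mb / num_families


# Assumed threshold for illustration only; check your own configuration.
ASSUMED_THRESHOLD_MB = 64.0

for families in (1, 3, 12):
    per_file = avg_flush_file_mb(ASSUMED_THRESHOLD_MB, families)
    print(f"{families:2d} families -> ~{per_file:.1f} MB of memstore data per flushed file")
```

Under these assumptions each flush writes one file per family, so more families means more and smaller files for the same volume of writes. Flushing per family/store would let each family reach the full threshold before writing.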

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
Yeah, those families are all needed -- but I didn't realize the files were so small. That's odd -- and you're right, that'd certainly throw it off. I'll merge them all and see if that helps.
On Wed, Sep 1, 2010 at 5:24 PM, Jean-Daniel Cryans wrote:
> Took a quick look at your RS log, it looks lik

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Jean-Daniel Cryans
Took a quick look at your RS log, it looks like you are using a lot of families and loading them pretty much at the same rate. Look at lines that start with:
INFO org.apache.hadoop.hbase.regionserver.Store: Added ...
And you will see that you are dumping very small files on the filesystem, on ave
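A minimal sketch for pulling those lines out of a region server log, assuming only that each flush line contains the marker text quoted above. The function name and the sample lines are made up for illustration, and the part of each line after "Added" is left out because its format is not shown in this thread.

```python
STORE_ADDED_MARKER = "INFO org.apache.hadoop.hbase.regionserver.Store: Added"


def store_added_lines(lines):
    """Return the log lines that record a store file being added."""
    return [line.rstrip("\n") for line in lines if STORE_ADDED_MARKER in line]


# Hypothetical sample input; real lines carry more detail after "Added".
sample = [
    "INFO org.apache.hadoop.hbase.regionserver.Store: Added <details elided>\n",
    "INFO org.apache.hadoop.hbase.regionserver.HRegion: <unrelated line>\n",
    "INFO org.apache.hadoop.hbase.regionserver.Store: Added <details elided>\n",
]

matches = store_added_lines(sample)
print(f"{len(matches)} store files added")
```

For a real log, pass an open file object instead of the sample list, for example `store_added_lines(open(path))` with `path` pointing at your region server log. Many matches in a short time window suggest frequent small flushes.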

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
'allo, I changed the cluster from m1.large to c1.xlarge -- we're getting about 4k inserts/node/minute instead of 2k. A small improvement, but nowhere near what I'm used to, even from vague memories of old clusters on EC2. I also stripped all the Cascading from my code and have a very basic raw

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Andrew Purtell
> From: Gary Helmling
>
> If you're using AMIs based on the latest Ubuntu (10.04),
> there's a known kernel issue that seems to be causing
> high loads while idle. More info here:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910
Seems best to avoid using Lucid on EC2 for now, th

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Andrew Purtell
> From: Matthew LeMieux
> I'm starting to find that EC2 is not reliable enough to support
> HBase. [...]
> (I've been using m1.large and m2.xlarge running CDH3)
I personally don't use EC2 for anything more than on demand ad hoc testing, but I do know of successful deployments there. However, I

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Gary Helmling
On Wed, Sep 1, 2010 at 7:24 AM, Matthew LeMieux wrote:
> I'm starting to find that EC2 is not reliable enough to support HBase. I'm
> running into 2 things that might be related:
>
> 1) On idle machines that are apparently doing nothing (reports of <3% CPU
> utilization, no I/O wait) the load i

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
n.
>
> JG
>
>> -Original Message-
>> From: Matthew LeMieux [mailto:m...@mlogiciels.com]
>> Sent: Wednesday, September 01, 2010 7:25 AM
>> To: user@hbase.apache.org
>> Subject: Re: Slow Inserts on EC2 Cluster
>>
>> I'm starting to find tha

RE: Slow Inserts on EC2 Cluster

2010-09-01 Thread Jonathan Gray
IO so that it cannot write to its transaction log and that is what is slowing it down.
JG
> -Original Message-
> From: Matthew LeMieux [mailto:m...@mlogiciels.com]
> Sent: Wednesday, September 01, 2010 7:25 AM
> To: user@hbase.apache.org
> Subject: Re: Slow Inserts on EC2

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
Wow, thanks. I didn't consider that ... I try to avoid the cloud if at all possible :)
Cheers,
B
On Wed, Sep 1, 2010 at 4:14 AM, Andrew Purtell wrote:
>> From: Bradford Stephens
>> I'm banging my head against some perf issues on EC2. I'm
>> using .20.6 on ASF hadoop .20.2, and tweaked the ec2 hb

Re: Slow Inserts on EC2 Cluster

2010-09-01 Thread Andrew Purtell
> From: Bradford Stephens
> I'm banging my head against some perf issues on EC2. I'm
> using .20.6 on ASF hadoop .20.2, and tweaked the ec2 hbase
> scripts to handle the new version.
>
> I'm trying to insert about 22G of data across nodes on EC2
> m1.large instances [...]
c1.xlarge provides (bare

Slow Inserts on EC2 Cluster

2010-09-01 Thread Bradford Stephens
Hey guys, I'm banging my head against some perf issues on EC2. I'm using .20.6 on ASF hadoop .20.2, and tweaked the ec2 hbase scripts to handle the new version. I'm trying to insert about 22G of data across nodes on EC2 m1.large instances. I'm getting speeds of about 1200 rows/minute. It seems li