Attached is our memstore size graph...not sure it will make it to the post.
Ours is definitely not as graceful as yours. You can see where we last
restarted, 16 hours ago. We have not had any issues since, but we usually
don't have problems until 24-48 hours into a load. Stack, yes, the 65% seems
to help again, but it has not been running long enough to be sure.

I think our problem is the load pattern. Since we use a very controlled
queue-based method to do work, our Python code is relentless about keeping
the pressure up. In our testing we will queue up 500k messages with 10k
writes per message, all written to 3 column families (primary plus 2
secondary indexes). This means there are 15 billion writes waiting to go
into HBase. It will take us days to load this, and the JVM will eventually
crumble. Even when we are lucky enough to avoid an overly long GC pause, we
eventually see OOM problems crop up as well.
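
For a concrete picture of that load pattern, here is a rough sketch of what
one loader worker does. The happybase client library, the table names, and
the message layout below are all assumptions made for the example; our real
code uses the plain Thrift bindings:

import happybase   # Thrift-based HBase client; an assumption for this sketch

THRIFT_HOST = 'localhost'   # assumed Thrift gateway address
BATCH_SIZE = 30000          # mutations buffered per send

def fake_queue():
    """Stand-in for the real message queue: yields one message of ~10k
    (row, column, value) cells. The real loader queues 500k of these."""
    yield [(('row-%06d' % i).encode(), b'c1', b'v') for i in range(10000)]

def load_message(connection, cells):
    """Write one message to the primary table plus two secondary-index
    tables, i.e. 3x the raw write volume."""
    primary = connection.table('data')        # hypothetical table names
    idx_a = connection.table('data_idx_a')
    idx_b = connection.table('data_idx_b')
    with primary.batch(batch_size=BATCH_SIZE) as p, \
         idx_a.batch(batch_size=BATCH_SIZE) as a, \
         idx_b.batch(batch_size=BATCH_SIZE) as b:
        for row_key, column, value in cells:
            p.put(row_key, {b'd:' + column: value})
            a.put(value + b'|' + row_key, {b'i:row': row_key})
            b.put(column + b'|' + row_key, {b'i:row': row_key})
        # leaving the 'with' blocks flushes whatever is still buffered

if __name__ == '__main__':
    conn = happybase.Connection(THRIFT_HOST, port=9090)
    for cells in fake_queue():   # the real workers never stop pulling work
        load_message(conn, cells)

The point is that nothing in this loop ever backs off: each worker keeps
three batches full and pushes them as fast as the Thrift gateway will accept
them.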

Again, our hardware is taking it easy during this process except for the JVM
and its 8g heap. The heat should be on the disks and it is not, as we are not
really pushing them at all. I could, and would like to, throttle it up and
run 6 writers per node instead of 4, but I know the nodes cannot sustain
that. What we need to find is the level of writes our cluster can sustain for
weeks at a time while still staying fast for reads and not going AWOL. Once
we find that sweet spot we can try to turn up the heat...but we never seem to
find it. Between GC pauses and OOMs we have never run under load long enough
to gain "confidence".
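
To make "throttle it up" concrete, the pacing on our side would look roughly
like the sketch below. The limiter and the per-writer rate are assumptions
for illustration; only the ~30k batch size, the ~30-40k writes/sec/node, and
the 4 writers per node come from our actual setup:

import time

class RateLimiter:
    """Small pacing helper: spreads batches so that on average at most
    `rate` mutations per second are sent by this writer."""
    def __init__(self, rate):
        self.rate = float(rate)
        self.next_free = time.monotonic()

    def acquire(self, n):
        """Block until `n` more mutations fit under the average rate."""
        now = time.monotonic()
        start = max(self.next_free, now)        # when this batch may begin
        self.next_free = start + n / self.rate  # time budget it occupies
        if start > now:
            time.sleep(start - now)

# Example: ~35k writes/sec/node split across 4 writers is roughly 8750
# mutations/sec per writer (the even split is an assumption).
limiter = RateLimiter(rate=8750)
batch = [('row-%d' % i, 'd:c', 'v') for i in range(30000)]  # one 30k batch
limiter.acquire(len(batch))  # wait for budget before the actual batch send

Dialing the per-writer rate up or down is how we would hunt for the level the
cluster can hold for weeks rather than hours.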

Thanks.

On Thu, May 26, 2011 at 3:43 AM, Jack Levin <magn...@gmail.com> wrote:

> Wayne, I think you are hitting fragmentation, how often do you flush?
>  Can you share memstore flush graphs?
> Here is ours:
> http://img851.yfrog.com/img851/9814/screenshot20110526at124.png
>
> We run at 12G Heap, 20% memstore size, 50% blockcache, have recently
> added incremental mode to combat too frequent ParNew GCs:
>
> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms12000m
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError
> -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
> -XX:+CMSIncrementalMode \
> -XX:+CMSIncrementalPacing \
> -XX:-TraceClassUnloading
> "
>
> -Jack
>
> On Wed, May 25, 2011 at 5:55 PM, Wayne <wav...@gmail.com> wrote:
> > We are using std thrift from python. All writes are batched into usually
> > 30k writes per batch. The writes are small double/varchar(100) type
> > values. Our current write performance is fine for our needs...our concern
> > is that they are not sustainable over time given the GC timeouts.
> >
> > Per the 4 items above, yes the staff is biased (obviously), the OS is what
> > 2/3+ linux servers are running, the machines are Super Micro twins with 6
> > sata 1TB RE4 disks (5 for hdfs) and 24G of ECC memory (hardly
> > non-standard). We are plain vanilla. Our workload may be non-standard in
> > that it is NOT map-reduce. It is an evenly distributed, queue-based,
> > home-spun parallel processing framework.
> >
> > We are not a Java shop, and do not want to become one. I think to push
> > the limits and do well with hadoop+hdfs you have to buy into Java and
> > have deep skills there. We are database experts looking only for an open
> > source scalable database. We are a python shop and have no interest in
> > digging deep into java and the jvm.
> >
> > Thanks.
> >
> > On Wed, May 25, 2011 at 8:07 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
> >
> >> How large are these writes?
> >>
> >> Are you using asynchbase or other alternative client implementation?
> >>
> >> Are you batching updates?
> >>
> >> On Wed, May 25, 2011 at 2:44 PM, Wayne <wav...@gmail.com> wrote:
> >>
> >> > What are your write levels? We are pushing 30-40k writes/sec/node on 10
> >> > nodes for 24-36-48-72 hours straight. We have only 4 writers per node
> >> > so we are hardly overwhelming the nodes. Disk utilization runs at
> >> > 10-20%, load is max 50% including some app code, and memory is the 8g
> >> > JVM out of 24G. We run our production as 20TB in MySQL and see 90% disk
> >> > utilization for hours every day. A database must be able to accept
> >> > being pounded 24/7. If the hardware can handle it so should the
> >> > database...this is not true for Java based databases. The reality is
> >> > that Java is not nearly ready for what a real database will expect of
> >> > it. Sure we could "back off" the volume and add 100 more nodes like
> >> > Cassandra requires but then we might as well have used something else
> >> > given that hardware spend.
> >> >
> >> > Our problem is that we have invested so much time with Hbase that it
> >> > is hard for us to walk away and go to the sharded PostgreSQL we should
> >> > have used 9 months back. Sorry for the negativity but considering
> >> > giving up after having invested all of this time is painful.
> >> >
> >> >
> >> > On Wed, May 25, 2011 at 4:21 PM, Erik Onnen <eon...@gmail.com> wrote:
> >> >
> >> > > On Wed, May 25, 2011 at 11:39 AM, Ted Dunning <tdunn...@maprtech.com>
> >> > > wrote:
> >> > > > It should be recognized that your experiences are a bit out of the
> >> > > > norm here.  Many hbase installations use more recent JVM's without
> >> > > > problems.
> >> > >
> >> > > Indeed, we run u25 on CentOS 5.6 and over several days uptime it's
> >> > > common to never see a full GC across an 8GB heap.
> >> > >
> >> > > What we never see is a ParNew taking .1 seconds, they're usually .01
> >> > > and we never have full collections lasting 92 seconds. The only time
> >> > > I've ever seen a JVM at 8GB take that long is when running on puny
> >> > > (read virtualized) cores or when there are Ubuntu kernel bugs at
> >> > > play. The same is true for our Cassandra deploys as well.
> >> > > The same is true for our Cassandra deploys as well.
> >> > >
> >> >
> >>
> >
>
