What kinds of speeds are you seeing?

On Thu, Mar 3, 2011 at 10:19 PM, Aditya Sharma <adityadsha...@gmail.com> wrote:
> Hi All,
>
> I am working on benchmarking different data stores to find the best fit
> for our use case. I would like to hear the views and suggestions of the
> HBase user and developer community on some of my findings, as the
> results I am getting are highly variable.
>
> My HBase setup has two EC2 Large hosts (each with 7.5 GB of memory,
> 4 CPU cores, etc.), on which both the HBase master and the slaves
> reside. The HDFS master and slave and the ZooKeeper instances are also
> split between these two hosts. I have three tables with one column
> family each, holding 100 million, 75 million and 500 million rows
> respectively. The actual data consists of a String key plus Long and
> String columns. The usual access pattern is GETs on individual keys
> plus periodic batch PUTs.
>
> I ran my benchmark application on HBase in different scenarios to
> measure pure GET performance, mixed GET and PUT performance, etc. These
> runs were done without enabling the HTable API's writeBuffer or any
> BloomFilters. The results were quite unimpressive compared to similar
> benchmarks run against MySQL, Cassandra, etc.: performance was anywhere
> from 40% to 100% worse. So I started using writeBuffers in my code and
> also enabled BloomFilters at the ROW level. However, I then started
> seeing a lot of variance in the benchmark results (though I am not sure
> this can be attributed to the BloomFilters or the write buffering).
> Another concern is that the results were actually worse than the
> earlier ones.
>
> Since we are using EC2 Large instances, it seems unlikely that network
> or other virtualization-related resource contention is affecting our
> performance measurements.
>
> What I would like to know is whether this rings a bell for anyone else
> here. Could I be missing some configuration knob, so that background
> compaction or some similar process starts at the wrong time and affects
> my benchmarks?
>
> Any comments or feedback are welcome.
>
> Thanks,
> Aditya
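One way to make "what kinds of speeds" and "a lot of variance" concrete is to report, for each scenario, latency percentiles and the standard deviation alongside the average, so that runs with and without writeBuffer or BloomFilters can be compared directly. Here is a minimal, stdlib-only Java sketch of such a summary. The `LatencySummary` class and its methods are hypothetical helpers, not part of HBase, and the placeholder workload in `main` is an assumption that would be replaced by a single GET or PUT in a real benchmark.

```java
import java.util.Arrays;

/**
 * Hypothetical benchmark helper (not part of HBase) that summarizes
 * per-operation latencies. Percentiles and standard deviation expose
 * run-to-run variance that a single average can hide.
 */
public class LatencySummary {

    // Written by the placeholder workload so its result is not discarded.
    static volatile int sink;

    /** Times `iterations` calls of `op`, returning each call's latency in nanoseconds. */
    public static long[] measure(Runnable op, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Arithmetic mean of the samples. */
    public static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation of the samples. */
    public static double stdDev(long[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / samples.length);
    }

    /** Nearest-rank percentile for p in (0, 100]; the input array is not modified. */
    public static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        rank = Math.max(1, Math.min(rank, sorted.length));
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Placeholder workload; in a real benchmark this would be one GET or PUT.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100; i++) {
                sb.append(i);
            }
            sink = sb.length();
        };
        long[] samples = measure(workload, 10_000);
        System.out.printf("n=%d mean=%.0f ns stddev=%.0f ns p50=%d ns p99=%d ns%n",
                samples.length, mean(samples), stdDev(samples),
                percentile(samples, 50), percentile(samples, 99));
    }
}
```

Running the same summary once per scenario (and ideally several times per scenario) shows whether the variance comes from a slow tail, such as a p99 far above p50 during a compaction, or from a uniformly slower run.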