On Tue, Mar 8, 2011 at 11:04 AM, Chris Tarnas <c...@email.com> wrote:
> Just as a point of reference, in one of our systems we have 500+ million rows
> that have a cell in its own column family that is usually about 100 bytes,
> but in about 10,000 rows the cell can get to 300 MB (the average is probably
> about 30 MB for the larger data). The jumbo sized data gets loaded in
> separately from the smaller data, although it all goes through the same 
> pipeline. We are using cdh3b45 (0.90.1) with GZ compression, a region size of
> 1 GB and a max value size of 500 MB. So far we have had no problems with the
> larger values.
>
> Our largest problem was performance related to inserting into several column 
> families for the small sized value loads and pauses when flushing the 
> memstores. 0.90.1 helped quite a bit with that.

Flushing is done without blocking. Were the pauses you were seeing
related to the "too many stores" issue or to the global memstore
size?

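In general, inserting into many families is a bad idea unless the sizes
are the same. The worst case is inserting a few KB in one and a few
MB in the other. The reason is that flushes are decided per region, not
per family, so the small family gets flushed into tiny files along with
the big one:
https://issues.apache.org/jira/browse/HBASE-3149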
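
To put rough numbers on the uneven-families case, here is a back-of-the-envelope
sketch in Python. It is a toy model, not HBase code: the 64 MB flush threshold,
the 2 KB / 2 MB cell sizes, and the function name are all assumptions picked for
illustration, and it ignores key overhead and compression.

```python
# Toy model: a region flushes ALL of its column families at once when the
# combined memstore size crosses a threshold. Every number here is an
# assumption chosen for illustration, not a measured or configured value.
FLUSH_THRESHOLD = 64 * 1024 * 1024  # assumed per-region flush size, in bytes


def flushed_file_sizes(bytes_per_row):
    """Return (rows_buffered, {family: bytes written at flush time})."""
    total_per_row = sum(bytes_per_row.values())
    # Rows needed for the combined memstores to reach the threshold.
    rows = -(-FLUSH_THRESHOLD // total_per_row)  # ceiling division
    return rows, {fam: rows * size for fam, size in bytes_per_row.items()}


# Hypothetical families: one gets ~2 KB per row, the other ~2 MB per row.
rows, sizes = flushed_file_sizes({"small": 2 * 1024, "big": 2 * 1024 * 1024})
for fam, size in sizes.items():
    print(f"{fam}: {size // 1024} KB per flush after {rows} rows")
```

Under these assumptions the region flushes every 32 rows, and each flush writes
about 64 MB for the big family but only 64 KB for the small one. The small
family therefore piles up many tiny store files that then have to be compacted.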

J-D
