Re: TIMERANGE performance on uniformly distributed keyspace

Wouter Bolsterlee Wed, 25 Apr 2012 13:48:59 -0700

Hi,

On 2012-04-14 at 21:07, Rob Verkuylen wrote:
> As far as I understand, sequential keys with a timerange scan have the best
> read performance possible, because of the HFile metadata, just as N
> indicates. Maybe adding Bloom filters can further improve the performance.


As far as I understand it, Bloom filters are only useful for lookups based on
row key (and possibly column name), not for any time-related lookups.
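
To illustrate why, here is a toy Bloom filter in Python. It is only a sketch
of the general technique, not HBase's implementation; the class name, the
sizes, and the example row keys are all made up. The only question such a
filter can answer is "might this exact key be in this file?", so there is
nothing to ask it about a range of timestamps.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over row keys (illustration only, not HBase's code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions from the key by salting a hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(b"%d:" % i + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# One filter per (hypothetical) store file, built from the row keys it holds.
store_file_filter = BloomFilter()
for row_key in (b"row-17", b"row-42", b"row-99"):
    store_file_filter.add(row_key)

# An exact-key lookup can consult the filter before touching the file.
present = store_file_filter.might_contain(b"row-42")
```

Timestamps are never hashed into the filter, and even if they were, a
membership test on exact values cannot answer "anything between t1 and t2?".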

> Still, in my case with random keys I get quick (sub-second) response from my
> scan example earlier. Does HBase keep all the HFile metadata in memory? I
> can't imagine it will start hitting hundreds, potentially thousands of
> HFiles, reading their metadata, start full scanning the files and returning
> rows. Does it?

What does "quick response" mean here? Is it the response time for the first
batch of results? This can be quite low if the scan finds rows that match
your scan criteria in a region/HFile at the start of the scanned range (e.g.
at the beginning of the table).

Did you also measure the time it takes for the entire scan to complete (and
the load it causes on your cluster), and compare it to the performance of a
sequential scan over a secondary index table with monotonically increasing
keys (and the load that causes on your cluster, since the index has to be
maintained and written to a single region server)?
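
To make the distinction between the two measurements concrete, here is a
small Python sketch that times both the first row and the complete scan. It
assumes your client library hands back the scan results as an iterable; the
names time_scan and fake_scan are made up for this example, and fake_scan is
just a stand-in that sleeps instead of talking to a cluster.

```python
import time


def time_scan(rows):
    """Consume an iterable of rows.

    Returns (seconds_to_first_row, total_seconds, row_count);
    seconds_to_first_row is None if the scan returned nothing.
    """
    start = time.perf_counter()
    first = None
    count = 0
    for _ in rows:
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def fake_scan(num_rows, delay_per_row):
    # Hypothetical stand-in for a real scanner: sleeps to mimic per-row work.
    for i in range(num_rows):
        time.sleep(delay_per_row)
        yield i


first, total, count = time_scan(fake_scan(20, 0.005))
print("first row after %.3fs, %d rows in %.3fs" % (first, count, total))
```

Running the same helper over the timerange scan and over the index-table
scan gives two pairs of numbers that can be compared directly; it does not
capture the load on the cluster, which has to be observed separately.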

    — Wouter
