I have a table with hundreds of millions of records. This table contains data
about servers and the events generated on them. The row key of the table is
as follows:

rowkey = md5(serverId) + timestamp  [32 hex characters + 10 digits = 42
characters]
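
To make the layout concrete, here is a minimal sketch of how such a key could be built. The class name RowKeys, the helper md5Hex, the epoch-seconds timestamp, and the zero padding to 10 digits are assumptions made for illustration; only the 32 + 10 character layout comes from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class RowKeys {
    // Hypothetical helper: lowercase 32-character hex MD5 of a string.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: the timestamp is epoch seconds, zero-padded to 10 digits,
    // so keys for one server sort in time order.
    static byte[] generateKey(String serverId, long epochSeconds) {
        String key = md5Hex(serverId) + String.format("%010d", epochSeconds);
        return key.getBytes(StandardCharsets.US_ASCII);
    }
}
```

With this layout, all rows for one server are contiguous and ordered by time, which is what makes a per-server range scan possible.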

One of the use cases is to list all the events from time t1 to t2. For this,
a normal scan takes too much time. To speed things up, I have done
the following:
1. Fetch the list of unique serverIds from another table (really fast).
2. Divide this list into 256 buckets based on the first two hex characters
of the md5 of each serverId.
3. For each bucket, call a coprocessor with the list of serverIds, the start
time, and the end time.
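
Step 2 can be sketched roughly as below. The names ServerBuckets, md5Hex, and bucketByPrefix are hypothetical and only illustrate the grouping; the actual client code may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ServerBuckets {
    // Hypothetical helper: lowercase 32-character hex MD5 of a string.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder(32);
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Group serverIds by the first two hex characters of their MD5.
    // There are at most 256 distinct prefixes ("00" through "ff").
    static Map<String, List<String>> bucketByPrefix(List<String> serverIds) {
        Map<String, List<String>> buckets = new TreeMap<>();
        for (String id : serverIds) {
            String prefix = md5Hex(id).substring(0, 2);
            buckets.computeIfAbsent(prefix, k -> new ArrayList<>()).add(id);
        }
        return buckets;
    }
}
```

Each map entry then corresponds to one coprocessor call in step 3.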

The coprocessor scans the table as follows:
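   // One range scan per serverId in the bucket
   for (String serverId : serverIds) {
       // Keys are md5(serverId) + timestamp, so each server's rows are contiguous
       byte[] startKey = generateKey(serverId, startTime);
       byte[] endKey = generateKey(serverId, endTime);
       Scan scan = new Scan(startKey, endKey);
       InternalScanner scanner = env.getRegion().getScanner(scan);
       ....
   }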

The results come back quickly with the above approach.

My ONLY concern is the large number of scans. If the table has 20,000
serverIds, then the above code makes 20,000 scans. Will this impact the
overall performance and scalability of HBase?

Thanks,
Ravi Singal
