scan.setTimeRange performance

Eugeny Morozov Fri, 21 Sep 2012 05:20:53 -0700

Hello!

It is well known, and I have seen it in the code, that the time range set by
scan.setTimeRange is used to filter out HFiles before the scan.
This means that the subsequent scanner.next should take almost no time
if I set a time range far in the future. I am sure that I do not have
HFiles that fall into that time range.
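
To make the expectation above concrete, here is a minimal sketch of the kind of overlap check it relies on. This is an illustration only, not HBase's actual code: the class name, the method mayContain, and the file timestamps are hypothetical, and it assumes the scan range is half-open, [min, max).

```java
public class TimeRangeOverlapSketch {

    /**
     * Returns true when a file whose cells span [fileMin, fileMax] (inclusive)
     * could hold a cell inside the half-open scan range [scanMin, scanMax).
     */
    static boolean mayContain(long fileMin, long fileMax, long scanMin, long scanMax) {
        return fileMin < scanMax && fileMax >= scanMin;
    }

    public static void main(String[] args) {
        // Scan range taken from the post.
        long scanMin = 1348114401600L;
        long scanMax = 1348114401700L;

        // Hypothetical file whose timestamps lie entirely before the range: can be skipped.
        System.out.println(mayContain(1000L, 2000L, scanMin, scanMax)); // prints false

        // Hypothetical file whose timestamps lie inside the range: must be read.
        System.out.println(mayContain(1348114401650L, 1348114401660L, scanMin, scanMax)); // prints true
    }
}
```

If every file fails such a check, the scan has nothing to read from disk, which is why a near-zero scanner.next time is expected.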


But, and here is the question, scanning with the time range set surprisingly
takes far longer than scanning without it.

My results are as follows:
Use range [false]. Time spent (avg): [0]
Use range [true]. Time spent (avg): [525]

KeyValues are returned when the time range is not used.

The code is as follows:
    public static void run(boolean useRange, HTable table) throws Exception {
        Scan scan = new Scan().addFamily( family ).setCaching( -1 ).setCacheBlocks( false );
        scan.setStartRow( randomStartRow ); // a random start row
        if (useRange) scan.setTimeRange(1348114401600L, 1348114401700L);

        ResultScanner scanner = table.getScanner(scan);
        // There were a bunch of measurements, where N was from 10 to 50
        for (int i = 0; i < N; i++) {
            long time = System.currentTimeMillis();
            result = scanner.next();
            sum += (System.currentTimeMillis() - time) / N;
        }
    }
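
A note on the measurement itself: assuming N is an integer type, (System.currentTimeMillis() - time) / N is integer division, so any single call that takes fewer than N milliseconds adds 0 to sum. A minimal sketch of an alternative is to accumulate the total with System.nanoTime and divide once at the end. The class name, averageMillis, and the Callable standing in for scanner.next() are hypothetical and not part of the code above.

```java
import java.util.concurrent.Callable;

public class AverageLatencySketch {

    /** Average time per call in milliseconds: sum the nanoseconds, divide once at the end. */
    static double averageMillis(Callable<?> call, int n) throws Exception {
        long totalNanos = 0L;
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            call.call(); // stand-in for scanner.next()
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / n;
    }

    public static void main(String[] args) throws Exception {
        // Per-sample integer division: a 5 ms call with N = 10 contributes nothing.
        long perSample = 5L / 10;
        System.out.println(perSample); // prints 0

        // Hypothetical workload: each call sleeps for about 2 ms.
        double avg = averageMillis(() -> { Thread.sleep(2); return null; }, 10);
        System.out.println("average ms per call: " + avg);
    }
}
```

With this shape, sub-millisecond calls still show up in the average instead of being truncated to zero.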

Of course, such measurements include all sorts of noise, such as network
overhead, but I'm using a virtual machine on my own box, and while I am
measuring there is no other activity on either my box or the virtual
machine, so the noise is minimal.

I have also used YourKit to trace and sample the running HRegionServer,
but didn't find anything suspicious, though I didn't look at heap or GC
performance. The trace is attached.

So, the question is: why is it so slow when the time range is set, and so
fast without it?
-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emoro...@griddynamics.com
