Hello! It is known, and I have seen it in the code, that the time range set by scan.setTimeRange is used to filter out HFiles before the actual scan. This means that subsequent scanner.next() calls should take almost no time if I set a time range far in the future, and I am sure that I have no HFiles that fall into that time range.
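To show what I expect, here is just my mental model of that per-HFile check, not the actual HBase code; TimeRangeFilterSketch, HFileTimeRange and mustScan are my own illustrative names. The idea is that each HFile keeps the min/max timestamp of its cells in its metadata, and a file whose interval does not overlap the scan's range should be skipped entirely:

    public final class TimeRangeFilterSketch {

        /** Hypothetical holder for the timestamp metadata of one HFile. */
        static final class HFileTimeRange {
            final long minTs;
            final long maxTs;
            HFileTimeRange(long minTs, long maxTs) {
                this.minTs = minTs;
                this.maxTs = maxTs;
            }
        }

        /** True if the file could contain cells in [scanMin, scanMax) and therefore must be read. */
        static boolean mustScan(HFileTimeRange file, long scanMin, long scanMax) {
            return file.maxTs >= scanMin && file.minTs < scanMax;
        }

        public static void main(String[] args) {
            // A file whose cells were all written before my scan range:
            HFileTimeRange file = new HFileTimeRange(1347000000000L, 1348000000000L);
            // The scan range lies entirely in the future relative to this file,
            // so I expect it to be skipped and next() to return almost immediately.
            System.out.println(mustScan(file, 1348114401600L, 1348114401700L)); // prints false
        }
    }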
But, and here is the question, surprisingly scanning with the time range set is far slower than without it. My results are the following:

    Use range [false]. Time spent (avg): [0]
    Use range [true].  Time spent (avg): [525]

There are KeyValues listed when the time range is not used. The code is the following:

    public static void run(boolean useRange, HTable table) throws Exception {
        // family, N, result and sum are fields defined elsewhere in my test class
        Scan scan = new Scan().addFamily(family).setCaching(-1).setCacheBlocks(false);
        scan.setStartRow(randomStartRow); // randomStartRow is a randomly chosen start row
        if (useRange) {
            scan.setTimeRange(1348114401600L, 1348114401700L);
        }
        ResultScanner scanner = table.getScanner(scan);
        for (int i = 0; i < N; i++) { // there were a bunch of measurements, with N from 10 to 50
            long time = System.currentTimeMillis();
            result = scanner.next();
            sum += (System.currentTimeMillis() - time) / N;
        }
    }

Of course such measurements include all sorts of noise like network overhead, etc., but I'm using a virtual machine on my own box, and at the time I take the measurements there is no other activity either on my box or on the virtual machine, so the noise is minimal. I have also used YourKit to trace and sample the running HRegionServer, but didn't find anything suspicious, though I didn't look at heap usage or GC performance. The trace is attached.

So, the question is: why is it so slow when the time range is set, and so fast without it?

--
Evgeny Morozov
Developer
Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emoro...@griddynamics.com
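P.S. In case it helps, here is a rough sketch of how I plan to re-run the measurement with less noise: timing the whole batch of next() calls with nanoTime instead of averaging per-call millisecond deltas. The class and parameter names (ScanTimingSketch, timeScanMillis, startRow) are just placeholders for my real setup, and I left the scanner caching at its default here:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanTimingSketch {

        /**
         * Times n calls to scanner.next() as a single block using nanoTime, so
         * sub-millisecond calls are not rounded down to zero by per-call averaging.
         */
        public static long timeScanMillis(HTable table, byte[] family, byte[] startRow,
                                          boolean useRange, int n) throws IOException {
            Scan scan = new Scan();
            scan.addFamily(family);
            scan.setCacheBlocks(false);
            scan.setStartRow(startRow);
            if (useRange) {
                // Same future time range as in my test above.
                scan.setTimeRange(1348114401600L, 1348114401700L);
            }

            ResultScanner scanner = table.getScanner(scan);
            try {
                long start = System.nanoTime();
                for (int i = 0; i < n; i++) {
                    Result result = scanner.next();
                    if (result == null) {
                        break; // scanner exhausted, nothing more to fetch
                    }
                }
                return (System.nanoTime() - start) / 1000000L; // total elapsed milliseconds
            } finally {
                scanner.close();
            }
        }
    }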