Oh, sorry, you're right. You already said that and I forgot to update it. It's working fine once I add this parameter, and as you said, I can get the response time I want by playing with the chance...
I get 34758 lines/second with 0.99 as the chance, and only 7564
lines/second with 0.09... But that's still better than the gets. I just
retried the gets to see if the performance changes after many table
accesses, but the results are still almost the same. I also tried to read
100 000 rows in a row from a random start key, and the performance is
close to the random filter (35273 lines/second). So it's really the get
which is giving me a headache...

2012/6/28, N Keywal <nkey...@gmail.com>:
> For the filter list my guess is that you're filtering out all rows
> because RandomRowFilter#chance is not initialized (it should be
> something like RandomRowFilter rrf = new RandomRowFilter(0.5f);).
> But note that this test will never be comparable to the test with a
> list of gets. You can make it as slow or as fast as you want by
> playing with the 'chance' parameter.
>
> The results with gets and bloom filter are also in the interesting
> category; hopefully an expert will get in the loop...
>
> On Thu, Jun 28, 2012 at 6:04 PM, Jean-Marc Spaggiari
> <jean-m...@spaggiari.org> wrote:
>> Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I
>> mean, it's bad that I didn't figure that out. Thanks for pointing it
>> out. That definitely explains the difference in performance.
>>
>> I have activated the bloom filters with this code:
>>
>> HBaseAdmin admin = new HBaseAdmin(config);
>> HTable table = new HTable(config, "test3");
>> System.out.println(table.getTableDescriptor().getColumnFamilies()[0]);
>> HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0];
>> cd.setBloomFilterType(BloomType.ROW);
>> admin.disableTable("test3");
>> admin.modifyColumn("test3", cd);
>> admin.enableTable("test3");
>> System.out.println(table.getTableDescriptor().getColumnFamilies()[0]);
>>
>> And here is the result for the first attempt (using gets):
>>
>> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
>> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
>> 'true', BLOCKCACHE => 'true'}
>> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
>> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
>> 'true', BLOCKCACHE => 'true'}
>>
>> Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0...
>> Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds)
>>
>> 2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds)
>> 3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds)
>>
>> After a few more iterations (about 30), I'm between 200 and 250
>> lines/second, like before.
>>
>> Regarding the FilterList, I tried it, but now I'm getting this error
>> from the servers:
>>
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>> '-6376193724680783311' does not exist
>>
>> Here is the code:
>>
>> final int linesToRead = 10000;
>> System.out.println(new java.util.Date() + " Processing iteration "
>>     + iteration + "...");
>> RandomRowFilter rrf = new RandomRowFilter();
>> KeyOnlyFilter kof = new KeyOnlyFilter();
>> Scan scan = new Scan();
>> List<Filter> filters = new ArrayList<Filter>();
>> filters.add(rrf);
>> filters.add(kof);
>> FilterList filterList = new FilterList(filters);
>> scan.setFilter(filterList);
>> scan.setBatch(Math.min(linesToRead, 1000));
>> scan.setCaching(Math.min(linesToRead, 1000));
>> ResultScanner scanner = table.getScanner(scan);
>> processed = 0;
>> long timeBefore = System.currentTimeMillis();
>> for (Result result : scanner.next(linesToRead))
>> {
>>     System.out.println("Result: " + result);
>>     if (result != null)
>>         processed++;
>> }
>> scanner.close();
>>
>> It's failing on the for (Result result : scanner.next(linesToRead))
>> line. I tried with linesToRead = 1000, 100, 10 and 1, with the same
>> result :(
>>
>> I will try to find the root cause, but if you have any hints, they
>> are welcome.
>>
>> JM
>
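A follow-up note on the LeaseException: in Java an unassigned float field defaults to 0.0f, so the no-arg new RandomRowFilter() leaves chance at 0 and the filter rejects every row it sees. The region server can then churn through a large number of rows without handing anything back to the client, which is a plausible way to outrun the scanner lease. Here is a minimal plain-Java sketch of the effect; the ChanceDemo class and its rowsKept() helper are illustrative stand-ins of mine, not HBase code:

```java
import java.util.Random;

public class ChanceDemo {

    // Stand-in for RandomRowFilter's per-row decision: keep the row
    // when a uniform draw in [0, 1) falls below 'chance'.
    static int rowsKept(float chance, int totalRows, long seed) {
        Random rng = new Random(seed);
        int kept = 0;
        for (int i = 0; i < totalRows; i++) {
            if (rng.nextFloat() < chance) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 0.0f is what an unassigned float field defaults to in Java,
        // i.e. what the no-arg constructor effectively gives you.
        System.out.println("chance unset : " + rowsKept(0.0f, 10000, 42L) + " of 10000 rows kept");
        System.out.println("chance = 0.5f: " + rowsKept(0.5f, 10000, 42L) + " of 10000 rows kept");
    }
}
```

With chance left at 0 the scan never keeps a row, while 0.5f keeps roughly half. So if that guess is right, passing the chance explicitly, new RandomRowFilter(0.5f), is the first thing to try before tuning anything on the server side.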