FuzzyRowFilter is an interesting filter, and there has been user feedback on it in various scenarios.

If you can write a unit test which exhibits the problem in your first point, that would help us track down the root cause.

I checked FuzzyRowFilter in 0.94 branch - last fix for FuzzyRowFilter was HBASE-7628 which you already have in 0.94.15

Cheers

On Mon, Jun 30, 2014 at 2:59 PM, Liam Slusser <[email protected]> wrote:

> Hey Hbase list,
>
> First question - It seems that the first time I do a scan with a few
> filters the system returns nothing - it also takes a long time (20-30
> seconds) - but I can run the exact same request over again and it goes much
> quicker (2-3 seconds for a total scan, I figured things are cached the
> second time which is fine) but the 2nd time around I get results. It is
> the exact same scan request. I don't get any errors and nothing in the log
> files...
>
> Has anybody else noticed anything like this? I'm running HBase
> 0.94.15-cdh4.6.0 and using FuzzyRowFilter with SingleColumnValueFilter on
> top of my scan.
>
> Second question - how big is too big? I am using my hbase database to
> store parsed logs, currently I am breaking the logs into monthly tables. I
> am inputting around 350 million logs a day so near the end of the month
> there is an estimated 8-10 billion rows per table. All seems to be fine, I
> am able to use FuzzyRowFilter+SingleColumnValueFilter and scan over an hour
> of logs in about 10 seconds so the performance is still very decent. Is
> there any advantage to breaking the table up into separate days? Is there
> a best practices guide for tables this big?
>
> thanks!
> liam
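
For anyone following the thread, here is a minimal sketch of the fuzzy row-key matching idea behind FuzzyRowFilter, which may help when putting together the unit test asked for in the reply. It is plain Java and does not call HBase. The class FuzzyMatchSketch, its helper methods, and the row-key layout (4-character host id, underscore, yyyyMMddHH) are assumptions made up for illustration; they are not the schema from the original message and not HBase's implementation. The sketch follows the mask convention documented for FuzzyRowFilter (0 means the byte must match, 1 means any byte is accepted). Treating keys shorter than the pattern as non-matching is a simplification.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a stand-alone model of fuzzy row-key matching.
// Not the HBase FuzzyRowFilter implementation.
public class FuzzyMatchSketch {

    /**
     * Returns true when rowKey equals pattern at every position whose
     * mask byte is 0. Positions with mask byte 1 are wildcards.
     * Assumption: keys shorter than the pattern never match.
     */
    public static boolean matches(byte[] rowKey, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (rowKey.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && rowKey[i] != pattern[i]) {
                return false; // a fixed position differs
            }
        }
        return true;
    }

    /** Hypothetical helper: encode an ASCII string as bytes. */
    public static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Hypothetical key layout: <4-char host>_<yyyyMMddHH>.
        // The host part is a wildcard; the separator and hour are fixed.
        byte[] pattern = ascii("????_2014063014");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

        System.out.println(matches(ascii("web1_2014063014"), pattern, mask)); // true
        System.out.println(matches(ascii("web1_2014063015"), pattern, mask)); // false
    }
}
```

A unit test against a real cluster or mini-cluster would run the same scan twice with identical filters and compare the results, since the reported symptom is that the first run returns nothing.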
