I'll try to put together a unit test and report back.

thanks,
liam
On Mon, Jun 30, 2014 at 3:25 PM, Ted Yu <[email protected]> wrote:

> FuzzyRowFilter is an interesting filter around which there has been user
> feedback on various scenarios.
>
> If you can write a unit test which exhibits the problem in your first
> point, that would help us track down the root cause.
>
> I checked FuzzyRowFilter in the 0.94 branch - the last fix for
> FuzzyRowFilter was HBASE-7628, which you already have in 0.94.15.
>
> Cheers
>
>
> On Mon, Jun 30, 2014 at 2:59 PM, Liam Slusser <[email protected]> wrote:
>
> > Hey HBase list,
> >
> > First question - It seems that the first time I do a scan with a few
> > filters, the system returns nothing - it also takes a long time (20-30
> > seconds) - but I can run the exact same request over again and it goes
> > much quicker (2-3 seconds for a total scan; I figured things are cached
> > the second time, which is fine), and the 2nd time around I get results.
> > It is the exact same scan request. I don't get any errors and there is
> > nothing in the log files...
> >
> > Has anybody else noticed anything like this? I'm running HBase
> > 0.94.15-cdh4.6.0 and using FuzzyRowFilter with SingleColumnValueFilter on
> > top of my scan.
> >
> > Second question - how big is too big? I am using my HBase database to
> > store parsed logs; currently I am breaking the logs into monthly tables.
> > I am inputting around 350 million logs a day, so near the end of the
> > month there are an estimated 8-10 billion rows per table. All seems to be
> > fine: I am able to use FuzzyRowFilter+SingleColumnValueFilter and scan
> > over an hour of logs in about 10 seconds, so the performance is still
> > very decent. Is there any advantage to breaking the table up into
> > separate days? Is there a best practices guide for tables this big?
> >
> > thanks!
> > liam
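Because the first question hinges on which rows FuzzyRowFilter lets through, here is a plain-Java sketch of the matching rule it applies to row keys, assuming the 0.94 mask convention (mask byte 0 = that key byte must match, mask byte 1 = any value is allowed). The class `FuzzyRowMatch`, its `matches` method, and the row-key layout used in `main` are hypothetical illustrations, not HBase code. The real filter also computes seek hints so the scanner can jump ahead, and its handling of rows shorter than the fuzzy key may differ from the simplification made here.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of FuzzyRowFilter-style row-key matching.
 * Assumed mask convention (0.94): 0 = byte must equal the fuzzy key, 1 = any byte.
 * Not HBase's implementation: no seek-hint logic, and short rows are simply rejected.
 */
public final class FuzzyRowMatch {

    private FuzzyRowMatch() {}

    /** True if the row's leading bytes satisfy every fixed (mask == 0) position of the fuzzy key. */
    public static boolean matches(byte[] row, byte[] fuzzyKey, byte[] mask) {
        if (fuzzyKey.length != mask.length) {
            throw new IllegalArgumentException("fuzzy key and mask must have the same length");
        }
        // Simplifying assumption: a row shorter than the fuzzy key is treated as a non-match.
        if (row.length < fuzzyKey.length) {
            return false;
        }
        for (int i = 0; i < fuzzyKey.length; i++) {
            // Positions marked 1 are wildcards; their fuzzy-key bytes are ignored here.
            if (mask[i] == 0 && row[i] != fuzzyKey[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical row-key layout: 4-char host id + "_" + hour (yyyyMMddHH) + suffix.
        byte[] fuzzyKey = "????_2014063015".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = new byte[fuzzyKey.length];
        for (int i = 0; i < 4; i++) {
            mask[i] = 1; // host id may be anything
        }
        // Remaining mask bytes stay 0, so "_" and the hour must match exactly.

        byte[] sameHour = "web1_2014063015_000123".getBytes(StandardCharsets.US_ASCII);
        byte[] nextHour = "web1_2014063016_000123".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matches(sameHour, fuzzyKey, mask)); // true
        System.out.println(matches(nextHour, fuzzyKey, mask)); // false
    }
}
```

A check along these lines may be a useful first step toward the unit test Ted asked for: confirming that a given fuzzy key and mask select exactly the expected row keys rules out a mask mistake before suspecting the filter itself.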

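For the second question, the table sizes follow directly from the quoted ingest rate. A worked estimate, assuming a roughly uniform 350 million rows per day (the hourly figure additionally assumes the load is spread evenly over 24 hours):

$$
\text{rows}_{\text{monthly}}(d) \approx 3.5\times10^{8}\cdot d,\qquad
\text{rows}_{\text{monthly}}(25) \approx 8.75\times10^{9},\qquad
\text{rows}_{\text{monthly}}(28) \approx 9.8\times10^{9}
$$

$$
\text{rows}_{\text{daily}} \approx 3.5\times10^{8},\qquad
\text{rows}_{\text{hour}} \approx \frac{3.5\times10^{8}}{24} \approx 1.46\times10^{7}
$$

So the 8-10 billion figure corresponds to roughly the last week of a month, while daily tables would each stay near 350 million rows at the cost of about thirty times as many tables. Whether that split makes an hour-range scan faster depends on the row-key design and region layout, which this arithmetic does not capture.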