Anoop, Ramkrishna,

Thank you for the explanation! I've got it.

On Mon, Jan 21, 2013 at 12:59 PM, Anoop Sam John <anoo...@huawei.com> wrote:

> > I suppose if the scanning process had started at once on all regions,
> > then I would find in the log files at least one value per region, but I
> > have found one value per region only for those regions that reside
> > before the particular one.
>
> @Eugeny - FuzzyFilter, like any other filter, works on the server side. A
> scan issued from the client is sequential: it starts at the first region
> (the region with the empty start key, or the region that contains the
> start row you set in your Scan). The client sends a scan request to the
> RegionServer (RS) hosting that region. Once that region is finished, the
> client contacts the next region, and so on. There is no parallel scanning
> of multiple regions from the client side. [This is when using the HTable
> scan APIs]
>
> When MR is used for scanning, the scans run in parallel across all the
> regions, with one mapper per region. But a normal scan from the client
> side is sequential over the regions, not parallel.
>
> -Anoop-
> ________________________________________
> From: Eugeny Morozov [emoro...@griddynamics.com]
> Sent: Monday, January 21, 2013 1:46 PM
> To: user@hbase.apache.org
> Cc: Alex Baranau
> Subject: Re: Custom Filter and SEEK_NEXT_USING_HINT issue
>
> Finally, the mystery has been solved.
>
> A small remark before I explain everything.
>
> The situation with only one region is absolutely the same:
> Fzzy: AAAA1Q7iQ9JA
> Next fzzy: F7dtxwqVQ_Pw <-- the value I'm trying to find.
> Fzzy: F7dt8QWPSIDw
> Somehow FuzzyRowFilter has just omitted my value here.
>
>
> So, the explanation.
> In the javadoc for FuzzyRowFilter a question mark is used as the
> placeholder for an unknown value. Of course it's possible to use anything,
> including zero, instead of the question mark.
> For quite some time we used literals to encode our keys. Literals like
> you've seen already: AAAA1Q7iQ9JA or F7dt8QWPSIDw.
> But that's the Base64 form of just 8 bytes, which requires 1.5 times more
> space. So we decided to store the raw version - just byte[8]. But
> unfortunately the symbol '?' (0x3F) sits right in the middle of the ASCII
> range (according to the ASCII table, http://www.asciitable.com/), which
> means that with FuzzyRowFilter we skip half of the values in some cases.
> At the same time, the question mark comes right before any letter that
> could be used in a key.
>
> Despite the fact that we have integration tests, it's just a coincidence
> that we had no such example in them.
>
> So, as advice: always use zero instead of a question mark for
> FuzzyRowFilter.
>
> Thanks to everyone!
>
> P.S. But the question about region scanning order is still open. I do not
> understand why with FuzzyFilter it goes from one region to another until it
> stops at the value. I suppose if the scanning process had started at once
> on all regions, then I would find in the log files at least one value per
> region, but I have found one value per region only for those regions that
> reside before the particular one.
>
>
> On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel
> <michael_se...@hotmail.com> wrote:
>
> > If it's the same class and it's not a patch, then the first class loaded
> > wins.
> >
> > So if you have a Class Foo and HBase has a Class Foo, your code will
> > never see the light of day.
> >
> > Perhaps I'm stating the obvious, but it's something to think about when
> > working with Hadoop.
> >
> > On Jan 19, 2013, at 3:36 AM, Eugeny Morozov <emoro...@griddynamics.com>
> > wrote:
> >
> > > Ted,
> > >
> > > that is correct.
> > > HBase 0.92.x and we use part of the patch 6509.
> > >
> > > I use the filter as a custom filter; it lives in a separate jar file
> > > and goes on HBase's classpath. I did not patch HBase.
> > > Moreover, I do not use the protobuf descriptions that come with the
> > > filter in the patch. I have only two classes - FuzzyRowFilter itself
> > > and its test class.
> > >
> > > And it works perfectly on a small dataset like 100 rows (1 region).
> > > But when my dataset is more than 10 million rows (260 regions), it is
> > > somehow losing rows. I'm not sure, but it seems to me it is not the
> > > fault of the filter.
> > >
> > >
> > > On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> To my knowledge CDH-4.1.2 is based on HBase 0.92.x
> > >>
> > >> Looks like you were using the patch from HBASE-6509, which was
> > >> integrated into trunk only.
> > >> Please confirm.
> > >>
> > >> Copying Alex who wrote the patch.
> > >>
> > >> Cheers
> > >>
> > >> On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
> > >> <emoro...@griddynamics.com> wrote:
> > >>
> > >>> Hi, folks!
> > >>>
> > >>> HBase, Hadoop, etc version is CDH-4.1.2
> > >>>
> > >>> I'm using a custom FuzzyRowFilter, which I got from
> > >>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
> > >>> and suddenly, after quite some time, we found that it started losing
> > >>> data.
> > >>>
> > >>> Basically the idea of FuzzyRowFilter is that it tries to find the key
> > >>> that has been provided, and if there is no such key - but more exist
> > >>> in the table - it returns SEEK_NEXT_USING_HINT. And in
> > >>> getNextKeyHint(...) it builds the required key. As I understand it,
> > >>> HBase will then fast-forward to the required key - it must be similar
> > >>> or identical to a Scan with setStartRow.
> > >>>
> > >>> I'm trying to find key F7dt8QWPSIDw, it is definitely in HBase - I'm
> > >>> able to get it using Scan.setStartRow.
> > >>> For FuzzyFilter I'm using an empty Scan - I didn't specify start row,
> > >>> stop row or anything related.
> > >>> That's what is happening:
> > >>>
> > >>> Fzzy: AAAA1Q7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: AQAAnA96rxTg
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: AgAADQWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: AwAA-Q33Zb9Q
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BAAAOg8oyu7A
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BQAA9gqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BgABZQ7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BwAAbgrpAojg
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CAAAUQWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CQABVgqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CgAAOQ7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CwAALwqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DAAAMwWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DQAADgjqzsIQ
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DgAAOgCcWv9g
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DwAAKg7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EAAAugqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EQAAJAqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EgAABgIOMBgg
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EwAAEwqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FAAACQqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FQAAIAqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FgAAeAWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FwAAAw33Zb9Q
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: F7dt8QWPSIDw
> > >>>
> > >>> It's obvious that my FuzzyRowFilter knows what to search for, and
> > >>> every time it repeats its question.
> > >>> The very first key, I suppose, is just the first key of the region
> > >>> where my key is located.
> > >>> The very last key is the key that is already bigger than what I'm
> > >>> trying to find - that's the reason why FuzzyFilter stopped there.
> > >>>
> > >>> Do you know of any issue with SEEK_NEXT_USING_HINT? I've searched,
> > >>> but unsuccessfully.
> > >>> Do you have any idea how to explain these many trials?
> > >>>
> > >>> Thanks in advance.
> > >>> --
> > >>> Evgeny Morozov
> > >>> Developer Grid Dynamics
> > >>> Skype: morozov.evgeny
> > >>> www.griddynamics.com
> > >>> emoro...@griddynamics.com

--
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emoro...@griddynamics.com
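
For anyone finding this thread later: the '?' versus zero point above can be illustrated with a small stand-alone sketch. It does not use HBase or the real FuzzyRowFilter. The class FuzzyHintDemo, the helpers buildHint and seek, the 4-byte keys and the sample rows are all made up for illustration, and the mask convention (0 = fixed byte, 1 = unknown byte) is an assumption of this sketch. The only thing it relies on is that row keys sort as unsigned bytes, so a seek lands on the first row that is greater than or equal to the hint.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class FuzzyHintDemo {

    // Row keys sort lexicographically as unsigned bytes (needs Java 9+).
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    // Hypothetical helper: copy the pattern and write `filler` into every
    // position the mask marks as unknown (mask[i] == 1; 0 means fixed).
    public static byte[] buildHint(byte[] pattern, byte[] mask, byte filler) {
        byte[] hint = pattern.clone();
        for (int i = 0; i < hint.length; i++) {
            if (mask[i] == 1) {
                hint[i] = filler;
            }
        }
        return hint;
    }

    // Stand-in for a seek: the first stored row that is >= the hint, or null.
    public static byte[] seek(TreeSet<byte[]> rows, byte[] hint) {
        return rows.ceiling(hint);
    }

    // Made-up table: two rows match the pattern {0x17, 0x2A, any, any}.
    public static TreeSet<byte[]> sampleRows() {
        TreeSet<byte[]> rows = new TreeSet<>(ROW_ORDER);
        rows.add(new byte[] {0x01, 0x00, 0x00, 0x00});
        rows.add(new byte[] {0x17, 0x2A, 0x10, 0x05}); // unknown bytes below '?' (0x3F)
        rows.add(new byte[] {0x17, 0x2A, 0x50, 0x01}); // unknown bytes above '?'
        rows.add(new byte[] {0x20, 0x00, 0x00, 0x00});
        return rows;
    }

    public static void main(String[] args) {
        byte[] pattern = {0x17, 0x2A, 0, 0};
        byte[] mask = {0, 0, 1, 1};
        TreeSet<byte[]> rows = sampleRows();

        // Same pattern, two different fillers for the unknown positions.
        byte[] viaQuestionMark = seek(rows, buildHint(pattern, mask, (byte) '?'));
        byte[] viaZero = seek(rows, buildHint(pattern, mask, (byte) 0));

        System.out.println("'?' filler lands on:  " + Arrays.toString(viaQuestionMark));
        System.out.println("zero filler lands on: " + Arrays.toString(viaZero));
    }
}
```

With the '?' filler the seek jumps past the matching row whose unknown bytes are below 0x3F and lands on the later match. With the zero filler it lands on the first match. For letter-only text keys the '?' filler happens to be safe, because '?' sorts before 'A'; with raw bytes it is not.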