Anoop, Ramkrishna

Thank you for the explanation! I've got it.


On Mon, Jan 21, 2013 at 12:59 PM, Anoop Sam John <anoo...@huawei.com> wrote:

> > I suppose that if the scanning process had started on all regions at
> > once, I would find at least one value per region in the log files,
> > but I have found one value per region only for those regions that reside
> > before the particular one.
>
> @Eugeny - FuzzyFilter, like any other filter, works on the server side.
> From the client side, scanning is sequential, starting from the first
> region (the region with an empty start key, or the region that contains
> the start key specified in your scan). The client sends a request to the
> RS to scan a region. Once that region is finished, the client contacts
> the next region, and so on. There is no parallel scanning of multiple
> regions from the client side. [This applies when using the HTable scan
> APIs.]
>
> When MR is used for scanning, we do parallel scans of all the regions,
> with a mapper per region. But a normal scan from the client side goes
> through the regions sequentially, not in parallel.
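
The sequential behaviour described above is consistent with the log pattern in the original report: the filter is asked about one row per region, answers with the same hint each time, and the scan only moves on when the current region has nothing at or after the hint. Below is a minimal sketch of that in plain Java. It uses no HBase classes: regions are modelled as sorted sets of strings, and the class name, method name and row keys are all hypothetical. It is a simplified model under those assumptions, not the actual client or region server code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class SequentialRegionScanDemo {

    // Visits regions one after another, as a plain client-side scan does.
    // Inside a region, a row smaller than `target` triggers a seek to `target`
    // (standing in for SEEK_NEXT_USING_HINT). The seek never leaves the region.
    // Returns the rows the filter was asked about, in order.
    static List<String> scan(List<TreeSet<String>> regions, String target) {
        List<String> seenByFilter = new ArrayList<>();
        for (TreeSet<String> region : regions) {
            String row = region.isEmpty() ? null : region.first();
            while (row != null) {
                seenByFilter.add(row);
                if (row.compareTo(target) >= 0) {
                    // Either the target itself or a row already past it: stop.
                    return seenByFilter;
                }
                // Seek by hint: first row in this region that is >= target,
                // or null if this region holds no such row.
                row = region.ceiling(target);
            }
            // Region exhausted; the client opens a scanner on the next region.
        }
        return seenByFilter;
    }

    public static void main(String[] args) {
        List<TreeSet<String>> regions = new ArrayList<>();
        regions.add(new TreeSet<>(Arrays.asList("a1", "a2", "a3")));
        regions.add(new TreeSet<>(Arrays.asList("b1", "b2")));
        regions.add(new TreeSet<>(Arrays.asList("c1", "c5", "c9")));

        // One filter call per earlier region, then the seek lands in the last one.
        System.out.println(scan(regions, "c5")); // prints [a1, b1, c1, c5]
    }
}
```

In this model the filter sees only the first row of each region that precedes the target, which is why the log shows one value per region up to the region holding the key and nothing for the regions after it.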
>
> -Anoop-
> ________________________________________
> From: Eugeny Morozov [emoro...@griddynamics.com]
> Sent: Monday, January 21, 2013 1:46 PM
> To: user@hbase.apache.org
> Cc: Alex Baranau
> Subject: Re: Custom Filter and SEEK_NEXT_USING_HINT issue
>
> Finally, the mystery has been solved.
>
> Small remark before I explain everything.
>
> The situation with only one region is absolutely the same:
> Fzzy: AAAA1Q7iQ9JA
> Next fzzy: F7dtxwqVQ_Pw  <-- the value I'm trying to find.
> Fzzy: F7dt8QWPSIDw
> Somehow FuzzyRowFilter has just omitted my value here.
>
>
> So, the explanation.
> In the javadoc for FuzzyRowFilter, a question mark is used as a
> placeholder for an unknown value. Of course, it's possible to use
> anything, including zero, instead of the question mark.
> For quite some time we used literals to encode our keys. Literals like
> the ones you've already seen: AAAA1Q7iQ9JA or F7dt8QWPSIDw. But that's the
> Base64 form of just 8 bytes, which requires 1.5 times more space. So we
> decided to store the raw version - just byte[8]. Unfortunately, the
> symbol '?' is exactly in the middle of the byte range (according to the
> ASCII table, http://www.asciitable.com/), which means that with
> FuzzyRowFilter we skip half of the values in some cases. At the same
> time, the question mark comes right before any letter that could be used
> in a key.
>
> Despite the fact that we have integration tests, it's just a coincidence
> that we didn't have such an example in them.
>
> So, as advice: always use zero instead of the question mark for
> FuzzyRowFilter.
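
To illustrate why the fill byte matters once keys are raw bytes: HBase orders row keys as unsigned bytes, and '?' is the byte 0x3F, so a key template filled with '?' sorts after any real key whose byte at that position is below 0x3F. The sketch below is plain Java with hypothetical names and made-up key bytes. It assumes, as a simplification, that the template bytes at the unknown positions end up in the position being sought to; it does not reproduce FuzzyRowFilter's actual hint computation.

```java
import java.util.Arrays;

final class FuzzyFillDemo {

    // Unsigned lexicographic comparison, the order HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Builds a key template: the fixed prefix followed by `unknown`
    // positions, each set to the placeholder byte `fill`.
    static byte[] template(byte[] fixedPrefix, int unknown, byte fill) {
        byte[] t = Arrays.copyOf(fixedPrefix, fixedPrefix.length + unknown);
        Arrays.fill(t, fixedPrefix.length, t.length, fill);
        return t;
    }

    // True if `row` sorts before `seekTarget`, i.e. a seek to `seekTarget`
    // would land after `row` and never return it.
    static boolean passedOver(byte[] row, byte[] seekTarget) {
        return compareUnsigned(row, seekTarget) < 0;
    }

    public static void main(String[] args) {
        // Hypothetical key: 3 fixed bytes followed by 2 unknown bytes.
        byte[] prefix = {0x17, (byte) 0xB7, 0x6D};
        byte[] row = {0x17, (byte) 0xB7, 0x6D, 0x10, 0x20};

        System.out.println("'?' as a byte: " + (byte) '?');
        System.out.println("passed over with '?' fill: "
                + passedOver(row, template(prefix, 2, (byte) '?')));
        System.out.println("passed over with 0 fill:   "
                + passedOver(row, template(prefix, 2, (byte) 0)));
    }
}
```

With a zero fill the template is the smallest key sharing the fixed prefix, so no row with that prefix can sort before it. With Base64 text keys the problem stayed hidden because every letter sorts after '?'.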
>
> Thanks to everyone!
>
> P.S. But the question about region scanning order still stands. I do not
> understand why with FuzzyFilter it goes from one region to another until
> it stops at the value. I suppose that if the scanning process had started
> on all regions at once, I would find at least one value per region in the
> log files, but I have found one value per region only for those regions
> that reside before the particular one.
>
>
> On Mon, Jan 21, 2013 at 4:22 AM, Michael Segel <michael_se...@hotmail.com> wrote:
>
> > If it's the same class and it's not a patch, then the first class loaded
> > wins.
> >
> > So if you have a class Foo and HBase has a class Foo, your code will
> > never see the light of day.
> >
> > Perhaps I'm stating the obvious, but it's something to think about when
> > working with Hadoop.
> >
> > On Jan 19, 2013, at 3:36 AM, Eugeny Morozov <emoro...@griddynamics.com>
> > wrote:
> >
> > > Ted,
> > >
> > > that is correct.
> > > HBase 0.92.x, and we use part of the patch 6509.
> > >
> > > I use the filter as a custom filter; it lives in a separate jar file
> > > and goes on HBase's classpath. I did not patch HBase.
> > > Moreover, I do not use the protobuf descriptions that come with the
> > > filter in the patch. I have only two classes: FuzzyRowFilter itself and
> > > its test class.
> > >
> > > And it works perfectly on a small dataset like 100 rows (1 region). But
> > > when my dataset is more than 10 million rows (260 regions), it is
> > > somehow losing rows. I'm not sure, but it seems to me it is not the
> > > fault of the filter.
> > >
> > >
> > > On Sat, Jan 19, 2013 at 3:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> To my knowledge CDH-4.1.2 is based on HBase 0.92.x
> > >>
> > >> Looks like you were using the patch from HBASE-6509, which was
> > >> integrated into trunk only.
> > >> Please confirm.
> > >>
> > >> Copying Alex who wrote the patch.
> > >>
> > >> Cheers
> > >>
> > >> On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov
> > >> <emoro...@griddynamics.com> wrote:
> > >>
> > >>> Hi, folks!
> > >>>
> > >>> The HBase, Hadoop, etc. version is CDH-4.1.2.
> > >>>
> > >>> I'm using a custom FuzzyRowFilter, which I got from
> > >>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
> > >>> and suddenly, after quite some time, we found that it had started
> > >>> losing data.
> > >>>
> > >>> Basically, the idea of FuzzyRowFilter is that it tries to find the key
> > >>> that has been provided, and if there is no such key - but more exist
> > >>> in the table - it returns SEEK_NEXT_USING_HINT. And in
> > >>> getNextKeyHint(...) it builds the required key. As I understand it,
> > >>> HBase will then fast-forward to the required key - it must be similar
> > >>> or identical to running a Scan with setStartRow.
> > >>>
> > >>> I'm trying to find the key F7dt8QWPSIDw; it is definitely in HBase -
> > >>> I'm able to get it using Scan.setStartRow.
> > >>> For FuzzyFilter I'm using an empty Scan - I didn't specify a start row,
> > >>> stop row or anything related.
> > >>> That's what is happening:
> > >>>
> > >>> Fzzy: AAAA1Q7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: AQAAnA96rxTg
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: AgAADQWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: AwAA-Q33Zb9Q
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BAAAOg8oyu7A
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BQAA9gqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BgABZQ7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: BwAAbgrpAojg
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CAAAUQWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CQABVgqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CgAAOQ7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: CwAALwqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DAAAMwWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DQAADgjqzsIQ
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DgAAOgCcWv9g
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: DwAAKg7iQ9JA
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EAAAugqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EQAAJAqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EgAABgIOMBgg
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: EwAAEwqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FAAACQqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FQAAIAqVQrTw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FgAAeAWPSIDw
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: FwAAAw33Zb9Q
> > >>> Next fzzy: F7dtxwqVQ_Pw
> > >>> Fzzy: F7dt8QWPSIDw
> > >>>
> > >>> It's obvious that my FuzzyRowFilter knows what to search for, and
> > >>> every time it repeats its question.
> > >>> The very first key, I suppose, is just the first key of the region
> > >>> where my key is located.
> > >>> The very last key is the key that is already bigger than what I'm
> > >>> trying to find - that's the reason why FuzzyFilter stopped there.
> > >>>
> > >>> Do you know of any issue with SEEK_NEXT_USING_HINT? I've searched, but
> > >>> unsuccessfully.
> > >>> Do you have any idea how to explain these many attempts?
> > >>>
> > >>> Thanks in advance.
> > >>> --
> > >>> Evgeny Morozov
> > >>> Developer Grid Dynamics
> > >>> Skype: morozov.evgeny
> > >>> www.griddynamics.com
> > >>> emoro...@griddynamics.com
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Evgeny Morozov
> > > Developer Grid Dynamics
> > > Skype: morozov.evgeny
> > > www.griddynamics.com
> > > emoro...@griddynamics.com
> >
> >
>
>
> --
> Evgeny Morozov
> Developer Grid Dynamics
> Skype: morozov.evgeny
> www.griddynamics.com
> emoro...@griddynamics.com
>



-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emoro...@griddynamics.com
