Re: row filter - binary comparator at certain range

James Taylor Mon, 21 Oct 2013 15:08:39 -0700

We don't truncate the hash, we mod it. Why would you expect that data
wouldn't be evenly distributed? We've not seen this to be the case.
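
To make "mod, not truncate" concrete, here is a minimal sketch of a salt byte computed as a stable hash of the row key, taken modulo a declared bucket count. The hash (CRC32), the function names, and the bucket count of 8 are assumptions chosen for illustration; this is not Phoenix's actual hash function or API.

```python
import zlib
from collections import Counter


def salt_byte(row_key: bytes, num_buckets: int) -> int:
    """Stable hash of the row key, reduced mod num_buckets (hypothetical helper)."""
    if not 1 <= num_buckets <= 256:
        raise ValueError("a single salt byte allows between 1 and 256 buckets")
    # CRC32 is only a stand-in for "some stable hash"; it is an assumption.
    return zlib.crc32(row_key) % num_buckets


def salted_key(row_key: bytes, num_buckets: int) -> bytes:
    """Prepend the one-byte salt to the original row key."""
    return bytes([salt_byte(row_key, num_buckets)]) + row_key


def bucket_counts(row_keys, num_buckets: int) -> Counter:
    """Count how many keys land in each salt bucket."""
    return Counter(salt_byte(k, num_buckets) for k in row_keys)


if __name__ == "__main__":
    keys = [str(ts).encode() for ts in range(10_000)]
    for bucket, count in sorted(bucket_counts(keys, 8).items()):
        print(bucket, count)
```

Running the script prints the number of keys per bucket, which is one way to check the distribution for a given key set rather than assume it.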




On Mon, Oct 21, 2013 at 1:48 PM, Michael Segel <[email protected]> wrote:

> What do you call hashing the row key?
> Or hashing the row key and then appending the row key to the hash?
> Or hashing the row key, truncating the hash value to some subset and then
> appending the row key to the value?
>
> The problem is that there is specific meaning to the term salt. Re-using
> it here will cause confusion because you're implying something you don't
> mean to imply.
>
> you could say prepend a truncated hash of the key, however… is prepend a
> real word? ;-) (I am sorry, I am not a grammar nazi, nor an English major. )
>
> So even outside of Phoenix, the concept is the same.
> Even with a truncated hash, you will find that over time, all but the tail
> N regions will only be half full.
> This could be both good and bad.
>
> (Where N is your number 8 or 16 allowable hash values.)
>
> You've solved potentially one problem… but still have other issues that
> you need to address.
> I guess the simple answer is to double the max region size and not care that
> most of your regions will be half of that max (which is the size you really
> want), while 8-16 regions will be up to twice as big.
>
>
>
> On Oct 21, 2013, at 3:26 PM, James Taylor <[email protected]> wrote:
>
> > What do you think it should be called, because
> > "prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
> > to it. :-)
> >
> > Agree that getting the row key design right is crucial.
> >
> > The range of "prepending-row-key-with-single-hashed-byte" is declarative
> > when you create your table in Phoenix, so you typically declare an upper
> > bound based on your cluster size (not 255, but maybe 8 or 16). We've run
> > the numbers and it's typically faster, but as with most things, not always.
> >
> > HTH,
> > James
> >
> >
> > On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[email protected]> wrote:
> >
> >> Then it's not a SALT. And please don't use the term 'salt' because it has
> >> a specific meaning outside of what you want it to mean.  Just like saying
> >> HBase has ACID because you write the entire row as an atomic element. But
> >> I digress….
> >>
> >> Ok so to your point…
> >>
> >> 1 byte == 256 possible values.
> >>
> >> So which will be faster:
> >>
> >> creating a list of the 1 byte truncated hash of each possible timestamp in
> >> your range, or doing 256 separate range scans with the start and stop range
> >> key set?
> >>
> >> That will give you the results you want, however… I'd go back and have
> >> them possibly rethink the row key if they can … assuming this is the base
> >> access pattern.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >>
> >>
> >>
> >>
> >> On Oct 21, 2013, at 11:37 AM, James Taylor <[email protected]> wrote:
> >>
> >>> Phoenix restricts salting to a single byte.
> >>> Salting perhaps is misnamed, as the salt byte is a stable hash based on the
> >>> row key.
> >>> Phoenix's skip scan supports sub-key ranges.
> >>> We've found salting in general to be faster (though there are cases where
> >>> it's not), as it ensures better parallelization.
> >>>
> >>> Regards,
> >>> James
> >>>
> >>>
> >>>
> >>> On Mon, Oct 21, 2013 at 9:14 AM, Vladimir Rodionov
> >>> <[email protected]> wrote:
> >>>
> >>>> FuzzyRowFilter does not work on sub-key ranges.
> >>>> Salting is bad for any scan operation, unfortunately. When salt prefix
> >>>> cardinality is small (1-2 bytes), one can try something similar to
> >>>> FuzzyRowFilter but with additional sub-key range support.
> >>>> If salt prefix cardinality is high (> 2 bytes) - do a full scan with your
> >>>> own Filter (for timestamp ranges).
> >>>>
> >>>> Best regards,
> >>>> Vladimir Rodionov
> >>>> Principal Platform Engineer
> >>>> Carrier IQ, www.carrieriq.com
> >>>> e-mail: [email protected]
> >>>>
> >>>> ________________________________________
> >>>> From: Premal Shah [[email protected]]
> >>>> Sent: Sunday, October 20, 2013 10:42 PM
> >>>> To: user
> >>>> Subject: Re: row filter - binary comparator at certain range
> >>>>
> >>>> Have you looked at FuzzyRowFilter? Seems to me that it might satisfy your
> >>>> use-case.
> >>>>
> >>>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
> >>>>
> >>>>
> >>>> On Sun, Oct 20, 2013 at 9:31 PM, Tony Duan <[email protected]> wrote:
> >>>>
> >>>>> Alex Vasilenko <aa.vasilenko@...> writes:
> >>>>>
> >>>>>>
> >>>>>> Lars,
> >>>>>>
> >>>>>> But how will it behave when I have a salt at the beginning of the key to
> >>>>>> properly shard the table across regions? Imagine a row key of format
> >>>>>> salt:timestamp, with rows like this:
> >>>>>> ...
> >>>>>> 1:15
> >>>>>> 1:16
> >>>>>> 1:17
> >>>>>> 1:23
> >>>>>> 2:3
> >>>>>> 2:5
> >>>>>> 2:12
> >>>>>> 2:15
> >>>>>> 2:19
> >>>>>> 2:25
> >>>>>> ...
> >>>>>>
> >>>>>> And I want to find all rows whose second part (timestamp) is in the range
> >>>>>> 15-25. What startKey and endKey should be used?
> >>>>>>
> >>>>>> Alexandr Vasilenko
> >>>>>> Web Developer
> >>>>>> Skype:menterr
> >>>>>> mob: +38097-611-45-99
> >>>>>>
> >>>>>> 2012/2/9 lars hofhansl <lhofhansl@...>
> >>>>> Hi Alexandr Vasilenko,
> >>>>> Have you ever resolved this issue? I am also facing this issue
> >>>>> and want to implement this functionality.
> >>>>> Imagine a row key of format
> >>>>> salt:timestamp, with rows like this:
> >>>>> ...
> >>>>> 1:15
> >>>>> 1:16
> >>>>> 1:17
> >>>>> 1:23
> >>>>> 2:3
> >>>>> 2:5
> >>>>> 2:12
> >>>>> 2:15
> >>>>> 2:19
> >>>>> 2:25
> >>>>> ...
> >>>>>
> >>>>> And I want to find all rows whose second part (timestamp) is in the range
> >>>>> 15-25.
> >>>>>
> >>>>> Could you please tell me how you resolved this?
> >>>>> Thanks in advance.
> >>>>>
> >>>>>
> >>>>> Tony Duan
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Premal Shah.
> >>>>
> >>
> >>
>
>
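
The "one range scan per salt value" approach discussed in the thread can be sketched as follows for the original salt:timestamp question. This is a plain-Python simulation over a sorted list of keys, not HBase or Phoenix client code. The key layout (one salt byte followed by an 8-byte big-endian timestamp), the CRC32 hash, the bucket count of 8, and all function names are assumptions made for illustration.

```python
import bisect
import struct
import zlib

NUM_BUCKETS = 8  # assumed bucket count, e.g. sized to the cluster


def encode_key(ts: int, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build salt byte + fixed-width big-endian timestamp (assumed layout)."""
    body = struct.pack(">Q", ts)
    salt = zlib.crc32(body) % num_buckets  # stand-in for a stable hash
    return bytes([salt]) + body


def scan_ranges(start_ts: int, end_ts: int, num_buckets: int = NUM_BUCKETS):
    """One (start_key, stop_key) pair per bucket; stop_key is exclusive.

    Assumes end_ts + 1 still fits in an unsigned 64-bit integer.
    """
    lo = struct.pack(">Q", start_ts)
    hi = struct.pack(">Q", end_ts + 1)
    return [(bytes([s]) + lo, bytes([s]) + hi) for s in range(num_buckets)]


def range_query(sorted_keys, start_ts: int, end_ts: int,
                num_buckets: int = NUM_BUCKETS):
    """Simulate num_buckets range scans over a sorted key list and merge them."""
    hits = []
    for start_key, stop_key in scan_ranges(start_ts, end_ts, num_buckets):
        i = bisect.bisect_left(sorted_keys, start_key)
        j = bisect.bisect_left(sorted_keys, stop_key)
        hits.extend(struct.unpack(">Q", k[1:])[0] for k in sorted_keys[i:j])
    return sorted(hits)


if __name__ == "__main__":
    table = sorted(encode_key(t) for t in [3, 5, 12, 15, 16, 17, 19, 23, 25, 30])
    print(range_query(table, 15, 25))
```

Each bucket needs its own startKey and endKey because the salt byte sorts before the timestamp. With 8 or 16 buckets this means 8 or 16 short scans that could run in parallel, rather than one scan per possible byte value.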
