We don't truncate the hash; we mod it by the number of salt buckets. Why would you expect the data not to be evenly distributed? We haven't seen uneven distribution in practice.
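
To make the truncate-versus-mod distinction concrete, here is a minimal sketch. It is not Phoenix's actual code: CRC32 is only a stand-in for whatever stable hash is really used, and `salt_byte`, `salted_key`, and `num_buckets` are names made up for illustration.

```python
import zlib


def salt_byte(row_key: bytes, num_buckets: int) -> int:
    """Return a stable bucket number in [0, num_buckets) for a row key."""
    if not 1 <= num_buckets <= 256:
        raise ValueError("a single salt byte allows 1..256 buckets")
    # Assumption: CRC32 stands in for the real hash function.
    # The full hash is reduced modulo the bucket count, not truncated.
    return zlib.crc32(row_key) % num_buckets


def salted_key(row_key: bytes, num_buckets: int) -> bytes:
    """Prepend the one-byte salt to the original row key."""
    return bytes([salt_byte(row_key, num_buckets)]) + row_key
```

Because the salt is derived from the row key itself, the same key always lands in the same bucket, so a point lookup can recompute the prefix instead of probing every bucket.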

On Mon, Oct 21, 2013 at 1:48 PM, Michael Segel <[email protected]> wrote:

> What do you call hashing the row key?
> Or hashing the row key and then appending the row key to the hash?
> Or hashing the row key, truncating the hash value to some subset and then
> appending the row key to the value?
>
> The problem is that there is specific meaning to the term salt. Re-using
> it here will cause confusion because you're implying something you don't
> mean to imply.
>
> you could say prepend a truncated hash of the key, however… is prepend a
> real word? ;-) (I am sorry, I am not a grammar nazi, nor an English major. )
>
> So even outside of Phoenix, the concept is the same.
> Even with a truncated hash, you will find that over time, all but the tail
> N regions will only be half full.
> This could be both good and bad.
>
> (Where N is your number 8 or 16 allowable hash values.)
>
> You've solved potentially one problem… but still have other issues that
> you need to address.
> I guess the simple answer is to double the region sizes and not care that
> most of your regions will be 1/2 the max size… but the size you really
> want and 8-16 regions will be up to twice as big.
>
> On Oct 21, 2013, at 3:26 PM, James Taylor <[email protected]> wrote:
>
> > What do you think it should be called, because
> > "prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
> > to it. :-)
> >
> > Agree that getting the row key design right is crucial.
> >
> > The range of "prepending-row-key-with-single-hashed-byte" is declarative
> > when you create your table in Phoenix, so you typically declare an upper
> > bound based on your cluster size (not 255, but maybe 8 or 16). We've run
> > the numbers and it's typically faster, but as with most things, not always.
> >
> > HTH,
> > James
> >
> > On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[email protected]> wrote:
> >
> >> Then its not a SALT.
> >> And please don't use the term 'salt' because it has
> >> specific meaning outside to what you want it to mean. Just like saying
> >> HBase has ACID because you write the entire row as an atomic element. But
> >> I digress….
> >>
> >> Ok so to your point…
> >>
> >> 1 byte == 255 possible values.
> >>
> >> So which will be faster.
> >>
> >> creating a list of the 1 byte truncated hash of each possible timestamp in
> >> your range, or doing 255 separate range scans with the start and stop range
> >> key set?
> >>
> >> That will give you the results you want, however… I'd go back and have
> >> them possibly rethink the row key if they can … assuming this is the base
> >> access pattern.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Oct 21, 2013, at 11:37 AM, James Taylor <[email protected]> wrote:
> >>
> >>> Phoenix restricts salting to a single byte.
> >>> Salting perhaps is misnamed, as the salt byte is a stable hash based on the
> >>> row key.
> >>> Phoenix's skip scan supports sub-key ranges.
> >>> We've found salting in general to be faster (though there are cases where
> >>> it's not), as it ensures better parallelization.
> >>>
> >>> Regards,
> >>> James
> >>>
> >>> On Mon, Oct 21, 2013 at 9:14 AM, Vladimir Rodionov
> >>> <[email protected]> wrote:
> >>>
> >>>> FuzzyRowFilter does not work on sub-key ranges.
> >>>> Salting is bad for any scan operation, unfortunately. When salt prefix
> >>>> cardinality is small (1-2 bytes),
> >>>> one can try something similar to FuzzyRowFilter but with additional
> >>>> sub-key range support.
> >>>> If salt prefix cardinality is high (> 2 bytes) - do a full scan with your
> >>>> own Filter (for timestamp ranges).
> >>>>
> >>>> Best regards,
> >>>> Vladimir Rodionov
> >>>> Principal Platform Engineer
> >>>> Carrier IQ, www.carrieriq.com
> >>>> e-mail: [email protected]
> >>>>
> >>>> ________________________________________
> >>>> From: Premal Shah [[email protected]]
> >>>> Sent: Sunday, October 20, 2013 10:42 PM
> >>>> To: user
> >>>> Subject: Re: row filter - binary comparator at certain range
> >>>>
> >>>> Have you looked at FuzzyRowFilter? Seems to me that it might satisfy your
> >>>> use-case.
> >>>>
> >>>> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
> >>>>
> >>>> On Sun, Oct 20, 2013 at 9:31 PM, Tony Duan <[email protected]> wrote:
> >>>>
> >>>>> Alex Vasilenko <aa.vasilenko@...> writes:
> >>>>>
> >>>>>> Lars,
> >>>>>>
> >>>>>> But how it will behave, when I have salt at the beginning of the key to
> >>>>>> properly shard table across regions? Imagine row key of format
> >>>>>> salt:timestamp and rows goes like this:
> >>>>>> ...
> >>>>>> 1:15
> >>>>>> 1:16
> >>>>>> 1:17
> >>>>>> 1:23
> >>>>>> 2:3
> >>>>>> 2:5
> >>>>>> 2:12
> >>>>>> 2:15
> >>>>>> 2:19
> >>>>>> 2:25
> >>>>>> ...
> >>>>>>
> >>>>>> And I want to find all rows, that has second part (timestamp) in range
> >>>>>> 15-25. What startKey and endKey should be used?
> >>>>>>
> >>>>>> Alexandr Vasilenko
> >>>>>> Web Developer
> >>>>>> Skype:menterr
> >>>>>> mob: +38097-611-45-99
> >>>>>>
> >>>>>> 2012/2/9 lars hofhansl <lhofhansl@...>
> >>>>> Hi,
> >>>>> Alexandr Vasilenko
> >>>>> Have you ever resolved this issue?i am also facing this iusse.
> >>>>> i also want implement this functionality.
> >>>>> Imagine row key of format
> >>>>> salt:timestamp and rows goes like this:
> >>>>> ...
> >>>>> 1:15
> >>>>> 1:16
> >>>>> 1:17
> >>>>> 1:23
> >>>>> 2:3
> >>>>> 2:5
> >>>>> 2:12
> >>>>> 2:15
> >>>>> 2:19
> >>>>> 2:25
> >>>>> ...
> >>>>>
> >>>>> And I want to find all rows, that has second part (timestamp) in range
> >>>>> 15-25.
> >>>>>
> >>>>> Could you please tell me how you resolve this ?
> >>>>> thanks in advance.
> >>>>>
> >>>>> Tony duan
> >>>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Premal Shah.
> >>>>
> >>>> Confidentiality Notice: The information contained in this message,
> >>>> including any attachments hereto, may be confidential and is intended to be
> >>>> read only by the individual or entity to whom this message is addressed. If
> >>>> the reader of this message is not the intended recipient or an agent or
> >>>> designee of the intended recipient, please note that any review, use,
> >>>> disclosure or distribution of this message or its attachments, in any form,
> >>>> is strictly prohibited. If you have received this message in error, please
> >>>> immediately notify the sender and/or [email protected] and
> >>>> delete or destroy any copy of this message and its attachments.
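
For the salt:timestamp question quoted above, one way to answer "what startKey and endKey should be used?" is to issue one range scan per salt value. The sketch below only illustrates the idea and rests on assumptions: a one-byte salt, timestamps stored as fixed-width big-endian integers (so byte order matches numeric order), and an in-memory sorted list standing in for the table. `encode` and `scan_ranges` are hypothetical helpers, not HBase or Phoenix APIs.

```python
import struct
from typing import List, Tuple


def encode(salt: int, ts: int) -> bytes:
    """Row key = 1 salt byte + 8-byte big-endian timestamp (assumed layout)."""
    return bytes([salt]) + struct.pack(">Q", ts)


def scan_ranges(ts_first: int, ts_last: int,
                num_buckets: int) -> List[Tuple[bytes, bytes]]:
    """One (start_row, stop_row) pair per salt bucket.

    ts_first and ts_last are inclusive; stop_row is exclusive.
    """
    return [(encode(salt, ts_first), encode(salt, ts_last + 1))
            for salt in range(num_buckets)]


# The rows from the quoted example, kept sorted the way a table would store them.
rows = sorted(encode(s, t) for s, t in [
    (1, 15), (1, 16), (1, 17), (1, 23),
    (2, 3), (2, 5), (2, 12), (2, 15), (2, 19), (2, 25),
])

# Each bucket's scan is independent of the others, so they could run in parallel.
hits = [row for start, stop in scan_ranges(15, 25, num_buckets=3)
        for row in rows if start <= row < stop]
```

With 8 or 16 buckets this is 8 or 16 short scans rather than one full scan, which is the trade-off debated in the thread.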
