Re: Help in designing row key

Flavio Pompermaier Thu, 04 Jul 2013 02:49:01 -0700

Yes, I saw it. I followed Ted's advice to use
scan.setTimeRange(sometimestamp, Long.MAX_VALUE)


On Wed, Jul 3, 2013 at 11:23 PM, Asaf Mesika <[email protected]> wrote:

> Seems right. You can make it more efficient by allocating your result array
> in advance and then filling it.
> Regarding time filtering: have you seen that in Scan you can set a start time
> and an end time?
>
> On Wednesday, July 3, 2013, Flavio Pompermaier wrote:
>
> > All my enums produce positive integers, so I don't have the problem of
> > positive vs. negative integer ordering.
> > Obviously, if I use fixed-length row keys I can drop the separator.
> >
> > Sorry, but I'm a complete newbie in this field. I'm trying to understand
> > how to compose my key from the bytes.
> > Is the following correct?
> >
> > final byte[] firstToken = Bytes.toBytes(source);
> > final byte[] secondToken = Bytes.toBytes(type);
> > final byte[] thirdToken = Bytes.toBytes(qualifier);
> > final byte[] fourthToken = Bytes.toBytes(md5ofSomeString);
> > byte[] rowKey = Bytes.add(firstToken,secondToken,thirdToken);
> > rowKey =  Bytes.add(rowKey,fourthToken);
> >
> > Best,
> > Flavio
> >
> >
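A minimal sketch of the same fixed-width composition using only the JDK, so the byte layout is visible. The class and method names are hypothetical. It assumes the three enum values are ints and that the hash is the raw 16-byte MD5 digest; if md5ofSomeString is a hex String, Bytes.toBytes would yield 32 bytes instead. ByteBuffer is big-endian by default, which is the same byte order Bytes.toBytes(int) produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class RowKeySketch {
    // Hypothetical layout: three 4-byte ints followed by a 16-byte MD5 digest.
    static final int KEY_LENGTH = 3 * Integer.BYTES + 16;

    // Writes every field at a fixed offset, so no separator is needed.
    static byte[] buildKey(int source, int type, int qualifier, byte[] md5) {
        if (md5.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte MD5 digest");
        }
        return ByteBuffer.allocate(KEY_LENGTH)
                .putInt(source)
                .putInt(type)
                .putInt(qualifier)
                .put(md5)
                .array();
    }

    public static void main(String[] args) throws Exception {
        byte[] md5 = MessageDigest.getInstance("MD5")
                .digest("oven".getBytes(StandardCharsets.UTF_8));
        byte[] key = buildKey(1, 2, 3, md5);
        System.out.println(key.length); // 28 = 4 + 4 + 4 + 16
    }
}
```

Allocating the whole key once also avoids the intermediate arrays created by chained Bytes.add calls.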
> > On Wed, Jul 3, 2013 at 11:58 AM, Anoop John <[email protected]> wrote:
> >
> > > When you build the row key and convert the int parts into byte[] (use
> > > org.apache.hadoop.hbase.util.Bytes#toBytes(int)), you get 4 bytes for
> > > every int. Be careful about the ordering: when you convert positive and
> > > negative integers into byte[] and compare them lexicographically (as
> > > HBase does), the negative numbers sort after the positive ones. If you
> > > don't have to deal with negative numbers, no issues :)
> > >
> > > And when all the parts of the row key are of fixed width, do you need
> > > any separator at all?
> > >
> > > -Anoop-
> > >
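The ordering caveat can be checked with a small JDK-only sketch. The names are hypothetical, and the sign-bit flip in encodeOrdered is a common workaround that nobody in this thread proposed; it is shown only as one possible option if negative values ever appear.

```java
import java.nio.ByteBuffer;

public class IntOrderSketch {
    // Big-endian 4-byte encoding, the same layout Bytes.toBytes(int) produces.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Assumption, not from the thread: flipping the sign bit makes unsigned
    // byte order agree with signed numeric order.
    static byte[] encodeOrdered(int value) {
        return encode(value ^ Integer.MIN_VALUE);
    }

    // Unsigned lexicographic comparison, byte by byte.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // -1 encodes as ff ff ff ff, so it sorts after 1 (00 00 00 01).
        System.out.println(compareUnsigned(encode(-1), encode(1)) > 0); // true
        // With the sign bit flipped, -1 sorts before 1 again.
        System.out.println(compareUnsigned(encodeOrdered(-1), encodeOrdered(1)) < 0); // true
    }
}
```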
> > > On Wed, Jul 3, 2013 at 2:44 PM, Flavio Pompermaier <[email protected]> wrote:
> > >
> > > > Yeah, I was thinking of using a normalization step to allow the use
> > > > of FuzzyRowFilter, but it is not clear to me whether integers must
> > > > also be normalized.
> > > > Let me explain myself better. Suppose that I follow your advice and
> > > > produce keys like:
> > > >  - 1|1|somehash|sometimestamp
> > > >  - 55|555|somehash|sometimestamp
> > > >
> > > > Would they match the same pattern, or do I have to normalize them to
> > > > the following?
> > > >  - 001|001|somehash|sometimestamp
> > > >  - 055|555|somehash|sometimestamp
> > > >
> > > > Moreover, I noticed that you used dots ('.') to separate the fields
> > > > instead of pipes ('|'). Is there a reason for that (performance,
> > > > maybe), or is it just your favourite separator?
> > > >
> > > > Best,
> > > > Flavio
> > > >
> > > >
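A hedged sketch of the string normalization asked about above. FuzzyRowFilter compares bytes at fixed positions, so each field has to start at the same offset in every key, which unpadded ids such as 1 and 55 do not guarantee. The helper names and the 3-digit width are assumptions taken from the 001/055 example, not a recommendation.

```java
import java.util.Locale;

public class PaddedKeySketch {
    // Left-pads a non-negative id with zeros to a fixed number of digits.
    static String pad(int id, int width) {
        String s = String.format(Locale.ROOT, "%0" + width + "d", id);
        if (id < 0 || s.length() != width) {
            throw new IllegalArgumentException(id + " does not fit in " + width + " digits");
        }
        return s;
    }

    // Builds source|type|hash|timestamp with both ids padded to 3 digits.
    static String key(int source, int type, String hash, long timestamp) {
        return pad(source, 3) + "|" + pad(type, 3) + "|" + hash + "|" + timestamp;
    }

    public static void main(String[] args) {
        String a = key(1, 1, "somehash", 1372837702753L);
        String b = key(55, 555, "somehash", 1372837702753L);
        System.out.println(a); // 001|001|somehash|1372837702753
        System.out.println(b); // 055|555|somehash|1372837702753
        // Padding puts the hash at the same offset in both keys.
        System.out.println(a.indexOf("somehash") == b.indexOf("somehash")); // true
        // Unpadded decimal strings also sort out of numeric order.
        System.out.println("10".compareTo("9") < 0); // true
    }
}
```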
> > > > On Wed, Jul 3, 2013 at 10:12 AM, Mike Axiak <[email protected]> wrote:
> > > >
> > > > > I'm not sure if you're eliding this fact or not, but you'd be much
> > > > > better off if you used a fixed-width format for your keys. So in
> > > > > your example, you'd have:
> > > > >
> > > > > PATTERN: source(4-byte-int).type(4-byte-int or smaller).fixed 128-bit hash.8-byte timestamp
> > > > >
> > > > > Example: \x00\x00\x00\x01\x00\x00\x02\x03....
> > > > >
> > > > > The advantage of this is not only that it's significantly less data
> > > > > (remember your key is stored on each KeyValue), but also you can now
> > > > > use FuzzyRowFilter and other techniques to quickly perform scans. The
> > > > > disadvantage is that you have to normalize the source -> integer, but
> > > > > I find I can either store that in an enum or cache it for a long
> > > > > time, so it's not a big issue.
> > > > >
> > > > > -Mike
> > > > >
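To make the fixed-width layout and the fuzzy-matching idea concrete, here is a JDK-only sketch. It is not the HBase implementation: matches() only illustrates the positional convention that FuzzyRowFilter documents, as I understand it (mask byte 0 means the position must match, 1 means any byte is accepted). All names are hypothetical. The layout assumes 4-byte source, 4-byte type, 16-byte hash and 8-byte timestamp, for 32 bytes in total.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FuzzyMatchSketch {
    // mask[i] == 0: key[i] must equal pattern[i]; mask[i] == 1: any byte is accepted.
    static boolean matches(byte[] key, byte[] pattern, byte[] mask) {
        if (key.length < pattern.length || pattern.length != mask.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && key[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // 4-byte source + 4-byte type + 16-byte hash + 8-byte timestamp = 32 bytes.
    static byte[] key(int source, int type, byte[] hash16, long timestamp) {
        if (hash16.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte hash");
        }
        return ByteBuffer.allocate(32)
                .putInt(source).putInt(type).put(hash16).putLong(timestamp)
                .array();
    }

    public static void main(String[] args) {
        // Pattern "any source, type 0x0203": only bytes 4..7 are fixed.
        byte[] pattern = key(0, 0x0203, new byte[16], 0L);
        byte[] mask = new byte[32];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, 4, 8, (byte) 0);

        System.out.println(matches(key(1, 0x0203, new byte[16], 42L), pattern, mask)); // true
        System.out.println(matches(key(1, 7, new byte[16], 42L), pattern, mask));      // false
    }
}
```

A query like "any source, one type" cannot be expressed as a simple prefix scan, which is where fixed offsets pay off.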
> > > > > On Wed, Jul 3, 2013 at 4:05 AM, Flavio Pompermaier <[email protected]> wrote:
> > > > > > Thank you very much for the great support!
> > > > > > This is how I thought to design my key:
> > > > > >
> > > > > > PATTERN: source|type|qualifier|hash(name)|timestamp
> > > > > > EXAMPLE: google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753
> > > > > >
> > > > > > Do you think my key is good for my purpose (my searches will
> > > > > > essentially be by source or source|type)?
> > > > > > Another point is that initially I will not have many sources, so
> > > > > > I will probably have only google|*, but in later phases there
> > > > > > could be more sources.
> > > > > >
> > > > > > Best,
> > > > > > Flavio
> > > > > >
> > > > > > On Tue, Jul 2, 2013 at 7:53 PM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > >> For #1, yes - the client receives less data after filtering.
> > > > > >>
> > > > > >> For #2, please take a look at TestMultiVersions
> > > > > >> (./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
> > > > > >> for time range:
> > > > >
>
