Thanks for the tip. In the data warehousing world I used to call them surrogate keys - I wonder if there's any difference between the two.
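For what it's worth, here is a minimal sketch of how I understand the 1-1 association between strings and integers described in the quoted reply. It is plain in-memory Java, not an HBase feature: the class name `SurrogateDictionary` and its methods `idFor` and `textFor` are hypothetical names made up for illustration. A real deployment would presumably need to persist the mapping and assign ids atomically across clients, neither of which is shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory dictionary: one integer id per distinct string. */
final class SurrogateDictionary {
    // Forward mapping: long event description -> short integer surrogate.
    private final Map<String, Integer> idsByText = new ConcurrentHashMap<>();
    // Reverse mapping, so the original description can be recovered.
    private final Map<Integer, String> textById = new ConcurrentHashMap<>();
    // Next unassigned id; starting at 0 is an arbitrary choice.
    private final AtomicInteger nextId = new AtomicInteger(0);

    /** Returns the existing id for the text, or assigns the next free one. */
    int idFor(String text) {
        return idsByText.computeIfAbsent(text, t -> {
            int id = nextId.getAndIncrement();
            textById.put(id, t);
            return id;
        });
    }

    /** Reverse lookup; returns null if the id was never assigned. */
    String textFor(int id) {
        return textById.get(id);
    }
}
```

The idea would be to use the integer from `idFor(description)` as (part of) the rowkey instead of the 0.1 to 2Kb description, and to call `textFor(id)` when the description is needed for display.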

On Tue, Sep 17, 2013 at 6:41 PM, Vladimir Rodionov <[email protected]> wrote:

> > Is there built-in functionality to generate (integer) surrogate values in
> > hbase that can be used in the rowkey, or does it need to be hand-coded
> > from scratch?
>
> There is no such functionality in HBase. What you are asking for is known
> as dictionary compression: a unique 1-1 association between arbitrary
> strings and numeric values.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [email protected]
>
> ________________________________________
> From: Ted Yu [[email protected]]
> Sent: Tuesday, September 17, 2013 9:53 AM
> To: [email protected]
> Subject: Re: hbase schema design
>
> I guess you were referring to section 6.3.2
>
> bq. rowkey is stored and/ or read for every cell value
>
> The above is true.
>
> bq. the event description is a string of 0.1 to 2Kb
>
> You can enable Data Block encoding to reduce storage.
>
> Cheers
>
> On Tue, Sep 17, 2013 at 9:44 AM, Adrian CAPDEFIER <[email protected]> wrote:
>
> > Howdy all,
> >
> > I'm trying to use hbase for the first time (plenty of other experience
> > with RDBMS databases though), and I have a couple of questions after
> > reading The Book.
> >
> > I am a bit confused by the advice to reduce "the row size" in the hbase
> > book. It states that every cell value is accompanied by the coordinates
> > (row, column and timestamp). I'm just trying to be thorough, so am I to
> > understand that the rowkey is stored and/ or read for every cell value
> > in a record, or just once per column family in a record?
> >
> > I am intrigued by the rows as columns design as described in the book at
> > http://hbase.apache.org/book.html#rowkey.design. To make a long story
> > short, I will end up with a table to store event types and the number of
> > occurrences on each day. I would prefer to have the event description as
> > the row key and the dates when it happened as columns - up to 7300 for
> > roughly 20 years.
> >
> > However, the event description is a string of 0.1 to 2Kb and if it is
> > stored for each cell value, I will need to use a surrogate (shorter)
> > value.
> >
> > Is there built-in functionality to generate (integer) surrogate values in
> > hbase that can be used in the rowkey, or does it need to be hand-coded
> > from scratch?
