Thanks Vladimir. If I need the events in chronological order, do I always have to retrieve all of the "<userid>*" rows, or are there alternative ways to design the row key?
Thanks again!

On Thu, Aug 27, 2015 at 12:10 PM, Vladimir Rodionov <[email protected]> wrote:

> <userid>_<reverse_timestamp> is better (Long.MAX_VALUE - time) - most
> recent events will come first during scan. This will allow you to do
> efficient time range queries by user_id and start and end time.
>
> -Vlad
>
> On Thu, Aug 27, 2015 at 11:58 AM, Buntu Dev <[email protected]> wrote:
>
> > I'm planning on writing a time series of user action events including user
> > profile, attributes and product purchase transactions to answer these
> > questions/queries:
> >
> > - What are the events leading up to the users conversion ie, purchase?
> > - What the different attributes that changed over a given time period?
> > - What is the LTV of a given user?
> > - Retrieve list of attributes set/enabled for given user at some point in
> >   time.
> >
> > As a newbie to HBase, I wanted to confirm that tall table design ie, with
> > row key <userid>_<timestamp> is _not_ the right design due to these
> > reasons:
> >
> > * scanning for the latest state of user seems to be an expensive operation
> >   since not all the columns will be available in the latest event for the
> >   user
> >
> > * constructing a row key always requires timestamp to the appended if I'm
> >   not using the regex filtering
> >
> > * fetching the user at some point in time t1 involves fetching all the
> >   "<userid>*" rows and looking up the row with timestamp <= t1
> >
> > Are these valid concerns?
> >
> > Thanks!
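
To make the <userid>_<reverse_timestamp> suggestion above concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: a TreeMap ordered by unsigned byte comparison stands in for HBase's sorted row keys. The class name, the helper names, the "_" separator encoding and the sample events are all assumptions made for illustration, and it assumes non-negative timestamps and user IDs that do not themselves contain "_".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseTimestampKeys {

    // Row key: <userid> '_' <8-byte big-endian (Long.MAX_VALUE - timestamp)>.
    // Assumes timestamp >= 0 so the reversed value is non-negative and its
    // big-endian bytes sort in numeric order.
    static byte[] rowKey(String userId, long timestamp) {
        byte[] prefix = (userId + "_").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestamp)
                .array();
    }

    // Hypothetical stand-in for a table: keys sorted as unsigned bytes.
    static TreeMap<byte[], String> newTable() {
        return new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));
    }

    // Events for one user with start <= timestamp <= end, newest first.
    // The range begins at the key for 'end' because newer events sort earlier.
    static List<String> scanRange(TreeMap<byte[], String> table,
                                  String userId, long start, long end) {
        byte[] startRow = rowKey(userId, end);
        byte[] stopRow = rowKey(userId, start);
        return new ArrayList<>(table.subMap(startRow, true, stopRow, true).values());
    }

    // Same range, oldest first: reverse the newest-first result.
    static List<String> scanRangeChronological(TreeMap<byte[], String> table,
                                               String userId, long start, long end) {
        List<String> events = scanRange(table, userId, start, end);
        Collections.reverse(events);
        return events;
    }

    // Latest event with timestamp <= t1: the first key at or after rowKey(userId, t1),
    // provided it still belongs to the same user. Returns null if there is none.
    static String latestAtOrBefore(TreeMap<byte[], String> table, String userId, long t1) {
        Map.Entry<byte[], String> entry = table.ceilingEntry(rowKey(userId, t1));
        if (entry == null) {
            return null;
        }
        byte[] prefix = (userId + "_").getBytes(StandardCharsets.UTF_8);
        byte[] key = entry.getKey();
        boolean sameUser = key.length == prefix.length + Long.BYTES
                && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
        return sameUser ? entry.getValue() : null;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = newTable();
        table.put(rowKey("alice", 1000L), "signup");
        table.put(rowKey("alice", 2000L), "view");
        table.put(rowKey("alice", 3000L), "purchase");
        table.put(rowKey("bob", 1500L), "signup-bob");

        System.out.println(scanRange(table, "alice", 1000L, 3000L));
        System.out.println(scanRangeChronological(table, "alice", 1000L, 2000L));
        System.out.println(latestAtOrBefore(table, "alice", 2500L));
    }
}
```

With this layout, a lookup at some point in time t1 starts at a single computed key instead of reading every "<userid>*" row, and a chronological view of a time range is the newest-first range read in reverse. Against a real table the same start and stop keys would feed a scan; note that a scan's stop row is normally exclusive, so the inclusive upper bound used here would need adjusting.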
