You can probably use your no_day as a column qualifier. Column families are best suited to grouping column qualifiers that share the same access (read/write) pattern. So if all your column qualifiers have the same pattern, simply put them in the same family.
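To make that concrete, here is a rough sketch of the layout. It is only an illustration: the family name `e`, the qualifier format `d<no_day>:<typeEvent>`, and the helper names are assumptions of mine, and a plain dict stands in for the HBase table rather than a real client.

```python
from collections import defaultdict

# Assumed name: a single family holding every event column,
# since they all share the same read/write pattern.
FAMILY = b"e"

def qualifier(no_day: int, event_type: str) -> bytes:
    """Build a qualifier such as b'd03:purchase'.

    Zero-padding the day keeps qualifiers in day order when sorted as bytes.
    """
    return f"d{no_day:02d}:{event_type}".encode("utf-8")

def put(table, user_id: str, no_day: int, event_type: str, value: bytes) -> None:
    """Mimic a put: row key -> {(family, qualifier): value}."""
    row_key = user_id.encode("utf-8")
    table[row_key][(FAMILY, qualifier(no_day, event_type))] = value

# Hypothetical sample data: one row per user, one column per day and event type.
table = defaultdict(dict)
put(table, "user42", 1, "login", b"2")
put(table, "user42", 3, "purchase", b"1")
put(table, "user42", 12, "login", b"5")

row = table[b"user42"]
columns = sorted(q for (_, q) in row)
```

With this shape the row key stays the user id, and adding day 31 or a new event type only adds qualifiers, not families.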
JM

2013/11/14 sam wu <[email protected]>

> Thanks for the advice.
> What about key is userId + no_day (since user registered), and column family
> is each typeEvent, and qualifier is the detailed trxs.
>
>
> On Thu, Nov 14, 2013 at 8:51 AM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
> > Hi Sam,
> >
> > So are you saying that you will have about 30 column families? If so I
> > don't think it's a good idea.
> >
> > JM
> >
> >
> > 2013/11/13 Sam Wu <[email protected]>
> >
> > > Hi all,
> > >
> > > I am thinking about using Random Forest to do churn analysis with HBase
> > > as NoSQL data store.
> > > Currently, we have all the user history (basically many types of event
> > > data) residing in S3 & Redshift (we have one table per date/per event).
> > > Events include startTime, endTime, and other pertinent information.
> > >
> > > We are thinking about converting all the event tables into one fat
> > > table (with other helper parameter tables) with one row per user using
> > > HBase.
> > >
> > > Each row will have user id as key, with some column-family/qualifier,
> > > e.g.: col-family, d1,d2,……d30 (days in the system), and qualifier as
> > > different types of event. Since initially we are more interested in new
> > > user retention, 30 days might be good to start with.
> > >
> > > We can label a record as churning away by no active activity in
> > > continuous 10 days.
> > >
> > > If the data schema looks good, ingest data from S3 into HBase. Then do
> > > Random Forest to classify new profile data.
> > >
> > > Is this type of data a good candidate for HBase?
> > > Opinion is highly appreciated.
> > >
> > >
> > > BR
> > >
> > > Sam
> > >
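
On the labeling rule quoted above (churn = no activity in 10 continuous days), a minimal sketch of how it could be computed from one user's row. I am assuming the rule is applied inside the first 30 days and that you already have the set of day numbers with at least one event; the function name and parameters are hypothetical.

```python
def is_churned(active_days, window=30, gap=10):
    """Return True if some run of `gap` consecutive days in 1..window has no activity.

    active_days: set of day numbers (1-based, since registration) with any event.
    """
    inactive_run = 0
    for day in range(1, window + 1):
        if day in active_days:
            inactive_run = 0          # activity resets the inactive streak
        else:
            inactive_run += 1
            if inactive_run >= gap:   # reached `gap` inactive days in a row
                return True
    return False

# Hypothetical users: one active only in the first days, one active every 5 days.
early_only = is_churned({1, 2, 3})
regular = is_churned(set(range(1, 31, 5)))
```

The resulting boolean would be the label column fed to the Random Forest, alongside whatever per-day features you derive from the qualifiers.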
