I'm a little curious how you would be able to use no_of_days as a
column qualifier at all. It changes every day for all users, right? So how
will you keep your table updated?
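
To make the question concrete, here is a minimal sketch of one possible reading, which is an assumption and not something confirmed in this thread: the day number is computed per event, as the offset of the event date from the registration date, and not as "days elapsed until today". The function name `day_qualifier` and the `d<N>` format are hypothetical and only mirror the d1..d30 naming used in the original message.

```python
from datetime import date


def day_qualifier(registered_on: date, event_on: date) -> str:
    """Return a qualifier such as 'd3' for an event on the user's third day.

    Assumption: the registration day itself counts as d1.
    """
    offset = (event_on - registered_on).days + 1
    if offset < 1:
        raise ValueError("event predates registration")
    return f"d{offset}"


# Hypothetical usage: a user registered on 2013-11-01, event on 2013-11-03.
qualifier = day_qualifier(date(2013, 11, 1), date(2013, 11, 3))
```

Under that reading the qualifier depends only on two fixed dates, so an already written cell would not need rewriting as time passes. If no_of_days instead means days elapsed up to the current date, the update problem raised above still stands.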


On Thu, Nov 14, 2013 at 9:07 AM, Jean-Marc Spaggiari <
[email protected]> wrote:

> You can probably use your no_day as a column qualifier.
>
> Column families are best suited to grouping column qualifiers that share
> the same access (read/write) pattern. So if all your column qualifiers have
> the same pattern, simply put them in the same family.
>
> JM
>
>
> 2013/11/14 sam wu <[email protected]>
>
> > Thanks for the advice.
> > What about using userId + no_day (days since the user registered) as the
> > key, with one column family per typeEvent and the detailed trxs as
> > qualifiers?
> >
> >
> > On Thu, Nov 14, 2013 at 8:51 AM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Hi Sam,
> > >
> > > So are you saying that you will have about 30 column families? If so, I
> > > don't think it's a good idea.
> > >
> > > JM
> > >
> > >
> > > 2013/11/13 Sam Wu <[email protected]>
> > >
> > > > Hi all,
> > > >
> > > > I am thinking about using Random Forest to do churn analysis with HBase
> > > > as the NoSQL data store.
> > > > Currently, all of our user history (basically many types of event
> > > > data) resides in S3 & Redshift (we have one table per date per event).
> > > > Events include startTime, endTime, and other pertinent information.
> > > >
> > > > We are thinking about converting all the event tables into one fat
> > > > table (with other helper parameter tables) in HBase, with one row per
> > > > user.
> > > >
> > > > Each row will have the user id as key, with some column families and
> > > > qualifiers, e.g. column families d1, d2, ..., d30 (days in the system),
> > > > with the different types of event as qualifiers. Since initially we are
> > > > more interested in new user retention, 30 days might be good to start
> > > > with.
> > > >
> > > > We can label a record as churned when it shows no activity for 10
> > > > consecutive days.
> > > >
> > > > If the data schema looks good, we will ingest the data from S3 into
> > > > HBase, then use Random Forest to classify new profile data.
> > > >
> > > > Is this type of data a good candidate for HBase?
> > > > Opinions are highly appreciated.
> > > >
> > > >
> > > > BR
> > > >
> > > > Sam
> > >
> >
>
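
The churn rule from the original message (no activity for 10 consecutive days within the first 30 days) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `is_churned`, its parameters, and the representation of activity as a set of 1-based day numbers are all hypothetical, and the sketch does not touch HBase itself.

```python
def is_churned(active_days, horizon=30, gap=10):
    """Return True if the user has `gap` consecutive inactive days.

    active_days: set of 1-based day numbers (d1..d30) with at least one event.
    horizon:     number of days considered after registration (assumed 30).
    gap:         length of the inactive streak that counts as churn (assumed 10).
    """
    inactive_run = 0
    for day in range(1, horizon + 1):
        if day in active_days:
            inactive_run = 0  # activity resets the streak
        else:
            inactive_run += 1
            if inactive_run >= gap:
                return True
    return False


# Hypothetical usage: active only on the first three days, then silent.
label = is_churned({1, 2, 3})
```

A label computed this way could serve as the target column when training the Random Forest, with the per-day event counts as features.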
