Thanks for the great info


On Thu, Nov 14, 2013 at 9:40 AM, James Taylor <[email protected]> wrote:

> We ingest logs using Pig to write Phoenix-compliant HFiles, load those into
> HBase and then use Phoenix (https://github.com/forcedotcom/phoenix) to
> query directly over the HBase data through SQL.
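>
> A rough sketch of what a query looks like through the Phoenix JDBC driver
> (the connection string, table and column names below are only
> illustrative, not a real schema):
>
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class EventCounts {
>     public static void main(String[] args) throws Exception {
>         // Phoenix exposes HBase tables over JDBC; "localhost" stands in
>         // for the ZooKeeper quorum of the cluster, and the Phoenix driver
>         // jar is assumed to be on the classpath.
>         Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
>         Statement stmt = conn.createStatement();
>         // Hypothetical EVENTS table with USER_ID and EVENT_TYPE columns.
>         ResultSet rs = stmt.executeQuery(
>             "SELECT user_id, event_type, COUNT(*) FROM events"
>             + " GROUP BY user_id, event_type");
>         while (rs.next()) {
>             System.out.println(rs.getString(1) + " " + rs.getString(2)
>                 + " " + rs.getLong(3));
>         }
>         conn.close();
>     }
> }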
>
> Regards,
> James
>
>
> On Thu, Nov 14, 2013 at 9:35 AM, sam wu <[email protected]> wrote:
>
> > We ingest data from logs (one file/table per event, per date) into HBase
> > offline on a daily basis, so we can derive the no_day info.
> > My thinking for churn analysis is based on two types of user:
> > green users (young, maybe < 7 days in the system): predict churn based on
> > their first 7(?) days of activity, ideally while the user is still logging
> > into the system, and if the churn probability is high, reward them with
> > sweets to keep them around longer;
> > senior users: predict churn based on a weekly(?) summary.
> >
> > One thought to accomplish this is to have one detailed daily table and a
> > summary (weekly?) table. New daily data gets ingested into the daily
> > table; once a week, older daily data is summarized and moved into the
> > weekly table.
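> >
> > Roughly, the weekly rollup might look like the sketch below (plain HBase
> > client API; the table names, the family "s" and the row key formats are
> > just placeholders):
> >
> > import java.io.IOException;
> > import java.util.HashMap;
> > import java.util.Map;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Put;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class WeeklyRollup {
> >     public static void main(String[] args) throws IOException {
> >         Configuration conf = HBaseConfiguration.create();
> >         HTable daily = new HTable(conf, "daily_events");
> >         HTable weekly = new HTable(conf, "weekly_summary");
> >
> >         // Count cells (events) per user across the daily table. The daily
> >         // row key is assumed to be "userId_noDay".
> >         Map<String, Long> counts = new HashMap<String, Long>();
> >         ResultScanner scanner = daily.getScanner(new Scan());
> >         for (Result r : scanner) {
> >             String userId = Bytes.toString(r.getRow()).split("_")[0];
> >             Long prev = counts.get(userId);
> >             counts.put(userId, (prev == null ? 0L : prev) + r.size());
> >         }
> >         scanner.close();
> >
> >         // One summary row per user per week, e.g. "userId_2013w46".
> >         for (Map.Entry<String, Long> e : counts.entrySet()) {
> >             Put put = new Put(Bytes.toBytes(e.getKey() + "_2013w46"));
> >             put.add(Bytes.toBytes("s"), Bytes.toBytes("event_count"),
> >                     Bytes.toBytes(e.getValue().longValue()));
> >             weekly.put(put);
> >         }
> >         daily.close();
> >         weekly.close();
> >     }
> > }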
> >
> >
> >
> > On Thu, Nov 14, 2013 at 9:15 AM, Pradeep Gollakota <[email protected]> wrote:
> >
> > > I'm a little curious as to how you would be able to use no_of_days as a
> > > column qualifier at all... it changes every day for all users, right? So
> > > how will you keep your table updated?
> > >
> > >
> > > On Thu, Nov 14, 2013 at 9:07 AM, Jean-Marc Spaggiari <
> > > [email protected]> wrote:
> > >
> > > > You can probably use your no_day as a column qualifier.
> > > >
> > > > Column families are best suited to grouping column qualifiers with the
> > > > same access (read/write) pattern. So if all your column qualifiers have
> > > > the same pattern, simply put them in the same family.
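> > > >
> > > > For illustration (the table and family names are placeholders), a table
> > > > with a single family would be created like this:
> > > >
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > import org.apache.hadoop.hbase.HColumnDescriptor;
> > > > import org.apache.hadoop.hbase.HTableDescriptor;
> > > > import org.apache.hadoop.hbase.client.HBaseAdmin;
> > > >
> > > > public class CreateUserEvents {
> > > >     public static void main(String[] args) throws Exception {
> > > >         Configuration conf = HBaseConfiguration.create();
> > > >         HBaseAdmin admin = new HBaseAdmin(conf);
> > > >         // A single family "e" holds all the event qualifiers, since
> > > >         // they share the same read/write pattern.
> > > >         HTableDescriptor desc = new HTableDescriptor("user_events");
> > > >         desc.addFamily(new HColumnDescriptor("e"));
> > > >         admin.createTable(desc);
> > > >         admin.close();
> > > >     }
> > > > }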
> > > >
> > > > JM
> > > >
> > > >
> > > > 2013/11/14 sam wu <[email protected]>
> > > >
> > > > > Thanks for the advice.
> > > > > What about this: the key is userId + no_day (since the user
> > > > > registered), the column family is each event type, and the qualifier
> > > > > is the detailed transactions?
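> > > > >
> > > > > Something like this, purely illustrative (the ids, family and
> > > > > qualifier values are made up):
> > > > >
> > > > > import org.apache.hadoop.conf.Configuration;
> > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > import org.apache.hadoop.hbase.client.HTable;
> > > > > import org.apache.hadoop.hbase.client.Put;
> > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > >
> > > > > public class WriteEvent {
> > > > >     public static void main(String[] args) throws Exception {
> > > > >         Configuration conf = HBaseConfiguration.create();
> > > > >         HTable table = new HTable(conf, "user_events");
> > > > >         // Row key: userId + no_day since registration, e.g. "u12345_007".
> > > > >         Put put = new Put(Bytes.toBytes("u12345_007"));
> > > > >         // Family = event type (e.g. "login"), qualifier = transaction
> > > > >         // id, value = the detailed transaction payload.
> > > > >         put.add(Bytes.toBytes("login"), Bytes.toBytes("trx_0001"),
> > > > >                 Bytes.toBytes("startTime=2013-11-14T09:00Z"));
> > > > >         table.put(put);
> > > > >         table.close();
> > > > >     }
> > > > > }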
> > > > >
> > > > >
> > > > > On Thu, Nov 14, 2013 at 8:51 AM, Jean-Marc Spaggiari <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Hi Sam,
> > > > > >
> > > > > > So are you saying that you will have about 30 column families? If
> > > > > > so, I don't think it's a good idea.
> > > > > >
> > > > > > JM
> > > > > >
> > > > > >
> > > > > > 2013/11/13 Sam Wu <[email protected]>
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I am thinking about using Random Forest to do churn analysis,
> > > > > > > with HBase as the NoSQL data store.
> > > > > > > Currently, we have all the user history (basically many types of
> > > > > > > event data) residing in S3 & Redshift (we have one table per date,
> > > > > > > per event).
> > > > > > > Events include startTime, endTime, and other pertinent
> > > > > > > information.
> > > > > > >
> > > > > > > We are thinking about converting all the event tables into one
> > > > > > > fat table (with other helper parameter tables) with one row per
> > > > > > > user, using HBase.
> > > > > > >
> > > > > > > Each row will have the user id as key, with some column
> > > > > > > families/qualifiers, e.g. column families d1, d2, ... d30 (days in
> > > > > > > the system), and qualifiers for the different types of event.
> > > > > > > Since we are initially more interested in new-user retention, 30
> > > > > > > days might be a good place to start.
> > > > > > >
> > > > > > > We can label a record as churning away if there is no activity
> > > > > > > for 10 consecutive days.
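> > > > > > >
> > > > > > > As a rough sketch (assuming the d1..d30 family layout above; the
> > > > > > > table name and user id are just placeholders), the labeling could
> > > > > > > look like this:
> > > > > > >
> > > > > > > import java.io.IOException;
> > > > > > > import java.util.NavigableMap;
> > > > > > > import org.apache.hadoop.conf.Configuration;
> > > > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > > import org.apache.hadoop.hbase.client.Get;
> > > > > > > import org.apache.hadoop.hbase.client.HTable;
> > > > > > > import org.apache.hadoop.hbase.client.Result;
> > > > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > > > >
> > > > > > > public class ChurnLabel {
> > > > > > >     // A user is labeled as churned if 10 consecutive day families
> > > > > > >     // (d1..d30) hold no event qualifiers at all.
> > > > > > >     static boolean isChurned(HTable table, String userId)
> > > > > > >             throws IOException {
> > > > > > >         Result row = table.get(new Get(Bytes.toBytes(userId)));
> > > > > > >         int gap = 0;
> > > > > > >         for (int day = 1; day <= 30; day++) {
> > > > > > >             NavigableMap<byte[], byte[]> cells =
> > > > > > >                     row.getFamilyMap(Bytes.toBytes("d" + day));
> > > > > > >             gap = (cells == null || cells.isEmpty()) ? gap + 1 : 0;
> > > > > > >             if (gap >= 10) return true;
> > > > > > >         }
> > > > > > >         return false;
> > > > > > >     }
> > > > > > >
> > > > > > >     public static void main(String[] args) throws IOException {
> > > > > > >         Configuration conf = HBaseConfiguration.create();
> > > > > > >         HTable table = new HTable(conf, "user_events");
> > > > > > >         System.out.println(isChurned(table, "u12345"));
> > > > > > >         table.close();
> > > > > > >     }
> > > > > > > }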
> > > > > > >
> > > > > > > If the data schema looks good, we will ingest data from S3 into
> > > > > > > HBase, then use Random Forest to classify new profile data.
> > > > > > >
> > > > > > > Is this type of data a good candidate for HBase?
> > > > > > > Opinions are highly appreciated.
> > > > > > >
> > > > > > >
> > > > > > > BR
> > > > > > >
> > > > > > > Sam
> > > > > >
> > > > >
> > > >
> > >
> >
>
