Thanks for the great info.
On Thu, Nov 14, 2013 at 9:40 AM, James Taylor <[email protected]> wrote:

We ingest logs using Pig to write Phoenix-compliant HFiles, load those into HBase, and then use Phoenix (https://github.com/forcedotcom/phoenix) to query directly over the HBase data through SQL.

Regards,
James

On Thu, Nov 14, 2013 at 9:35 AM, sam wu <[email protected]> wrote:

We ingest data from logs (one file/table per event, per date) into HBase offline on a daily basis, so we can derive the no_day info.

My thoughts for churn analysis are based on two types of user:
- Green users (young, maybe < 7 days in the system): predict churn based on the first 7(?) days of activity, ideally while the user is still logging into the system; if the churn probability is high, reward sweets to keep them around longer.
- Senior users: predict churn based on a weekly(?) summary.

One way to accomplish this is to have one detailed daily table and a summary (weekly?) table. New daily data gets ingested into the daily table; once a week, older daily data is summarized and moved into the weekly table.

On Thu, Nov 14, 2013 at 9:15 AM, Pradeep Gollakota <[email protected]> wrote:

I'm a little curious how you would be able to use no_of_days as a column qualifier at all... it changes every day for all users, right? So how will you keep your table updated?

On Thu, Nov 14, 2013 at 9:07 AM, Jean-Marc Spaggiari <[email protected]> wrote:

You can probably use your no_day as a column qualifier.

Column families are best suited to grouping column qualifiers that share the same access (read/write) pattern. So if all your column qualifiers have the same pattern, simply put them in the same family.

JM

2013/11/14 sam wu <[email protected]>

Thanks for the advice. What about a key of userId + no_day (days since the user registered), a column family per event type, and the detailed transactions as qualifiers?

On Thu, Nov 14, 2013 at 8:51 AM, Jean-Marc Spaggiari <[email protected]> wrote:

Hi Sam,

So are you saying that you will have about 30 column families? If so, I don't think it's a good idea.

JM
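To make the schema discussion concrete, here is a minimal sketch of the single-family layout JM suggests, combined with the userId + no_day composite key Sam proposes. It uses the classic HBase client API; the table name ("events"), family ("e"), qualifier, and value encoding are all hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventWriter {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical table "events" with a single column family "e".
        HTable table = new HTable(conf, "events");

        String userId = "user42";  // hypothetical user
        int noDay = 3;             // days since registration
        // Composite row key: zero-padding no_day keeps a user's rows
        // adjacent and sorted in day order.
        Put put = new Put(Bytes.toBytes(String.format("%s#%03d", userId, noDay)));
        // Single family, event type as qualifier, event detail as value.
        put.add(Bytes.toBytes("e"), Bytes.toBytes("login"),
                Bytes.toBytes("startTime=1384444800,endTime=1384448400"));
        table.put(put);
        table.close();
      }
    }

With one family, a prefix scan on "user42#" pulls back that user's full history in day order, and day 31 is just another row rather than a schema change.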
2013/11/13 Sam Wu <[email protected]>

Hi all,

I am thinking about using Random Forest to do churn analysis, with HBase as the NoSQL data store. Currently, all of the user history (basically many types of event data) resides in S3 and Redshift (we have one table per date, per event). Events include startTime, endTime, and other pertinent information.

We are thinking about converting all the event tables into one fat table (plus some helper parameter tables) with one row per user in HBase. Each row would have the user id as key, column families d1, d2, ..., d30 (days in the system), and the different event types as qualifiers. Since we are initially more interested in new-user retention, 30 days might be a good starting point.

We can label a record as churned when there is no activity for 10 consecutive days.

If the data schema looks good, we would ingest the data from S3 into HBase and then run Random Forest to classify new profile data.

Is this type of data a good candidate for HBase? Opinions are highly appreciated.

BR,
Sam
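The 10-consecutive-inactive-days rule above is straightforward to turn into a labeling helper when building the Random Forest training set. A small sketch; the boolean-per-day encoding is an assumption, not something from the thread:

    public class ChurnLabeler {
      // activeByDay[i] is true when the user generated at least one event
      // on day i (days since signup). Returns true on any run of
      // churnWindow consecutive inactive days.
      static boolean isChurned(boolean[] activeByDay, int churnWindow) {
        int idleDays = 0;
        for (boolean active : activeByDay) {
          idleDays = active ? 0 : idleDays + 1;
          if (idleDays >= churnWindow) {
            return true;
          }
        }
        return false;
      }

      public static void main(String[] args) {
        boolean[] days = new boolean[30];   // hypothetical 30-day window
        days[0] = days[1] = days[5] = true; // active on days 0, 1 and 5 only
        System.out.println(isChurned(days, 10)); // true: idle from day 6 on
      }
    }

Over the 30-day window this gives each green user a churned/retained label, which can then be paired with the first-7-days activity features Sam mentions.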

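Finally, a rough sketch of what querying the ingested data through Phoenix, as James describes at the top of the thread, might look like. Phoenix is plain JDBC; the jdbc:phoenix:<zookeeper quorum> URL and the forcedotcom-era driver class are standard, while the "events" table and its columns are hypothetical and assume a matching Phoenix CREATE TABLE:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class PhoenixQuery {
      public static void main(String[] args) throws Exception {
        // Driver class for forcedotcom-era Phoenix; later releases moved
        // to the org.apache.phoenix package.
        Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver");
        // "localhost" here is the ZooKeeper quorum of the HBase cluster.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        // Hypothetical schema: one row per user per day since signup,
        // counting events in the first 7 days for the "green user" window.
        ResultSet rs = conn.createStatement().executeQuery(
            "SELECT user_id, COUNT(*) AS event_count "
            + "FROM events WHERE no_day <= 7 "
            + "GROUP BY user_id");
        while (rs.next()) {
          System.out.println(rs.getString("user_id")
              + " -> " + rs.getLong("event_count"));
        }
        conn.close();
      }
    }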