Hi all, I am thinking about using a Random Forest for churn analysis, with HBase as the NoSQL data store. Currently, all of our user history (basically many types of event data) resides in S3 and Redshift, with one table per date per event type. Each event includes startTime, endTime, and other pertinent information.
We are thinking about converting all the event tables into one fat table (plus some helper parameter tables) in HBase, with one row per user. Each row would use the user id as the row key, with column families d1, d2, …, d30 (days in the system) and the different event types as qualifiers. Since we are initially more interested in new-user retention, 30 days should be a good window to start with. We can label a user as churned if there is no activity for 10 consecutive days. If the schema looks good, we would ingest the data from S3 into HBase and then train a Random Forest to classify new profile data. Is this type of data a good candidate for HBase? Opinions are highly appreciated.

BR,
Sam
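
P.S. To make the row layout and the labeling rule concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration under assumptions: the event type names ("login", "purchase"), the "d<day>:<event>" cell keys, and the helper names are all made up for this example, and the cell values are assumed to be per-day event counts. It does not touch HBase itself. It only shows how one user's cells could be flattened into a fixed-length feature vector and labeled with the 10-consecutive-inactive-days rule.

```python
from typing import Dict, List

DAYS = 30        # observation window from the post (d1..d30)
CHURN_GAP = 10   # consecutive inactive days that count as churn

# Hypothetical event types, for illustration only.
EVENT_TYPES = ["login", "purchase"]


def to_feature_row(cells: Dict[str, int]) -> List[int]:
    """Flatten one user's cells (keyed 'd<day>:<event>') into a
    fixed-length vector; missing cells count as 0 events."""
    return [
        cells.get(f"d{day}:{event}", 0)
        for day in range(1, DAYS + 1)
        for event in EVENT_TYPES
    ]


def daily_activity(cells: Dict[str, int]) -> List[bool]:
    """True for each day on which the user has at least one event."""
    return [
        any(cells.get(f"d{day}:{event}", 0) > 0 for event in EVENT_TYPES)
        for day in range(1, DAYS + 1)
    ]


def is_churned(active_days: List[bool], gap: int = CHURN_GAP) -> bool:
    """Label a user as churned if any run of `gap` consecutive
    inactive days occurs inside the window."""
    run = 0
    for active in active_days:
        run = 0 if active else run + 1
        if run >= gap:
            return True
    return False


# A user who is active only in the first five days, then goes silent.
sparse = {"d1:login": 2, "d3:purchase": 1, "d5:login": 1}
# A user who logs in every fifth day (days 1, 6, 11, ..., 26).
steady = {f"d{d}:login": 1 for d in range(1, DAYS + 1, 5)}

features = to_feature_row(sparse)
label_sparse = is_churned(daily_activity(sparse))
label_steady = is_churned(daily_activity(steady))
```

With this rule, a user who disappears for 10 days and later returns inside the window is still labeled as churned. Whether that is the right definition is one of the things I would like feedback on. The feature vectors and labels produced this way would then be the training input for the Random Forest.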
