Hello, I have some familiarity with machine learning (in an academic setting) but am looking for some assistance on which Mahout algorithms would suit my needs.
I am doing consumer behavior research at a web-marketing startup, where we generate a decent amount of data. We track behavioral data (engagement stats, view times, feedback) and also have demographic data. We also have an inventory of items/sites, and some rudimentary (manual) categorizations. We were just approved for a data warehouse to integrate our data, and I have approval to begin working on a consumer targeting platform.

The core idea is to match consumers with items, testing different approaches for different classes of consumers and items. I expect to be looking at item-similarity, consumer-similarity, and hybrid models, and eventually to incorporate global trends. Initially, I think we can start with a recommender engine, then develop a clustering/classifier component. But I would like more insight into what kinds of questions each is best at answering and how they fit together.

So far, my understanding of the difference is that recommenders accept input of user, item, positive/negative score, and timestamp, then output a recommendation (with variation depending on the specific algorithm). This leaves out demographic data (age, gender, zip, or even socioeconomic status). I gather that clustering algorithms can incorporate this kind of data (and more) in order to find natural groupings.

Is the natural connection point to find similar users and items using clustering, then feed that into a recommender? How does this feeding work? Or, if the above is at all right-headed, what are some options for making the connection?

I appreciate in advance any answers, ideas, insights, or even questions any of you may have.

Thanks,
Josh
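
P.S. To make the question concrete, here is a rough sketch of the kind of connection I have in mind. It is plain Python rather than Mahout code, and everything in it is an assumption made up for illustration: the user and item names, the two demographic features, and the helper functions `distance`, `neighbours`, and `recommend`. It picks each user's nearest demographic neighbours (a simple stand-in for a real clustering step) and recommends items those neighbours scored that the user has not seen.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical preference records: (user, item, score, timestamp).
prefs = [
    ("u1", "siteA", 1.0, 1),
    ("u1", "siteB", 1.0, 2),
    ("u2", "siteA", 1.0, 3),
    ("u2", "siteC", 1.0, 4),
    ("u3", "siteD", 1.0, 5),
]

# Hypothetical demographic features per user: (age scaled to 0..1, gender flag).
demographics = {
    "u1": (0.25, 0.0),
    "u2": (0.30, 0.0),
    "u3": (0.80, 1.0),
}


def distance(a, b):
    """Euclidean distance between two demographic feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbours(user, k=1):
    """Return the k users closest to `user` in demographic space."""
    others = [u for u in demographics if u != user]
    others.sort(key=lambda u: distance(demographics[user], demographics[u]))
    return others[:k]


def recommend(user, k=1):
    """Rank unseen items by the summed scores of the user's neighbours."""
    seen = {item for u, item, _, _ in prefs if u == user}
    group = set(neighbours(user, k))
    scores = defaultdict(float)
    for u, item, score, _ in prefs:
        if u in group and item not in seen:
            scores[item] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))


print(recommend("u1"))
print(recommend("u3"))
```

Here the demographic data only decides whose preferences count for a given user; the timestamp is carried along but unused. Whether this is a sensible way to combine the two, or whether the grouping should come from a proper clustering job instead, is exactly what I am unsure about.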
