Some references:

A small free book that covers the general idea:
https://www.mapr.com/practical-machine-learning

A presentation that covers mixing actions or other indicators:
http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/

Two blog posts:
http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/

The Mahout docs:
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

Build Mahout from this source: https://github.com/apache/mahout
It will run stand-alone on a dev machine; if your data is too big for a
single machine, you can run it on a Spark + Hadoop cluster. The data it
creates can be put into a DB or indexed directly by a search engine (Solr or
Elasticsearch). Choose the search engine you want, then send it queries made
of a user's item ID history. The results will be an ordered list of item IDs
to recommend.
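
To make the query step concrete, here is a rough Python sketch that imitates
what the search engine does with a user's history. It is only an illustration:
the item IDs and the "indicators" data are made up, and the scoring is a plain
overlap count, whereas Solr or Elasticsearch would apply their own relevance
ranking.

```python
# Hypothetical indicator data: for each item, the item IDs that were found to
# co-occur with it. In practice this is what gets indexed in the search engine.
indicators = {
    "ipad": ["iphone", "nexus"],
    "iphone": ["ipad", "galaxy"],
    "nexus": ["ipad", "galaxy"],
    "galaxy": ["iphone", "nexus"],
    "surface": ["nexus"],
}


def recommend(history, indicators, n=3):
    """Rank items by how many of the user's history items appear in each
    item's indicator list, skipping items the user already has."""
    seen = set(history)
    scores = {}
    for item, similar in indicators.items():
        if item in seen:
            continue
        overlap = len(seen.intersection(similar))
        if overlap:
            scores[item] = overlap
    # Highest score first; ties broken by item ID so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:n]


if __name__ == "__main__":
    print(recommend(["iphone", "nexus"], indicators))
```

The user's history plays the role of the query, and the ordered list that
comes back is the set of recommendations.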

The core piece is the command line job "mahout spark-itemsimilarity", which
can parse CSV data. Its options specify which columns are used for IDs.

Start out simple by looking only at user and item IDs. Then you can add other 
cross-cooccurrence indicators for multiple actions later pretty easily.
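
For intuition about what cooccurrence and cross-cooccurrence mean, here is a
small pure-Python sketch. It is not how Mahout implements it: the function
names and sample data are made up, and it keeps raw counts, whereas
spark-itemsimilarity scores pairs with the log-likelihood ratio (LLR) before
keeping them.

```python
from collections import defaultdict
from itertools import combinations


def group_by_user(pairs):
    """Turn (user, thing) pairs into {user: set of things}."""
    grouped = defaultdict(set)
    for user, thing in pairs:
        grouped[user].add(thing)
    return grouped


def cooccurrence(pairs):
    """For each ordered pair of distinct items, count the users who have both."""
    counts = defaultdict(int)
    for items in group_by_user(pairs).values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


def cross_cooccurrence(primary, secondary):
    """Count the users who took the primary action on item a and the
    secondary action on thing b (for example, a category preference)."""
    secondary_by_user = group_by_user(secondary)
    counts = defaultdict(int)
    for user, items in group_by_user(primary).items():
        for a in items:
            for b in secondary_by_user.get(user, ()):
                counts[(a, b)] += 1
    return dict(counts)


# Made-up sample data: (user ID, item ID) and (user ID, category) pairs.
purchases = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"),
             ("u3", "b"), ("u3", "c")]
category_prefs = [("u1", "tools"), ("u2", "tools"), ("u3", "toys")]

if __name__ == "__main__":
    print(cooccurrence(purchases))
    print(cross_cooccurrence(purchases, category_prefs))
```

The first function needs only user and item IDs, which is the simple starting
point. The second shows how a second action, such as a category preference,
can be tied back to the items you want to recommend.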


On Nov 28, 2014, at 12:14 AM, Yash Patel <yashpatel1...@gmail.com> wrote:

The Mahout + search engine recommender seems like the best fit for the
data I have.

Kindly get back to me at your earliest convenience.



Best Regards,
Yash Patel

On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> Mahout has several recommenders so no need to create one from components.
> They all make use of the similarity of preferences between users—that’s why
> they are in the category of collaborative filtering.
> 
> Primary Mahout Recommenders:
> 1) Hadoop MapReduce item-based cooccurrence recommender. Creates all recs
> for all users. Uses "Mahout IDs".
> 2) ALS-WR on Hadoop MapReduce. Uses matrix factorization to reduce noise in
> the data. Sometimes better for small data sets than #1. Uses "Mahout IDs".
> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> with multiple actions (multi-modal), works for new users that have some
> history, and has a scalable server (from the search engine), but is more
> difficult to integrate than #1 or #2. Uses your own IDs and reads CSV files.
> 
> The rest of the data seems to apply either to the user or the item and so
> would be used in different ways. #1 and #2 can only use user ID and item ID,
> but some post-recommendation weighting or filtering can be applied. #3 can
> use multiple attributes in different ways. For instance, if category is an
> item attribute you can create two actions, user-pref-for-an-item and
> user-pref-for-a-category. Assuming you want to recommend an item (not a
> category), you can create a cross-cooccurrence indicator for the second
> action and use the data to make your item recs better. #3 is the only
> method that supports this.
> 
> Pick a recommender and we can help more with data prep.
> 
> 
> On Nov 26, 2014, at 1:34 PM, Yash Patel <yashpatel1...@gmail.com> wrote:
> 
> Hello everyone,
> 
> Wow, I am quite happy to see so much input from people.
> 
> I apologize for not providing more details.
> 
> Although this is not my complete dataset, the fields I have chosen to use
> are:
> 
> customer id - numeric
> item id - text
> postal code - text
> item category - text
> potential growth - text
> territory - text
> 
> 
> Basically I was thinking of finding similar users and recommending them
> items that users like them have bought but they haven't.
> 
> I would very much like to hear your opinions, though, as I am not so
> familiar with clustering, classifiers, etc.
> 
> I found that Mahout takes sequence files converted into vectors, but I
> couldn't understand how I would do that with my data specifically and, more
> importantly, make a recommender system out of it.
> 
> Also, I am wondering how to factor in the importance of a specific customer
> through the potential growth attribute.
> 
> 
> 
> 
> 
> 
> Best Regards,
> Yash Patel
> 
> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> 
>> All very good points, but note that spark-itemsimilarity may take the input
>> directly since you specify column numbers for <UID><ITEMID><PREF_VALUE>.
>> 
>> On Nov 26, 2014, at 11:43 AM, parnab kumar <parnab.2...@gmail.com> wrote:
>> 
>> Kindly elaborate on your requirements, your dataset fields, and what you
>> want to recommend to a user. Usually a set of items is recommended to a
>> user. In your case, what are your items?
>> 
>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is not
>> in this format, which would let you use the algorithms in Mahout directly.
>> 
>> A little more info from your side will help us give you the right
>> pointers.
>> 
>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <yashpatel1...@gmail.com>
>> wrote:
>> 
>>> Dear Mahout Team,
>>> 
>>> I am a student new to machine learning and I am trying to build a
>>> user-based recommender using Mahout.
>>> 
>>> My dataset is a CSV file, but many of its fields are text and I
>>> understand Mahout needs numeric values.
>>> 
>>> Can you give me a head start on where I should begin and what kind of
>>> tools I need to parse the text columns?
>>> 
>>> Also, an idea of which classifiers or clustering methods I should use
>>> would be highly appreciated.
>>> 
>>> 
>>> Best Regards,
>>> Yash Patel
>>> 
>> 
>> 
> 
> 
