Hi Yash,

What exactly do you mean by a “user-based” recommender? What does your data 
look like, and what are the columns in the CSV? For collaborative filtering you 
will need a user-ID and an item-ID for each preference the user has expressed.

Mahout already has several recommenders, so building one should be easy. Is it 
OK to use an existing one?

For all the recommenders you need a CSV of:
user-ID,item-ID,preference-strength(optional)

For the older in-memory or Hadoop MapReduce recommenders the IDs must be 
ordinal non-negative ints that correspond to row and column numbers for the 
input matrix that will be created from all input elements. The first time you 
see a user-ID, give it a Mahout ID of 0; the next unique user-ID will get 1, 
and so on. The same goes for item-IDs.
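
The ID assignment described above could be sketched roughly like this in 
Python. This is only an illustration, not part of Mahout: the function name 
assign_ids and the sample rows are made up, and it assumes the first two CSV 
columns are the user-ID and item-ID.

```python
import csv
import io

def assign_ids(rows):
    """Map user and item ID strings to contiguous ints in first-seen order.

    Hypothetical helper: assumes column 0 is the user-ID and column 1 is
    the item-ID; any remaining columns (e.g. preference strength) are
    passed through unchanged.
    """
    user_ids, item_ids = {}, {}
    mapped = []
    for row in rows:
        user, item = row[0], row[1]
        # setdefault only stores the next free index the first time a key is seen
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        mapped.append([u, i] + row[2:])
    return mapped, user_ids, item_ids

# Made-up sample data in the user-ID,item-ID,preference-strength layout
sample = "alice,ipad,5\nbob,ipad,3\nalice,nexus,4\n"
mapped, users, items = assign_ids(csv.reader(io.StringIO(sample)))
```

Keep the two dictionaries around: you will need them to translate the 
recommender's int IDs back into your own user and item IDs.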

The newest technique is to use Mahout v1 built from source with Spark and the 
spark-itemsimilarity job, which will take your application-specific ID strings 
and use them directly. Since this job takes CSVs as input you may be able to 
use your existing input file(s). The job creates a text file that can be 
indexed with a search engine to produce recommendations via queries. The query 
is the user's history (a list of item-IDs). You get back an ordered list of 
item-IDs to recommend.
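
To make the query idea concrete, here is a toy sketch in Python that stands in 
for the search engine. It is not Mahout or search-engine code: the function 
names are made up, the line layout (item, a tab, then space-separated 
other-item:strength pairs) is an assumption about the job's output that you 
should check against the docs, and plain overlap counting replaces a real 
engine's relevance scoring.

```python
from collections import Counter

def parse_indicators(lines):
    """Parse lines assumed to look like 'item<TAB>other1:score other2:score'."""
    index = {}
    for line in lines:
        item, _, rest = line.rstrip("\n").partition("\t")
        # Keep only the similar-item IDs; the strengths are ignored in this sketch
        index[item] = {pair.rsplit(":", 1)[0] for pair in rest.split()}
    return index

def recommend(index, history, n=10):
    """Rank items by how many history items appear in their indicator list."""
    seen = set(history)
    scores = Counter()
    for item, indicators in index.items():
        if item in seen:
            continue  # do not recommend what the user already has
        overlap = len(indicators & seen)
        if overlap:
            scores[item] = overlap
    return [item for item, _ in scores.most_common(n)]

# Made-up indicator data, one line per item
lines = [
    "ipad\tnexus:2.0 galaxy:1.0",
    "nexus\tipad:2.0",
    "galaxy\tipad:1.0 nexus:0.5",
    "surface\tgalaxy:1.0",
]
index = parse_indicators(lines)
recs = recommend(index, ["ipad", "nexus"])
```

A real search engine does the same matching at scale and ranks the hits with 
its own scoring, so in practice you would index the job's output there rather 
than hold it in memory like this.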

Docs here: 
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html


On Nov 26, 2014, at 11:16 AM, Yash Patel <yashpatel1...@gmail.com> wrote:

Dear Mahout Team,

I am a student new to machine learning and I am trying to build a
user-based recommender using Mahout.

My dataset is a CSV file as input, but it has many fields as text and I
understand Mahout needs numeric values.

Can you give me a head start as to where I should start and what kind of
tools I need to parse the text columns?

Also, an idea of which classifiers or clustering methods I should use would
be highly appreciated.


Best Regards;
Yash Patel
