Hi Yash,

What exactly do you mean by “user-based” recommender? What does your data look like? What are the columns in the CSV? For collaborative filtering you will need a user-ID and an item-ID for each preference the user has expressed.
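As a rough illustration of that data preparation step, here is a minimal Python sketch. It is not part of Mahout. It assumes a CSV with a header row and hypothetical column names `user`, `item` and `rating`. Extra text columns are simply ignored. The older in-memory and Hadoop MapReduce recommenders want IDs that are ordinal non-negative ints, so the sketch assigns 0, 1, 2, ... in the order each user-ID or item-ID is first seen. It then writes `user-ID,item-ID,preference` lines.

```python
import csv
import io


class OrdinalIds:
    """Assigns 0, 1, 2, ... to application IDs in first-seen order."""

    def __init__(self):
        self.to_int = {}   # application ID string -> ordinal int
        self.to_app = []   # ordinal int -> application ID string (reverse map)

    def get(self, app_id):
        if app_id not in self.to_int:
            self.to_int[app_id] = len(self.to_app)
            self.to_app.append(app_id)
        return self.to_int[app_id]


def to_mahout_triples(rows, user_col, item_col, pref_col=None):
    """Turn dict rows into (user_int, item_int[, pref]) tuples plus both ID maps."""
    users, items = OrdinalIds(), OrdinalIds()
    triples = []
    for row in rows:
        u = users.get(row[user_col])
        i = items.get(row[item_col])
        if pref_col is None:
            triples.append((u, i))  # preference strength is optional
        else:
            triples.append((u, i, float(row[pref_col])))
    return triples, users, items


# Hypothetical input: text IDs plus an extra free-text column that is ignored.
SAMPLE = """user,item,rating,comment
alice,book-42,5,loved it
bob,book-7,3,ok
alice,book-7,4,good
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
triples, users, items = to_mahout_triples(rows, "user", "item", "rating")

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(triples)
print(out.getvalue(), end="")
# 0,0,5.0
# 1,1,3.0
# 0,1,4.0
```

Keep the reverse maps (`users.to_app`, `items.to_app`). You need them to translate the integer item-IDs a recommender returns back into your own item names.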
Mahout has several recommenders, so building one should be easy. Is it OK to use an existing one? For all the recommenders you need a CSV of:

user-ID,item-ID,preference-strength (optional)

For the older in-memory or Hadoop MapReduce recommenders, the IDs must be ordinal non-negative ints. They correspond to the row and column numbers of the input matrix that will be created from all input elements. The first time you see a user-ID, give it a Mahout ID of 0. The next unique user-ID gets 1, and so on. Do the same for item-IDs.

The newest technique is to use Mahout v1 built from source with Spark and the spark-itemsimilarity job. That job takes your application-specific ID strings and uses them directly. Since it takes CSVs as input, you may be able to use your existing input file(s). The job creates a text file that can be indexed with a search engine, which then produces recommendations via queries. The query is the user's history (a list of item-IDs), and you get back an ordered list of item-IDs to recommend.

Docs here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

On Nov 26, 2014, at 11:16 AM, Yash Patel <yashpatel1...@gmail.com> wrote:

> Dear Mahout Team,
>
> I am a student new to machine learning and I am trying to build a user-based recommender using Mahout. My input dataset is a CSV file, but many of its fields are text, and I understand Mahout needs numeric values. Can you give me a head start on where I should begin and what kind of tools I need to parse the text columns? An idea of which classifiers or clustering methods I should use would also be highly appreciated.
>
> Best regards,
> Yash Patel