Hello everyone, I am really happy to see so many responses.
I apologize for not providing more details. Although this is not my complete dataset, the fields I have chosen to use are:

customer id - numeric
item id - text
postal code - text
item category - text
potential growth - text
territory - text

Basically, I was thinking of finding similar users and recommending to them items that users like them have bought but they haven't. That said, I would very much like to hear your opinions, as I am not very familiar with clustering, classifiers, etc.

I found that Mahout takes sequence files converted into vectors, but I couldn't understand how I would do that with my data specifically and, more importantly, how to build a recommender system out of it.

I am also wondering how to factor in the importance of a specific customer through the potential growth attribute.

Best Regards,
Yash Patel

On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> All very good points, but note that spark-itemsimilarity may take the input
> directly, since you specify column numbers for <UID><ITEMID><PREF_VALUE>.
>
> On Nov 26, 2014, at 11:43 AM, parnab kumar <parnab.2...@gmail.com> wrote:
>
> Kindly elaborate on your requirements, your dataset fields, and what
> you want to recommend to a user. Usually a set of items is recommended to
> a user. In your case, what are your items?
>
> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is not
> in this format, which is what would let you use the algorithms in Mahout directly.
>
> A little more info from your side will help us give you the right
> pointers.
>
> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <yashpatel1...@gmail.com>
> wrote:
>
> > Dear Mahout Team,
> >
> > I am a student new to machine learning and I am trying to build a
> > user-based recommender using Mahout.
> >
> > My dataset is a CSV file, but many of its fields are text and I
> > understand Mahout needs numeric values.
> >
> > Can you give me a head start on where I should begin and what kind of
> > tools I need to parse the text columns?
> >
> > Also, an idea of which classifiers or clustering methods I should use would
> > be highly appreciated.
> >
> > Best Regards,
> > Yash Patel
> >
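A minimal Python sketch of the plan described in this thread, under loudly stated assumptions: the column names (`customer_id`, `item_id`, `potential_growth`, ...) and the sample rows are hypothetical, each purchase is treated as a boolean preference of 1.0, and the `GROWTH_WEIGHT` mapping from potential-growth labels to numbers is an illustrative choice, not something Mahout provides. It maps text IDs to integers, writes `userID,itemID,pref` lines (the layout Mahout's `FileDataModel` reads), and runs a toy user-based recommender (cosine similarity over the k nearest neighbours) so the idea can be checked before moving to Mahout.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical sample rows; column names are illustrative assumptions.
SAMPLE_CSV = """customer_id,item_id,postal_code,item_category,potential_growth,territory
1,A-100,10001,tools,high,east
1,B-200,10001,paint,high,east
2,A-100,10002,tools,low,east
2,C-300,10002,garden,low,east
3,B-200,20001,paint,medium,west
3,C-300,20001,garden,medium,west
"""

# Hypothetical numeric weights for the potential_growth labels.
GROWTH_WEIGHT = {"low": 0.5, "medium": 1.0, "high": 1.5}


class IdMapper:
    """Assigns a stable integer to each distinct text ID (first-seen order)."""

    def __init__(self):
        self.to_num = {}
        self.to_text = []

    def num(self, text_id):
        if text_id not in self.to_num:
            self.to_num[text_id] = len(self.to_text)
            self.to_text.append(text_id)
        return self.to_num[text_id]


def to_preferences(csv_text):
    """Builds user -> {item: pref} with numeric IDs; each purchase counts as 1.0."""
    users, items = IdMapper(), IdMapper()
    prefs = defaultdict(dict)
    growth = {}  # numeric user ID -> potential_growth label
    for row in csv.DictReader(io.StringIO(csv_text)):
        u = users.num(row["customer_id"])
        i = items.num(row["item_id"])
        prefs[u][i] = 1.0
        growth[u] = row["potential_growth"]
    return users, items, prefs, growth


def write_triples(prefs):
    """Returns userID,itemID,pref lines, sorted by user then item."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for u in sorted(prefs):
        for i in sorted(prefs[u]):
            writer.writerow([u, i, prefs[u][i]])
    return out.getvalue()


def cosine(a, b):
    """Cosine similarity between two sparse preference dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, prefs, growth, k=2, n=3):
    """Scores items the user has not bought by the similarity of the k nearest
    neighbours who bought them, scaled by each neighbour's growth weight."""
    neighbours = sorted(
        ((cosine(prefs[user], prefs[v]), v) for v in prefs if v != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, v in neighbours:
        if sim <= 0:
            continue
        weight = sim * GROWTH_WEIGHT.get(growth[v], 1.0)
        for item in prefs[v]:
            if item not in prefs[user]:
                scores[item] += weight
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

With the sample rows, `recommend(0, prefs, growth)` returns `[2]`, which `items.to_text` maps back to `C-300`, the only item bought by customer 1's neighbours that customer 1 has not bought. Scaling neighbours by potential growth is only one way to bring that attribute in; another is to leave the similarity untouched and use the growth label to decide which customers to target first.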