Thank you for the guidance. I will try building something rough and ask questions if I run into any errors.

On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> The Mahout site is a good starting point for using any of the recommenders.
>
> http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
>
> On Nov 29, 2014, at 1:33 PM, Yash Patel <yashpatel1...@gmail.com> wrote:
>
> Can you give me some more details on the Hadoop mapreduce item-based
> cooccurrence recommender?
>
>
> Best Regards,
> Yash Patel
>
> On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > I built this app with it: https://guide.finderbots.com
> >
> > The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> > out of the job it is csv text, and therefore language and architecture
> > neutral. I load the data from spark-itemsimilarity into MongoDB using
> > Java. Solr is set up for full-text indexing and queries using data from
> > MongoDB. The queries are made to Solr through REST from Ruby UX code. You
> > can replace any component in this stack with whatever you wish and use
> > whatever language you are comfortable with.
> >
> > Alternatively you could modify the UI of Solr or Elasticsearch; both are
> > in Java.
> >
> > If you use any of the other Mahout recommenders they create all recs for
> > all known users, so you’ll still need to build a way to serve those
> > results. People often use DBs for this and integrate with their web app
> > framework.
> >
> > On Nov 28, 2014, at 10:03 AM, Yash Patel <yashpatel1...@gmail.com> wrote:
> >
> > I looked up spark row similarity but I am not sure if it will suit my
> > needs, as I want to build my recommender as a Java application, possibly
> > with an interface.
> >
> >
> > On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >
> >> Some references:
> >>
> >> small free book here, which talks about the general idea:
> >> https://www.mapr.com/practical-machine-learning
> >> preso, which talks about mixing actions or other indicators:
> >> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> >> two blog posts:
> >> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> >> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> >> mahout docs:
> >> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
> >>
> >> Build Mahout from this source: https://github.com/apache/mahout
> >> This will run stand-alone on a dev machine; then, if your data is too big
> >> for a single machine, you can run it on a Spark + Hadoop cluster. The data
> >> this creates can be put into a DB or indexed directly by a search engine
> >> (Solr or Elasticsearch). Choose the search engine you want, then queries
> >> of a user’s item id history will go there; results will be an ordered list
> >> of item ids to recommend.
> >>
> >> The core piece is the command line job: “mahout spark-itemsimilarity”,
> >> which can parse csv data. The options specify what columns are used for
> >> ids.
> >>
> >> Start out simple by looking only at user and item IDs. Then you can add
> >> other cross-cooccurrence indicators for multiple actions later pretty
> >> easily.
> >>
> >>
> >> On Nov 28, 2014, at 12:14 AM, Yash Patel <yashpatel1...@gmail.com> wrote:
> >>
> >> The mahout + search engine recommender seems to be what would be best for
> >> the data I have.
> >>
> >> Kindly get back to me at your earliest convenience.
> >>
> >>
> >> Best Regards,
> >> Yash Patel
> >>
> >> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>
> >>> Mahout has several recommenders so no need to create one from
> >>> components. They all make use of the similarity of preferences between
> >>> users; that’s why they are in the category of collaborative filtering.
> >>>
> >>> Primary Mahout Recommenders:
> >>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
> >>> recs for all users. Uses “Mahout IDs”.
> >>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
> >>> the data. Sometimes better for small data sets than #1. Uses “Mahout
> >>> IDs”.
> >>> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> >>> with multiple actions (multi-modal), works for new users that have some
> >>> history, has a scalable server (from the search engine) but is more
> >>> difficult to integrate than #1 or #2. Uses your own ids and reads csv
> >>> files.
> >>>
> >>> The rest of the data seems to apply either to the user or the item and
> >>> so would be used in different ways. #1 and #2 can only use user id and
> >>> item id, but some post-recommendation weighting or filtering can be
> >>> applied. #3 can use multiple attributes in different ways. For instance,
> >>> if category is an item attribute you can create two actions,
> >>> user-pref-for-an-item and user-pref-for-a-category. Assuming you want to
> >>> recommend an item (not a category) you can create a cross-cooccurrence
> >>> indicator for the second action and use the data to make your item recs
> >>> better. #3 is the only method that supports this.
> >>>
> >>> Pick a recommender and we can help more with data prep.
> >>>
> >>>
> >>> On Nov 26, 2014, at 1:34 PM, Yash Patel <yashpatel1...@gmail.com> wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> Wow, I am quite happy to see so many inputs from people.
> >>>
> >>> I apologize for not providing more details.
> >>>
> >>> Although this is not my complete dataset, the fields I have chosen to
> >>> use are:
> >>>
> >>> customer id - numeric
> >>> item id - text
> >>> postal code - text
> >>> item category - text
> >>> potential growth - text
> >>> territory - text
> >>>
> >>>
> >>> Basically I was thinking of finding similar users and recommending them
> >>> items that users like them have bought but they haven't.
> >>>
> >>> I would very much like to hear your opinions, though, as I am not so
> >>> familiar with clustering, classifiers etc.
> >>>
> >>> I found that mahout takes sequence files converted into vectors, but I
> >>> couldn't understand how I would do that on my data specifically and,
> >>> more importantly, make a recommender system out of it.
> >>>
> >>> Also I am wondering how to factor in the importance of a specific
> >>> customer through the potential growth attribute.
> >>>
> >>>
> >>> Best Regards,
> >>> Yash Patel
> >>>
> >>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>
> >>>> All very good points but note that spark-itemsimilarity may take the
> >>>> input directly since you specify column numbers for
> >>>> <UID><ITEMID><PREF_VALUE>.
> >>>>
> >>>> On Nov 26, 2014, at 11:43 AM, parnab kumar <parnab.2...@gmail.com> wrote:
> >>>>
> >>>> Kindly elaborate... your requirements... your dataset fields... and
> >>>> what you want to recommend to a user... Usually a set of items is
> >>>> recommended to a user. In your case what are your items?
> >>>>
> >>>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is
> >>>> not in this format, which is what would let you use the algorithms in
> >>>> Mahout directly.
> >>>>
> >>>> A little more info from your side will help us to give you the right
> >>>> pointers.
> >>>>
> >>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <yashpatel1...@gmail.com> wrote:
> >>>>
> >>>>> Dear Mahout Team,
> >>>>>
> >>>>> I am a student new to machine learning and I am trying to build a
> >>>>> user-based recommender using mahout.
> >>>>>
> >>>>> My dataset is a csv file as an input but it has many fields as text,
> >>>>> and I understand mahout needs numeric values.
> >>>>>
> >>>>> Can you give me a head start as to where I should begin and what kind
> >>>>> of tools I need to parse the text columns?
> >>>>>
> >>>>> Also an idea on which classifiers or clustering methods I should use
> >>>>> would be highly appreciated.
> >>>>>
> >>>>>
> >>>>> Best Regards,
> >>>>> Yash Patel
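
For anyone following the thread with a similar csv: below is a minimal sketch, not a Mahout API, of one way to turn rows with text item ids into the <UID><ITEMID><PREF_VALUE> form discussed above. The column names, the sample rows, the function name to_preference_triples, and the choice of 1.0 as the preference value for a purchase are all assumptions made for illustration. Per the thread, this kind of id mapping matters for the recommenders that use “Mahout IDs”; spark-itemsimilarity is described as reading your own ids directly.

```python
import csv
import io


def to_preference_triples(csv_text, user_col="customer_id", item_col="item_id"):
    """Map text item ids to integers and return (user, item, pref) triples.

    Column names are hypothetical; adjust them to the real csv header.
    """
    item_index = {}  # text item id -> numeric id, assigned in first-seen order
    seen = set()     # (user, item) pairs already emitted
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        user = int(row[user_col])
        item = item_index.setdefault(row[item_col], len(item_index))
        if (user, item) not in seen:
            seen.add((user, item))
            # Assumption: a purchase is recorded as preference 1.0.
            triples.append((user, item, 1.0))
    return triples, item_index


# Made-up sample rows in the shape of the fields listed in the thread.
SAMPLE = """customer_id,item_id,postal_code,item_category
101,widget-a,12345,tools
101,widget-b,12345,toys
102,widget-a,54321,tools
101,widget-a,12345,tools
"""

triples, item_index = to_preference_triples(SAMPLE)
for user, item, pref in triples:
    print(f"{user},{item},{pref}")
```

Keeping item_index around lets you translate recommended numeric ids back to the original text ids when serving results.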