I see Ted has already created a JIRA ticket for this: https://issues.apache.org/jira/browse/MAHOUT-1288
We should consider changing the issue type (currently Bug).
The Berlin Buzzwords 2013 recording <http://www.youtube.com/watch?v=fWR1T2pY08Y> and slides <http://www.slideshare.net/tdunning/buzz-wordsdunningmultimodalrecommendation> of Ted's talk on the subject may help in understanding the terms used and the idea.

I guess we could start with a single kind of interaction/behavior, and consider adding more later.

Shall we make it a separate subproject (at the same level as mahout and site, but still under the Mahout svn), make a new Mahout submodule, or change the Mahout examples from a single module to a multi-module structure and add the recommender demo as a submodule there?

I'm fine with Maven tasks, and to some extent with Solr too (not the most recent versions, but I see this as a nice opportunity to catch up).

Kind regards,
Stevo Slavic.

On Sun, Jul 21, 2013 at 12:15 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> To kick this off, I have created a design document that is open for comments. Much detail is needed here. I will create a JIRA as well, but the google doc is much easier for collating lots of input into a coherent document.
>
> The directory that the document is stored in is accessible at
>
> http://bit.ly/18vbbaT
>
> Once we get going, we can talk about how to coordinate tasks between hangouts. One option is a public Trello project (https://trello.com/), or we can use JIRA sub-tasks.
>
> On Sat, Jul 20, 2013 at 11:25 AM, Andrew Psaltis <andrew.psal...@webtrends.com> wrote:
>
>> I am very interested in collaborating on the off-line to Solr part. Just let me know how we want to get going.
>>
>> Thanks,
>> Andrew
>>
>> On 7/19/13 4:45 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>
>>> OK. I think the crux here is the off-line to Solr part so let's see who else pops up.
>>>
>>> Having a solr maven could be very helpful.
>>>
>>> On Fri, Jul 19, 2013 at 3:39 PM, Luis Carlos Guerrero Covo <lcguerreroc...@gmail.com> wrote:
>>>
>>>> I'm currently working for a portal that has a similar use case and I was thinking of implementing this in a similar way. I'm generating recommendations using Python scripts based on similarity measures (content-based recommendation), using only Euclidean distance and some weights for each attribute. I want to use Mahout's GenericItemBasedRecommender to generate these same recommendations without user data (no tracking right now of user to item relationship). I was thinking of pushing the generated recommendations to Solr using atomic updates since my fields are all stored right now. Since this is very similar to what I'm trying to accomplish, I would sign up to collaborate in any way I can, since I'm fairly familiar with Solr and I'm starting to learn my way around Mahout.
>>>>
>>>> On Fri, Jul 19, 2013 at 5:12 PM, Sebastian Schelter <s...@apache.org> wrote:
>>>>
>>>>> I would also be willing to provide guidance and advice for anyone taking this on. I can especially help with the offline analysis part.
>>>>>
>>>>> --sebastian
>>>>>
>>>>> 2013/7/19 Ted Dunning <ted.dunn...@gmail.com>
>>>>>
>>>>>> I would be happy to supervise a project to implement a demo of this if anybody is willing to do the grunt work of gluing things together.
>>>>>>
>>>>>> Sooo, if you would like to work on this, here is a suggested project.
>>>>>>
>>>>>> This project would entail:
>>>>>>
>>>>>> a) build a synthetic data source
>>>>>>
>>>>>> b) write scripts to do the off-line analysis
>>>>>>
>>>>>> c) write scripts to export to Solr
>>>>>>
>>>>>> d) write a very quick web facade over Solr to make it look like a recommendation engine. This would include
>>>>>>
>>>>>> d.1) a "most popular page" that does combined popularity rise and recommendation
>>>>>>
>>>>>> d.2) a "personal recommendation page" that does just recommendation with dithering
>>>>>>
>>>>>> d.3) item pages with "related items" at the bottom
>>>>>>
>>>>>> e) work with others to provide high quality system walk-through and install directions
>>>>>>
>>>>>> If you want to bite on this, we should arrange a weekly video hangout. I am willing to commit to guiding and providing detailed technical approaches. You should be willing to commit to actually doing stuff.
>>>>>>
>>>>>> The goal would be to provide a fully worked out scaffolding of a practical recommendation system that presumably would become an example module in Mahout.
>>>>>>
>>>>>> On Fri, Jul 19, 2013 at 1:08 PM, B Lyon <bradfl...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 as well. Sounds fun.
>>>>>>>
>>>>>>> On Fri, Jul 19, 2013 at 4:06 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>>>>
>>>>>>>> +1 for getting something like that in a future release of Mahout
>>>>>>>>
>>>>>>>> On Jul 19, 2013, at 10:02 PM, Sebastian Schelter <s...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> It would be awesome if we could get a nice, easily deployable implementation of that approach into Mahout before 1.0
>>>>>>>>>
>>>>>>>>> 2013/7/19 Ted Dunning <ted.dunn...@gmail.com>
>>>>>>>>>
>>>>>>>>>> My current advice is to use Hadoop (if necessary) to build a sparse item-item matrix based on each kind of behavior you have and then drop those similarities into a search engine to deliver the actual recommendations. This allows lots of flexibility in terms of which kinds of inputs you use for the recommendation and lets you blend recommendations with search and geo-location.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 19, 2013 at 12:33 PM, Helder Martins <helder.ga...@corp.terra.com.br> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I'm a dev working for a web portal in Brazil and I'm particularly interested in building an item-based collaborative filtering recommender for our database of news articles.
>>>>>>>>>>> After some coding, I was able to get some recommendations using a GenericItemBasedRecommender, a CassandraDataModel and some custom classes that store item similarities and migrated item IDs into Cassandra.
>>>>>>>>>>> But now I'm unsure what is normally done with this recommender: Should I run it as a daemon, cache the recommendations in memory and set up a web service to query it online? Should I pre-process these recommendations for each recent user and store them somewhere? My first idea was to store all these recs back into Cassandra, but looking into some classes it seems to me that the norm is to always read the input data and store the output using files. Is this a common practice that benefits from HDFS?
>>>>>>>>>>> My use case here is something around 70k recommendation requests per second.
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Kind regards,
>>>>>>>>>>> Helder Martins
>>>>>>>>>>> Portal Architecture and Backend Systems
>>>>>>>>>>> +55 (51) 3284-4475
>>>>>>>>>>> Terra
>>>>>>>>>>>
>>>>>>>>>>> The information contained in this transmission is privileged and confidential information intended only for the use of the individual or entity named above. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this transmission in error, do not read it. Please immediately reply to the sender that you have received this communication in error and then delete it.
>>>>>>>
>>>>>>> --
>>>>>>> BF Lyon
>>>>>>> http://www.nowherenearithaca.com
>>>>
>>>> --
>>>> Luis Carlos Guerrero Covo
>>>> M.S. Computer Engineering
>>>> (57) 3183542047
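
As a rough illustration of the approach Ted describes in the thread (build a sparse item-item matrix from one kind of behavior off-line, then let a search engine deliver the recommendations), here is a minimal Python sketch. It rests on several assumptions that are not in the thread: no similarity measure is named there, so plain co-occurrence counts with a cutoff stand in for one; an in-memory dict stands in for the Solr index; and the names cooccurrence_indicators, recommend, min_count and top_k are hypothetical, not Mahout or Solr APIs.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_indicators(interactions, min_count=2, top_k=3):
    """Off-line step: build a sparse item-item structure from (user, item) pairs.

    Assumption: raw co-occurrence counts with a cutoff are used here only as a
    stand-in; the thread does not specify a similarity measure.
    """
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)

    # Count how many users interacted with both items of each pair.
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1

    # Keep only the strongest co-occurring items per item (this keeps it sparse).
    indicators = {}
    for item, row in counts.items():
        kept = [(n, other) for other, n in row.items() if n >= min_count]
        kept.sort(key=lambda pair: (-pair[0], pair[1]))
        indicators[item] = [other for _, other in kept[:top_k]]
    return indicators


def recommend(indicators, history, limit=5):
    """On-line step: a toy stand-in for a search query.

    Each item plays the role of a document whose indicator list is a field;
    the user's history is the query, and the score is the number of matches.
    """
    history = set(history)
    scores = {}
    for item, terms in indicators.items():
        if item in history:
            continue  # do not recommend what the user has already seen
        overlap = len(history.intersection(terms))
        if overlap:
            scores[item] = overlap
    return sorted(scores, key=lambda i: (-scores[i], i))[:limit]


# Tiny synthetic data set (hypothetical users and items).
interactions = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "b"), ("u3", "c"),
    ("u4", "a"), ("u4", "d"),
]
indicators = cooccurrence_indicators(interactions)
print(indicators)
print(recommend(indicators, ["b"]))
```

In a real deployment the indicator lists would presumably be written into a field of each item's document in the search engine, and a user's recent history would become the query against that field. That is what would make it possible to blend recommendations with ordinary search and geo-location filters, as mentioned in the thread.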