Re: Setting up a recommender

2013-07-21 Thread Stevo Slavić
I see Ted has already created a JIRA ticket for this: https://issues.apache.org/jira/browse/MAHOUT-1288 We should consider changing the issue type (currently: bug). One might find this Berlin Buzzwords 2013 recording and slides

Re: Setting up a recommender

2013-07-21 Thread Iker Huerga
Hi, First of all, Ted, very inspiring video, I really enjoyed the concept of cross-occurrences. Secondly, I'd be very interested in collaborating on this project and here is why. I've been recently working for my employer on a very similar project that is currently deployed into our production en

Paper on Mahout's ALS implementation accepted at RecSys'13

2013-07-21 Thread Sebastian Schelter
I'm happy to announce that a paper called "Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins" written by me and my colleagues at TU Berlin has been accepted for publication at the ACM Conference on Recommender Systems 2013. The paper discusses Mahout's latest distrib

Re: Paper on Mahout's ALS implementation accepted at RecSys'13

2013-07-21 Thread Suneel Marthi
Congrats again Sebastian. Sent from my iPhone On Jul 21, 2013, at 2:22 PM, Sebastian Schelter wrote: > I'm happy to announce that a paper called "Distributed Matrix > Factorization with MapReduce using a series of Broadcast-Joins" written > by me and my colleagues at TU Berlin has been accepted

Re: Paper on Mahout's ALS implementation accepted at RecSys'13

2013-07-21 Thread Gokhan Capan
Congratulations, Sebastian! Gokhan On Sun, Jul 21, 2013 at 10:21 PM, Suneel Marthi wrote: > Congrats again Sebastian. > > Sent from my iPhone > > On Jul 21, 2013, at 2:22 PM, Sebastian Schelter wrote: > > > I'm happy to announce that a paper called "Distributed Matrix > > Factorization with Map

Re: Setting up a recommender

2013-07-21 Thread Pat Ferrel
Read the paper and the preso. As to the 'offline to Solr' part: it sounds like you are suggesting an item-item similarity matrix be stored and indexed in Solr. One would have to create the action matrix from user profile data (preference history), do a rowsimilarity job on it (using LLR similar
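
For readers new to the LLR similarity mentioned above, here is a minimal Python sketch of the log-likelihood ratio (G²) statistic on a 2x2 co-occurrence table. The function names (`x_log_x`, `entropy`, `llr`) and the count layout are illustrative assumptions, not Mahout's API; k11 is taken to be the number of users who interacted with both items, k12 and k21 the users who interacted with only one of them, and k22 the users who interacted with neither.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table.
    # Larger values mean the co-occurrence is less likely to be chance.
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))
```

A table in which the two items occur independently scores close to zero, while a table in which they almost always occur together scores high, which is what makes the statistic usable as a filter for "interesting" item pairs.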

Re: Setting up a recommender

2013-07-21 Thread B Lyon
Paper and presentation are very interesting to me as well. I am fairly new to this, and coming to terms with some of the terms, etc. I assume that "action matrix" here is just the raw matrix of how each user has "interacted with" the items/types-of-items. I didn't quite get the incorporation int

Re: Setting up a recommender

2013-07-21 Thread Ted Dunning
Pat, Yes. The first part probably just is the RowSimilarity job, especially after Sebastian puts in the down-sampling. The new part is exactly as you say, storing the DRM into Solr indexes. There is no reason to not use a real data set. There is a strong reason to use a synthetic dataset, howe

Re: Setting up a recommender

2013-07-21 Thread Ted Dunning
Inline On Sun, Jul 21, 2013 at 3:31 PM, B Lyon wrote: > Paper and presentation are very interesting to me as well. I am fairly new > to this, and coming to terms with some of the terms, etc. I assume that > "action matrix" here is just the raw matrix of how each user has > "interacted with" t

Re: Setting up a recommender

2013-07-21 Thread Ted Dunning
On Sun, Jul 21, 2013 at 8:10 AM, Iker Huerga wrote: > I think a conference call, maybe a hangout, to kick off the project would > be useful, who should schedule it? > I will shortly do that. I think that I will need more than one kickoff to deal with timezones. I will coordinate these ahead of

Re: Setting up a recommender

2013-07-21 Thread Pat Ferrel
RowSimilarity downsampling? Are you referring to a mod of the matrix multiply to do cross similarity with LLR for the cross recommendations? So similarity of rows of B with rows of A? Sounds like you are proposing not only putting a recommender in Solr but also a cross-recommender? This is

Re: Setting up a recommender

2013-07-21 Thread Ted Dunning
The row similarity downsampling is just a matter of dropping elements at random from rows that have more data than we want. If the join that puts the row together can handle two kinds of input, then RowSimilarity can be easily modified to be CrossRowSimilarity. Likewise, if we have two DRMs with
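
The downsampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a sparse row is modeled as a dict from column index to value, and `downsample_row` and `max_entries` are hypothetical names, not part of Mahout. The sampling scheme actually used in Mahout may differ.

```python
import random


def downsample_row(row, max_entries, rng=None):
    """Randomly drop entries from a sparse row that has more than max_entries.

    `row` maps column index -> value. Rows at or below the cap are
    returned as an unchanged copy.
    """
    rng = rng or random.Random()
    if len(row) <= max_entries:
        return dict(row)
    # Pick which columns survive, uniformly at random without replacement.
    kept = rng.sample(sorted(row), max_entries)
    return {col: row[col] for col in kept}
```

Passing a seeded `random.Random` instance makes the sampling repeatable, which is convenient when comparing similarity results across runs.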

Re: Setting up a recommender

2013-07-21 Thread Sebastian Schelter
At the moment, the down sampling is done by PreparePreferenceMatrixJob for the collaborative filtering functionality. We just want to move it down to RowSimilarityJob to enable standalone usage. I think that the CrossRecommender should be the next thing on our agenda, after we have the deployment

Part 2 blog post on extracting text features

2013-07-21 Thread Ken Krugler
Hi Mahouters, I just posted part 2 of a series on extracting text features for machine learning… http://www.scaleunlimited.com/2013/07/21/text-feature-selection-for-machine-learning-part-2/ The top five terms (by LLR score) in emails written by Ted are now u_k, v_k, sgd, regress, and categori.