Re: Setting up a recommender

2013-07-22 Thread Pat Ferrel
+10 Love the academics but I agree with this. Recently saw a VP from Netflix plead with the audience (mostly academics) to move past RMSE--focus on maximizing correct ranking, not rating prediction. Anyway I have a pipeline that does the following: ingests logs either TSV or CSV of arbitrary c

Re: Setting up a recommender

2013-07-22 Thread Michael Sokolov
On 07/22/2013 12:20 PM, Pat Ferrel wrote: My understanding of the Solr proposal puts B's row similarity matrix in a vector per item. That means each row is turned into "terms" = external IDs--not sure how the weights of each term are encoded. This is the key question for me. The best idea I've

Re: Setting up a recommender

2013-07-22 Thread Gokhan Capan
Just to make sure if I understood correctly, Ted, could you please correct me?:) 1. Using a search engine, I will treat items as documents, where each document vector consists of other items (similar to "words of documents") with co-occurrence (LLR) weights (instead of tf-idf in a search engine a
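
For readers unfamiliar with the co-occurrence (LLR) weights mentioned above, here is a minimal sketch of the log-likelihood ratio test on a 2x2 co-occurrence table, written in the entropy form. The function names and the `k11..k22` argument layout are choices made for this illustration, not something taken from the thread: `k11` is the number of users who interacted with both items, `k12` and `k21` with only one of them, and `k22` with neither.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)) over the counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 co-occurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from rounding.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    print(llr(10, 10, 10, 10))   # independent items score near zero
    print(llr(100, 1, 1, 100))   # strongly associated items score high
```

Item pairs with a high score would be kept as indicators; pairs whose co-occurrence is no more than chance score near zero and would be dropped.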

Re: Setting up a recommender

2013-07-22 Thread Ted Dunning
My experience is that TFIDF works just fine, especially as a first cut. Adding different kinds of data, building out backend A/B testing, tuning the UI, weighting the query all come before the next round of weighting changes. Typically, the priority stack never empties enough for that task to rise to the

Re: Setting up a recommender

2013-07-22 Thread Ted Dunning
Inline ... slightly redundant relative to other answers, but that shouldn't be a problem. On Mon, Jul 22, 2013 at 11:56 AM, Gokhan Capan wrote: > Just to make sure if I understood correctly, Ted, could you please correct > me?:) > > > 1. Using a search engine, I will treat items as documents, w

Re: Setting up a recommender

2013-07-22 Thread Ted Dunning
On Mon, Jul 22, 2013 at 9:20 AM, Pat Ferrel wrote: > +10 > > Love the academics but I agree with this. Recently saw a VP from Netflix > plead with the audience (mostly academics) to move past RMSE--focus on > maximizing correct ranking, not rating prediction. > > Anyway I have a pipeline that doe

Re: Setting up a recommender

2013-07-22 Thread Michael Sokolov
So you are proposing just grabbing the top N scoring related items and indexing them without regard to weight? Effectively quantizing the weights to 1, and 0 for everything else? I guess LLR tends to do that anyway -Mike On 07/22/2013 02:57 PM, Ted Dunning wrote: My experience is

Re: Setting up a recommender

2013-07-22 Thread Pat Ferrel
inline BTW if there is an LLR cross-similarity job (replacing [B'A]) it is easy to integrate. On Jul 22, 2013, at 12:09 PM, Ted Dunning wrote: On Mon, Jul 22, 2013 at 9:20 AM, Pat Ferrel wrote: > +10 > > Love the academics but I agree with this. Recently saw a VP from Netflix > plead with t

Re: Setting up a recommender

2013-07-22 Thread Ted Dunning
On Mon, Jul 22, 2013 at 12:40 PM, Pat Ferrel wrote: > Yes. And the combined recommender would query on both at the same time. > > Pat-- doesn't it need ensemble type weighting for each recommender > component? Probably a wishlist item for later? Yes. Weighting different fields differently is

Re: Setting up a recommender

2013-07-22 Thread Ted Dunning
Not entirely without regard to weight. Just without regard to designing weights specific to this application. The weights that Solr uses natively are intuitively what we want (rare indicators have higher weights in a log-ish kind of way). Frankly, I doubt the effectiveness here of mathematical r

Re: Setting up a recommender

2013-07-22 Thread Michael Sokolov
Fair enough - thanks for clarifying. I wondered whether that would be worth the trouble, also. Maybe one of the academics Pat mentioned will test and find out for us :) On 7/22/13 6:45 PM, Ted Dunning wrote: Not entirely without regard to weight. Just without regard to designing weights spec

"Recent behavior as a query" (from Setting up a recommender thread)

2013-07-22 Thread B Lyon
This is pulled out of one of Ted's inline responses to the recent Setting up a recommender thread, and was hoping to confirm some things... Most of which may end up being a restatement of what he and others have said in the first place. It seems that you would have a "document" in Solr for each th
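
A rough sketch of the "recent behavior as a query" idea, assuming each item is a document whose only field is a set of indicator item IDs, and the user's recent history is the query. The scoring here is a simplified IDF-weighted overlap, not Solr's actual similarity formula. The names `build_index` and `recommend`, and the choice to exclude items already in the history, are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def build_index(indicators):
    # indicators: item_id -> set of indicator item IDs (the item's "terms").
    postings = defaultdict(set)
    for item, terms in indicators.items():
        for term in terms:
            postings[term].add(item)
    return postings


def recommend(history, indicators, top_n=3):
    """Score item documents against the user's recent history."""
    postings = build_index(indicators)
    n_docs = len(indicators)
    seen = set(history)
    scores = defaultdict(float)
    for term in seen:
        docs = postings.get(term, ())
        if not docs:
            continue
        # Rare indicators get higher weight, in a log-ish way.
        idf = math.log(1 + n_docs / len(docs))
        for item in docs:
            if item not in seen:  # assumption: do not re-recommend history
                scores[item] += idf
    # Highest score first; ties broken by item ID for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    indicators = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
    print(recommend(["a", "d"], indicators))
```

In this toy data, item "c" lists both "a" and "d" as indicators, so a user whose history is "a" and "d" matches it on two terms and it ranks ahead of "b".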

Re: "Recent behavior as a query" (from Setting up a recommender thread)

2013-07-22 Thread Ted Dunning
Exactly what I was trying to say. Excellently clear way to put it all. On Mon, Jul 22, 2013 at 8:38 PM, B Lyon wrote: > This is pulled out of one of Ted's inline responses to the recent Setting > up a recommender thread, and was hoping to confirm some things... Most of > which may end up being

Re: "Recent behavior as a query" (from Setting up a recommender thread)

2013-07-22 Thread Ted Dunning
Could I ask everybody who had trouble with my prose to help me out by commenting on the design document? That way I can record the improvements that would make it clear. My apologies for only allowing commenting, but I find it easier to make sure all comments get in because there is a very nice trac

multinomial in mahout sgd

2013-07-22 Thread Shantha Kumar N01
Hi, in Mahout examples, the (org.apache.mahout.classifier.sgd.)RunLogistic or TrainLogistic class is a great example to classify content with SGD and to get a nice confusion matrix. I'm trying to use and adapt this to classify data in more than 2 categories. The algorithm uses classify scalar met

Re: multinomial in mahout sgd

2013-07-22 Thread Ted Dunning
Classify is the call that you want. The command line logistic regression programs were originally written more as demonstrations and weren't written to handle multiple target values. It shouldn't be hard to adapt them. It would be great to get a patch to do so. On Mon, Jul 22, 2013 at 9:08 PM

RE: multinomial in mahout sgd

2013-07-22 Thread Shantha Kumar N01
Hi Ted, Thanks. Can you please tell me which class I have to use for multinomial classification (like AdaptiveLogisticRegression or OnlineLogisticRegression)? How do I give the target category values to this class? Is it through the constructor? If you give a small code snippet, that will be helpful for me. -

Re: multinomial in mahout sgd

2013-07-22 Thread Ted Dunning
OLR supports a train method with the target being an integer. That allows multi-class training. I can't remember if ALR does as well. It may not since AUC is used to select hyperparameters and AUC is not uniquely defined for multiple classes. Calling the classifyFull method is the easiest way
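
To illustrate the shape of multi-class training with an integer target, here is a small self-contained sketch of softmax regression trained by SGD. It is not Mahout's implementation or API: the class `SoftmaxSGD` is hypothetical, and its method names `train` and `classify_full` only echo the calls mentioned above (an integer target in `train`, one score per category from the full classification call).

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxSGD:
    """Hypothetical multinomial logistic regression trained one example at a time."""

    def __init__(self, num_categories, num_features, learning_rate=0.1):
        self.w = [[0.0] * num_features for _ in range(num_categories)]
        self.lr = learning_rate

    def classify_full(self, x):
        # Returns one probability per category.
        scores = [sum(wj * xj for wj, xj in zip(row, x)) for row in self.w]
        return softmax(scores)

    def train(self, target, x):
        # target is an integer category index in [0, num_categories).
        probs = self.classify_full(x)
        for k, row in enumerate(self.w):
            err = (1.0 if k == target else 0.0) - probs[k]
            for j, xj in enumerate(x):
                row[j] += self.lr * err * xj


if __name__ == "__main__":
    data = [(0, [1.0, 0.0, 0.0]), (1, [0.0, 1.0, 0.0]), (2, [0.0, 0.0, 1.0])]
    model = SoftmaxSGD(num_categories=3, num_features=3)
    for _ in range(200):
        for target, x in data:
            model.train(target, x)
    for target, x in data:
        probs = model.classify_full(x)
        print(target, probs.index(max(probs)))
```

The predicted category is the index of the largest entry in the returned score list, which is how a confusion matrix for more than two categories could be filled in.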