Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Lance Norskog
Also, if your original matrix is A, then it is usually a shorter path to results to analyze the word (item) cooccurrence matrix A'A. The methods below work either way. The cooccurrence definitions I'm finding only use the summation-based one in Wikipedia. Are there any involving inverting the
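
As a rough illustration of the A'A product mentioned above, here is a minimal sketch in plain Python. It assumes A is a small binary user-by-item matrix stored as a list of rows; the function name `cooccurrence` and the sample data are made up for the example and are not part of Mahout.

```python
def cooccurrence(a):
    """Return A'A for a user-by-item matrix given as a list of rows."""
    n_items = len(a[0])
    c = [[0] * n_items for _ in range(n_items)]
    for row in a:  # one row per user (document)
        for i in range(n_items):
            if row[i] == 0:
                continue  # this user never touched item i
            for j in range(n_items):
                c[i][j] += row[i] * row[j]
    return c


# Hypothetical data: 3 users (rows) by 3 items (columns).
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
C = cooccurrence(A)
```

With binary input, entry C[i][j] counts the users who interacted with both item i and item j, and the diagonal holds each item's total count.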

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Lance Norskog
If you have a document (user) and a word (item), then you have a joint probability that any given interaction will be between this document and word. We pretend in this case that each interaction is independent of every other, which is patently not true, but very helpful. So if you subsample

seq2sparse fails: org.apache.lucene.analysis.Analyzer not found

2011-08-31 Thread Andrea Leistra
I've seen a number of people reporting this problem on the list in the past few months, and while I've run seq2sparse successfully in the past (using 0.4), it is failing for me now with a ClassNotFoundException. I did a clean re-install of the 0.5 distribution yesterday since others had

Re: seq2sparse fails: org.apache.lucene.analysis.Analyzer not found

2011-08-31 Thread Sean Owen
Try HEAD from Subversion. I think it's been addressed, but that change of course would not have gone back and shown up in 0.5. On Wed, Aug 31, 2011 at 2:06 PM, Andrea Leistra andrea.leis...@concur.com wrote: I've seen a number of people reporting this problem on the list in the past few months,

Re: #mahout IRC

2011-08-31 Thread Grant Ingersoll
I'm on the fence. It's used extensively in Lucene. We also log it, or at least the dev channel. It's a boon for the full time devs out there. For the rest of us, I'm undecided. On the plus side, you get real time interaction. You also get some fun banter from time to time w/ friends. On

Re: #mahout IRC

2011-08-31 Thread Ted Dunning
IRC is also used heavily in the hbase community and people do use it there for install questions pretty commonly. Whether this works is critically dependent on having guys like Mike Stack who are able to keep an eye on IRC and answer the questions. My own preference is to use the mailing list

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Ted Dunning
Mathematically speaking, random sampling is just fine. Stratifying based on various criteria can help avoid loss of accuracy so if you had several clusters then down sampling heavily represented clusters might work, but the accurate definition of clusters is harder than the cooccurrence analysis

Re: #mahout IRC

2011-08-31 Thread Dave Stuart
Just a suggestion: if we had logging available, could we mail out a digest of the day's IRC chat to the mailing list? I know there could be a bunch of noise in there from banter etc., but it wouldn't take much to scan for anything important. We could get crazy and run it through one of

Re: #mahout IRC

2011-08-31 Thread Daniel Lewis
I look to IRC for answers to simple questions, and I keep an eye on it to answer other people's simple questions. The Weka mailing list seems to have some of the same questions asked every week, and they have no official IRC channel. On Wed, Aug 31, 2011 at 11:00 AM, Ted Dunning

Re: Email and Collab. Filtering

2011-08-31 Thread Sean Owen
Is the problem not just a matter of translating from the original identifiers to ints, so they can be used as offsets into a vector, and then back again? The recommender stuff has a stage for this already which hashes identifiers to an int, stores the mapping, and un-does it at the end. In

Re: Email and Collab. Filtering

2011-08-31 Thread Grant Ingersoll
On Aug 31, 2011, at 11:26 AM, Sean Owen wrote: Is the problem not just a matter of translating from the original identifiers to ints, so they can be used as offsets into a vector, and then back again? Yeah, I was wondering about that when looking at the RecommenderJob. If I understand you

Re: Email and Collab. Filtering

2011-08-31 Thread Ted Dunning
My own recommendation here would be to run a word-count first and then create a dense dictionary using a sequential process. This sequential step should be very fast because the number of items is quite modest. I would create an additional dictionary at the same time for email addresses. Once
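
A minimal sketch of the count-then-assign idea described above, in plain Python rather than as a Hadoop job. The helper name `build_dictionary`, the sample interactions, and the choice to order ids by descending frequency are all assumptions made for illustration; the message only says to count first and then build a dense dictionary sequentially.

```python
from collections import Counter


def build_dictionary(values):
    """Count values, then assign dense ids 0..n-1 in one sequential pass."""
    counts = Counter(values)
    # Assumption: most frequent first, ties broken alphabetically,
    # so the assignment is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    ids = {v: i for i, v in enumerate(ordered)}
    return ids, ordered  # forward mapping, and a list for the reverse lookup


# Hypothetical (email address, item) interactions.
interactions = [
    ("ann@example.com", "thread-7"),
    ("bob@example.com", "thread-7"),
    ("ann@example.com", "thread-2"),
]

# One dictionary for items and a separate one for email addresses.
item_ids, item_names = build_dictionary(item for _, item in interactions)
user_ids, user_names = build_dictionary(user for user, _ in interactions)

# Translate to dense ints for the vector math, then back again at the end.
encoded = [(user_ids[u], item_ids[i]) for u, i in interactions]
decoded = [(user_names[u], item_names[i]) for u, i in encoded]
```

Unlike hashing, a dictionary built this way has no collisions and no gaps, at the cost of one extra counting pass over the data.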

Re: Email and Collab. Filtering

2011-08-31 Thread Sean Owen
No it still wants user,item[,rating] input. But otherwise yes, it's translated and un-translated internally as needed. You could change the mapper to read that input easily though. it still wants numeric input. It's hashing longs to ints. But this could easily be changed to record a more general

Re: Email and Collab. Filtering

2011-08-31 Thread Grant Ingersoll
On Aug 31, 2011, at 11:43 AM, Ted Dunning wrote: My own recommendation here would be to run a word-count first and then create a dense dictionary using a sequential process. This sequential step should be very fast because the number of items is quite modest. I would create an additional

Re: Email and Collab. Filtering

2011-08-31 Thread Sean Owen
Yes, I'm suggesting that could at least be 80% of what you need. If you can generalize that bit further and refactor it, all the better. I wouldn't bother necessarily extending to support the user: item item item syntax or else we'd get into supporting a lot of stuff. That conversion IMHO can be

Re: Problems running examples

2011-08-31 Thread Dan Brickley
On 10 June 2011 18:34, Jeff Eastman jeast...@narus.com wrote: I'm still trying to figure out why reuters-0.5 does not work on either of my clusters. The scripts themselves have no diff and the environment variables are set as in trunk except for MAHOUT_HOME. The synthetic control and 20

RE: Problems running examples

2011-08-31 Thread Jeff Eastman
Hi Dan, No I never did. I got distracted doing something else and did not debug further. If you are still seeing this on trunk then we should (re)open a JIRA. Jeff -Original Message- From: danbri2...@danbri.org [mailto:danbri2...@danbri.org] On Behalf Of Dan Brickley Sent: Wednesday,

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Lance Norskog
In this text-only notation, I thought apostrophe meant inverse. What then is matrix inversion? I see a fair amount of stuff here in what I think is MathML, but it displays raw in gmail. On Wed, Aug 31, 2011 at 8:04 AM, Ted Dunning ted.dunn...@gmail.com wrote: Uhh... A' is the transpose of A.

Re: #mahout IRC

2011-08-31 Thread Lance Norskog
The shoemaker's children go barefoot. In fact, some IRC - coherent whole workflow would be really interesting beyond this one problem. It intersects with the problem of abstracting coherence out of movie subtitle tracks. And it would be totally salable to the US Dept of Homeland Fascism. On Wed,

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Dmitriy Lyubimov
We usually denote the inverse as A^{-1} or just A^-1. An apostrophe, superscript star, or {top} always means transpose. I have never seen an apostrophe used for inverses. Perhaps the confusion stems from the fact that the inverse equals the transpose if a matrix is orthogonal (or even orthonormal), so sometimes
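
A small numeric check of the point about orthogonal matrices, as a sketch in plain Python. The matrices Q and A and the helper names are invented for the example: Q is a rotation, so Q'Q should be the identity (meaning Q' acts as the inverse), while a generic matrix A fails the same test.

```python
def transpose(m):
    """Return M' (rows and columns swapped)."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def is_identity(m, tol=1e-12):
    """True if m equals the identity matrix up to a small tolerance."""
    return all(abs(v - (1.0 if i == j else 0.0)) <= tol
               for i, row in enumerate(m)
               for j, v in enumerate(row))


# Orthogonal: columns are unit length and mutually perpendicular.
Q = [[0.6, -0.8],
     [0.8, 0.6]]

# Not orthogonal: the transpose is not the inverse here.
A = [[1.0, 1.0],
     [0.0, 1.0]]

q_ok = is_identity(matmul(transpose(Q), Q))
a_ok = is_identity(matmul(transpose(A), A))
```

Here `q_ok` comes out true and `a_ok` false, which is why A' can stand in for the inverse only in the orthogonal case.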

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Ted Dunning
This is a very good point that seems very likely to be the source of the confusion. On Wed, Aug 31, 2011 at 6:06 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Perhaps the confusion stems from the fact that the inverse equals the transpose if a matrix is orthogonal (or even orthonormal), so

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Ted Dunning
The basics of LaTeX notation are: ^ for superscript, _ for subscript, {} for grouping, \sum for summation, \log for logs, \Omega for upper case Greek letter omega, \alpha for lower case Greek letter alpha, \int for integral. See http://www.codecogs.com/latex/eqneditor.php for a playground where you
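
For illustration, a few made-up expressions that use only the pieces listed above. The formulas themselves are arbitrary examples chosen to exercise the notation, not anything from the thread.

```latex
% \sum with _ and ^ limits, {} grouping, \log, and \alpha with a subscript
H = -\sum_{i=1}^{n} \alpha_i \log \alpha_i

% \int with lower and upper limits, and ^ for a power
\int_{0}^{1} x^{2} dx

% \Omega used as an ordinary symbol, here the size of a set
|\Omega| = n
```

Braces only matter when the superscript or subscript is longer than one character: `x^2` and `x^{2}` render the same, but `x^10` raises only the 1.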

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Ken Krugler
On Aug 31, 2011, at 8:32pm, Ted Dunning wrote: The basics of LaTeX notation are: ^ for superscript, _ for subscript, {} for grouping, \sum for summation, \log for logs, \Omega for upper case Greek letter omega, \alpha for lower case Greek letter alpha, \int for integral. See

Re: Singular vectors of a recommendation Item-Item space

2011-08-31 Thread Ted Dunning
Hmm... I see this: http://latex.codecogs.com/gif.latex?F(x,y)=0%20~~\mbox{and}~~%20\left|%20\begin{array}{ccc}%20F''_{xx}%20%20F''_{xy}%20%20F'_x%20\\%20F''_{yx}%20%20F''_{yy}%20%20F'_y%20\\%20F'_x%20%20F'_y%20%200%20\end{array}\right|%20=%200 Must be a cut and paste kind of thing. On Wed, Aug