RE: Mahout beginner questions...

2012-04-05 Thread Razon, Oren
OK, so here is the point I'm still not getting. The architecture we are talking about pushes the heavy computation to offline work, for which I could utilize the Hadoop part, besides having an online part that will retrieve the recommendations from the pre-computed results or even do some more …

Re: Mahout beginner questions...

2012-04-05 Thread Sebastian Schelter
Hi Oren, If you use an item-based approach, it's sufficient to use the top-k similar items per item (with k somewhere between 25 and 100). That means the data to hold in memory is num_items * k data points. While this is a theoretical limitation, it should not be a problem in practical scenarios, …
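
As a purely illustrative sketch (the class names here are made up, not Mahout's API), the in-memory structure can be as small as a map from each item to its k most similar items; with k in the 25..100 range that is roughly num_items * k entries:

    import java.util.List;
    import java.util.Map;

    // Illustrative only: an in-memory cache of the top-k similar items per item.
    // With k between 25 and 100, this holds roughly num_items * k entries.
    class TopKItemSimilarities {

        // itemID -> its k most similar items, ordered by descending similarity
        private final Map<Long, List<ScoredItem>> topK;

        TopKItemSimilarities(Map<Long, List<ScoredItem>> topK) {
            this.topK = topK;
        }

        List<ScoredItem> similarTo(long itemID) {
            return topK.getOrDefault(itemID, List.of());
        }

        // A (similar item, similarity value) pair.
        record ScoredItem(long itemID, double similarity) {}
    }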

RE: Mahout beginner questions...

2012-04-05 Thread Razon, Oren
Thanks for the answer, but still... I will need to keep the rating matrix in memory so I will be able to use the ratings a user gave to items together with the item similarities.

Re: Mahout beginner questions...

2012-04-05 Thread Sebastian Schelter
You don't have to hold the rating matrix in memory. When computing recommendations for a user, fetch all his ratings from some datastore (database, key-value store, memcache...) with a single query and use the item similarities that are held in memory to compute the recommendations. --sebastian
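
As a rough sketch of that online flow (plain Java with made-up names, not Mahout's Taste classes): fetch the user's ratings once, then score unseen items with a similarity-weighted sum over the in-memory top-k similarity lists.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: combine one datastore query for the user's ratings with
    // in-memory item-item similarities (top-k per item) to score candidate items.
    class OnlineItemBasedScorer {

        // itemID -> (similar itemID -> similarity), precomputed offline, held in memory
        private final Map<Long, Map<Long, Double>> topKSimilarities;

        OnlineItemBasedScorer(Map<Long, Map<Long, Double>> topKSimilarities) {
            this.topKSimilarities = topKSimilarities;
        }

        // userRatings: itemID -> rating, fetched from the datastore in a single query
        Map<Long, Double> score(Map<Long, Double> userRatings) {
            Map<Long, Double> weightedSum = new HashMap<>();
            Map<Long, Double> weightTotal = new HashMap<>();
            for (Map.Entry<Long, Double> rated : userRatings.entrySet()) {
                Map<Long, Double> neighbours =
                    topKSimilarities.getOrDefault(rated.getKey(), Map.of());
                for (Map.Entry<Long, Double> sim : neighbours.entrySet()) {
                    long candidate = sim.getKey();
                    if (userRatings.containsKey(candidate)) {
                        continue; // already rated, not a candidate
                    }
                    weightedSum.merge(candidate, sim.getValue() * rated.getValue(), Double::sum);
                    weightTotal.merge(candidate, Math.abs(sim.getValue()), Double::sum);
                }
            }
            // Normalize to a similarity-weighted average rating per candidate.
            Map<Long, Double> scores = new HashMap<>();
            weightedSum.forEach((item, sum) -> scores.put(item, sum / weightTotal.get(item)));
            return scores;
        }
    }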

Re: Mahout beginner questions...

2012-04-05 Thread Sean Owen
It might or might not be interesting to comment on this discussion in light of the new product/project I mentioned last night, Myrrix. It's definitely an example of precisely this two-layered architecture we've been discussing on this thread. http://myrrix.com/design/ The nice thing about a …

Re: Commercializing Mahout: the Myrrix recommender platform

2012-04-05 Thread Dan Brickley
On 5 April 2012 00:18, Jake Mannix wrote: "+1 to everything Ted said. As an added point, while we're on the subject of corporate involvement, forks, and extensions of Mahout, now is as good a time as any to announce that I (and my teammate Andy Schlaikjer) are maintaining an official …"

Re: Latent Semantic Analysis

2012-04-05 Thread Peyman Mohajerian
Hi guys, I'm now using ssvd for my LSA code and get the following error; at the time of the error, all I have under the 'SSVD-out' folder is: Q-job/QHat-m-0…

Mahout at Twitter

2012-04-05 Thread Jake Mannix
On Thu, Apr 5, 2012 at 1:28 AM, Dan Brickley wrote: [quoted text of Jake Mannix's announcement, as above] …

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
Hm, I never saw that and am not sure where this folder comes from. Which Hadoop version are you using? This may be a result of incompatible support for multiple outputs in the newer Hadoop versions. I tested it with CDH3u0/u3 and it was fine. This folder should normally appear in the conversation …

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
Yeah, I don't see how it may have arrived at that error. Peyman, I need to know more -- it looks like you are using the embedded API, not the command line, so I need to see how you initialize the solver and also which version of the Mahout libraries you are using (your stack trace numbers do not …

Re: Latent Semantic Analysis

2012-04-05 Thread Peyman Mohajerian
Hi Dmitriy, It is Clojure code from: https://github.com/algoriffic/lsa4solr Of course I modified it to use the Mahout 0.6 distribution, also running on hadoop-0.20.205.0. Here is the Clojure code that I changed; the lines after 'decomposer (doto (.run ssvdSolver))' still need modification because I'm …
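
For orientation, here is a rough plain-Java equivalent of what that Clojure interop call drives, assuming the SSVDSolver constructor takes (conf, input paths, output path, block height, k, p, reduce tasks) as in the Mahout 0.6 sources; the paths and numbers are made-up examples, and the exact signature should be checked against the javadoc of the version in use.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver;

    // Rough sketch of the embedded-API usage pattern. The constructor argument
    // order is recalled from Mahout 0.6 and may differ in other versions.
    public class SsvdEmbeddedSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("/lsa4solr/matrix/A");   // example input path (DistributedRowMatrix data)
        Path output = new Path("/lsa4solr/ssvd-out");  // keep the output outside the input directory
        int k = 50;            // decomposition rank
        int p = 15;            // oversampling
        int ablockRows = 30000;
        int reduceTasks = 10;

        SSVDSolver solver =
            new SSVDSolver(conf, new Path[] {input}, output, ablockRows, k, p, reduceTasks);
        solver.run();
        // U, V and the singular values are left on HDFS under the output directory.
      }
    }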

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
Any chance you could test it with its current dependency, 0.20.204? Or would that be hard to stage? A newer Hadoop version is frankly all I can think of here as the reason for this.

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
Also, you are printing your input path -- what does it look like in reality? Because the path it complains about, SSVDOutput/data, should in fact be the input path. That's what's perplexing. We are talking about the Hadoop job setup process here, nothing specific to the solution itself. And job setup/…

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
Another idea I have is to try to run it from just the Mahout command line and see if it works with .205. If it does, it is definitely something about passing parameters in / the client Hadoop classpath / etc.

Re: Latent Semantic Analysis

2012-04-05 Thread Peyman Mohajerian
OK, great, I'll give these ideas a try later today. The input is the following line(s), which in my code sample were commented out using ';' in Clojure. The first stage, Q-job, is done fine; it is the second job that gets messed up. The output of Q-job is at: /lsa4solr/matrix/14099700861483/transpose-…

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
In fact, Q-Job and Bt-Job have identical input (of the A matrix) and identical setup of that input, but for some reason Bt-Job fails to see it. And it fails to see it in a very strange way. That's what's perplexing. Bt-Job uses the output of Q-Job as side info, not as its main input. But the error (split…

Re: Latent Semantic Analysis

2012-04-05 Thread Dmitriy Lyubimov
Also, I notice that you are using the output as a subfolder of your input? If so, it is probably going to create some mess. Please don't use folders for the input and output specs that are nested w.r.t. each other. This is not expected. -d
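
Purely illustrative example of the path layout being asked for (the directory names are made up):

    import org.apache.hadoop.fs.Path;

    // Illustrative only: keep input and output directories disjoint so the job's
    // input splits can never pick up its own output.
    public class SsvdPathLayout {
      public static void main(String[] args) {
        Path input = new Path("/lsa4solr/matrix/A");

        // Problematic: output nested under the input directory.
        Path nestedOutput = new Path("/lsa4solr/matrix/A/ssvd-out");

        // Better: a sibling (or otherwise unrelated) directory.
        Path separateOutput = new Path("/lsa4solr/ssvd-out");

        System.out.println("input=" + input + " avoid=" + nestedOutput + " use=" + separateOutput);
      }
    }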

Combining CF and Content-based recommendations

2012-04-05 Thread ameh
Hi, I'm new to Mahout and am currently using the 1M-ratings MovieLens data file. I had a couple of questions: #1. /ItemBased using CF and content-based/ I've been thinking of a way to incorporate content-based recommendations (using additional attributes of an item, in this case a movie) …

Re: Combining CF and Content-based recommendations

2012-04-05 Thread Ahmed Abdeen Hamed
Hi Anita, I asked a similar question on the list not too long ago and got very good answers from both Sean and Ted. Please check the archives, and if you still have questions feel free to email me; I am sure I will learn something new. Good luck! -Ahmed

Re: Combining CF & content-based recommendations

2012-04-05 Thread Sean Owen
(Hmm, I don't know why it doesn't post to the mailing list. We get a message about moderating everything. I'll copy it to the list now.) In #1, you describe the usual user-item preference matrix. Yes, it's sparse. I guess you could make up pseudo-items like genre in the matrix, yes, if you had …
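
A tiny, hypothetical sketch of that pseudo-item idea, using plain maps rather than a Mahout DataModel (the IDs and weights are made up): map each genre to a reserved item ID and add it to the user's preference vector next to the real movie ratings, so the content signal lives in the same matrix the CF recommender already consumes.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: treat each genre as an extra "item" in the user-item matrix.
    class GenrePseudoItems {
        // Reserve an ID range well above the real movie IDs for genre pseudo-items.
        private static final long GENRE_ID_BASE = 1_000_000L;
        private final Map<String, Long> genreIds = new HashMap<>();
        private long nextGenreId = GENRE_ID_BASE;

        long idFor(String genre) {
            return genreIds.computeIfAbsent(genre, g -> nextGenreId++);
        }

        // Add a (typically weaker) preference for each genre of a movie the user liked,
        // accumulating weight when several liked movies share a genre.
        void addGenrePrefs(Map<Long, Float> userPrefs, Iterable<String> genres, float weight) {
            for (String genre : genres) {
                userPrefs.merge(idFor(genre), weight, Float::sum);
            }
        }
    }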