Re: Number of Reducers in PFP Growth is always 1 !!!

2012-08-30 Thread Sean Owen
Block size and input size should not matter for the Reducer. You do have to explicitly say the number of workers. It defaults to 1. You do set it with just these methods. Make sure you are setting on the right object and before you run. Look for other things that may be overriding it. I don't kno

Re: Clustering users

2010-05-11 Thread Sean Owen
I believe those jobs will internally create whatever they need along the way, including user vectors if needed. To just create them by themselves, you could run ToItemPrefsMapper and ToUserVectorReducer from org.apache.mahout.cf.taste.hadoop.item. On Tue, May 11, 2010 at 5:51 AM, First Qaxy wro

Re: RecommenderJob output

2010-05-11 Thread Sean Owen
The values are entries in the final recommendation vector. They don't have a good interpretation by themselves, but larger values should mean better recommendation. So the recommendations are ordered by this value. It's included just in case it is useful. In other recommender systems (like .pseudo)

Re: RecommenderJob output

2010-05-11 Thread Sean Owen
Er, wait why are you setting booleanData = false? Though the formatting got messed up here, it looks like you do not have explicit ratings. So you should set it to true. On Tue, May 11, 2010 at 7:11 AM, First Qaxy wrote: > Hello, > When running the RecommenderJob with --booleanData false on this >

Re: RecommenderJob output

2010-05-11 Thread Sean Owen
I just committed more of my local changes, since I'm actively improving and fixing things here. My output looks more reasonable: 101 [1015:4.0,1021:3.0,1020:3.0] 102 [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0] 103 [1005:12.0,1021:3.0,1015:3.0,1020:3.0] 105 [1005:14.0,1021:3.0,

Re: RecommenderJob output

2010-05-11 Thread Sean Owen
(Did that happen? I only see my three replies to the original message -- sure, maybe that could have been one -- but all were directly relevant to the first message.) (Or is this somehow looking connected to another thread because it shares the same subject? didn't happen for me in Gmail at least)

Re: Running Demo in Eclipse

2010-05-11 Thread Sean Owen
I've never seen this before. As it says, you can run with -X to see more output, maybe that helps. I suspect this is a Maven problem, since the error comes from within Maven, and is something to do with 'modelEncoding' and com.thoughtworks.xstream.mapper.CannotResolveClassException which are not M

Re: Several questions about Mahout

2010-05-11 Thread Sean Owen
On Tue, May 11, 2010 at 2:39 PM, Robin Anil wrote: >> 1/ How active is Mahout development? The last commit is very recent if I'm >> not mistaken. Can we expect improvements/new features in the future? Very active. I myself committed twice today. There are a number of people committing regularly.

Re: org.apache.mahout.math.Varint missing

2010-05-11 Thread Sean Owen
Dang, my fault for crossing changelists. Let me fix that now. On Tue, May 11, 2010 at 3:27 PM, First Qaxy wrote: > Sean, > Thanks for your updates. I'll try these today. > Regarding org.apache.mahout.math.Varint which is being referenced > from org.apache.mahout.cf.taste.hadoop.item.IndexIndexWr

Re: org.apache.mahout.math.Varint missing

2010-05-11 Thread Sean Owen
Done. That's a class I have locally. I shelved my changelist with it so I don't make that error again. On Tue, May 11, 2010 at 3:32 PM, Sean Owen wrote: > Dang, my fault for crossing changelists. Let me fix that now. > > On Tue, May 11, 2010 at 3:27 PM, First Qaxy wrote: >

Re: RecommenderJob output

2010-05-11 Thread Sean Owen
; the internal model and regenerate only recommendations for the user that I'm > interested in? > > Thanks. > -qf > --- On Tue, 5/11/10, Sean Owen wrote: > > From: Sean Owen > Subject: Re: RecommenderJob output > To: user@mahout.apache.org > Cc: mahout-u...@lucene.apache

Anybody following changes to distributed recommender?

2010-05-17 Thread Sean Owen
I'm still in the middle of continuing to change the distributed recommender significantly. I thought I'd ask if anyone has specific concerns about discussing changes before I make them -- for example, I'd like to get rid of the combiner in the co-occurrence counter mapper since it's taking more pro

Re: IDMigrator

2010-05-19 Thread Sean Owen
Ideally, you initialize by pre-loading all the mappings, by calling initialize(). You can also call storeMapping() whenever you know you have a new mapping -- on each translation if you like, though that's a lot of overhead. Then you just use it to translate strings to numbers and back. The only t

Re: IDMigrator

2010-05-20 Thread Sean Owen
I'm missing is a way to retrieve the > string names of items through the web service. Are there any hooks on > that side where I can make the ID translation? > > Thanks again! > > Matt > > On Wed, May 19, 2010 at 4:33 AM, Sean Owen wrote: >> Ideally, you initialize by pr

Re: Google Big Query & Prediction API

2010-05-20 Thread Sean Owen
It looks like a big classifier. I'm guessing it's going to be pretty great. It's such a natural fit with their infrastructure. Building out these underlying map-reduce implementations is great, but in the end the useful thing is to wrap that up in an API that is oriented towards a particular appli

Re: HMM and Ngrams

2010-05-21 Thread Sean Owen
This doesn't unsubscribe. Send a message to: user-unsubscr...@mahout.apache.org On Fri, May 21, 2010 at 2:56 PM, marshall wrote: > unsubscribe

Re: Mahout LDA Parameter: maxIter

2010-05-23 Thread Sean Owen
Is there a way to catch that with a more descriptive error earlier? I always think AIOOBE looks bad. On May 23, 2010 4:11 PM, "Jeff Eastman" wrote: Yes, your -numWords option is set too low and that's causing the array exception. Try -v 5. On 5/23/10 3:20 AM, 杨杰 wrote: > > Jeff and Robin,

Re: Mahout LDA Parameter: maxIter

2010-05-23 Thread Sean Owen
Even something as simple as checking that bound and throwing IllegalStateException with a custom message -- yeah I imagine it's hard to detect this anytime earlier. Just a thought. On Sun, May 23, 2010 at 6:29 PM, Jeff Eastman wrote: > I agree it is not very friendly. Impossible to tell the corre
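
The kind of early check suggested above could look roughly like this. It is only a sketch: the class name, the method name, and the exact wording of the message are made up for illustration and are not taken from the Mahout LDA code.

```java
// Hypothetical helper, not part of Mahout: fail with a readable message
// instead of letting an ArrayIndexOutOfBoundsException surface later.
final class WordIndexCheck {

    /**
     * Returns wordIndex unchanged if it fits within the configured
     * vocabulary size, otherwise throws with a descriptive message.
     */
    static int checkWordIndex(int wordIndex, int numWords) {
        if (wordIndex < 0 || wordIndex >= numWords) {
            throw new IllegalStateException(
                "Word index " + wordIndex
                + " is outside the configured vocabulary size " + numWords
                + "; is the numWords option set too low?");
        }
        return wordIndex;
    }
}
```

The check costs one comparison per lookup, which is cheap next to the confusion a bare AIOOBE causes.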

Crude distributed recommender performance / cost stats

2010-05-26 Thread Sean Owen
Hi all, thought the list might be interested in some recent numbers I collected on distributed recommenders, in reality, on Hadoop. I just finished running a set of recommendations based on the Wikipedia link graph, for book purposes (yeah, it's unconventional). I ran on my laptop, but it ought to b

Re: Crude distributed recommender performance / cost stats

2010-05-26 Thread Sean Owen
} } } On Wed, May 26, 2010 at 4:45 PM, Jake Mannix wrote: > Hey Sean, > >  Very cool!  Is there any custom code you used to import the link data / > instructions on how to reproduce this? > >  -jake > > On May 26, 2010 8:09 AM, "Sean Owen" wrote: > >

Re: Change Recommender servlet

2010-05-26 Thread Sean Owen
This bit of code actually depends on other code in the examples module. (Which is arguably strange and which I'm happy to think about changing, but there are decent reasons for it.) I think it is just depending on the RecommenderWrapper class. It should build from the Maven build script since it

Re: Change Recommender servlet

2010-05-27 Thread Sean Owen
recommenderServlet class > can't "see" the package that I import. Which packages are viewable from the > servlet > and how can I add the examples package? > > Thanks, > Lef > >> -Original Message- >> From: Sean Owen [mailto:sro...@gmail.

Re: Crude distributed recommender performance / cost stats

2010-05-27 Thread Sean Owen
wrote: > Great stuff, Sean!  Code Sounds like it could go into examples with some of > the other Wikipedia stuff? > > Also, how about c-n-p to > https://cwiki.apache.org/confluence/display/MAHOUT/MahoutBenchmarks? > > -Grant > > On May 26, 2010, at 1:32 PM, Sean Owen wro

Re: Change Recommender servlet

2010-05-27 Thread Sean Owen
I was mistaken earlier when I said taste-web depends on examples. It doesn't, which is good. So, am I right that you depend on examples because you have explicitly modified RecommenderServlet to refer to, say, the GroupLens-based code in mahout-examples? That's fine, but yes it means you have to e

Re: --input now -Dmapred.input.dir ?

2010-05-27 Thread Sean Owen
That's right, and --output is -Dmapred.output.dir. This was just an attempt to use the regular Hadoop args where possible. On Fri, May 28, 2010 at 2:04 AM, Jake Mannix wrote: > Is that right?  I think the mahout shell script is broken for a lot of > AbstractJob subclasses now... (TransposeJob, Te

Re: --input now -Dmapred.input.dir ?

2010-05-27 Thread Sean Owen
Note that those -D args would all have to come first. On May 28, 2010 5:28 AM, "Drew Farris" wrote: It is kind of strange that AbstractJob is barfing on the -Dkey=value arguments -- what's the command-line you're attempting to use Jake? On Thu, May 27, 2010 at 9:04 PM, Jake Mannix wrote: > I

Re: RE: Change Recommender servlet

2010-05-27 Thread Sean Owen
- C:\Users\lef\Desktop\mahout-0.3-src\mahout-0.3\taste-web> -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: Thursday, May 27, 2010 8:24 PM To: user@mahout.apache.org Subject: Re: Change Recommender serv... I was mistaken earlier when I said taste-web depends on examples. It doesn't, which is good. So, am...

Re: --input now -Dmapred.input.dir ?

2010-05-28 Thread Sean Owen
Does it help to note this is Hadoop's flag? It seemed more standard therefore, possibly more intuitive for some already using Hadoop. We were starting to reinvent many flags this way so seemed better to not thunk them with no gain On May 28, 2010 6:06 AM, "Grant Ingersoll" wrote: I just saw tha

Re: RE: RE: Change Recommender servlet

2010-05-28 Thread Sean Owen
following the paradigm of package cf.taste.example.grouplens. But even when I try to import the package cf.taste.example.grouplens, I get the same compilation error. Something is going on with the servlet dependencies.. -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: Fr

Understanding the SVD recommender

2010-06-03 Thread Sean Owen
I'm finally looking to make my crude understanding of SVD-based recommenders more like "vague". I understand the SVD and the principle here but haven't implemented it. I'm looking at what I believe is the original paper on it by Sarwar et al: www.grouplens.org/papers/pdf/sarwar_SVD.pdf It's good

Re: Understanding the SVD recommender

2010-06-03 Thread Sean Owen
cts neither of those. - Conceptually I would understand Nu x VTk, but then P is defined by an additional product with Uk In short... what? On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning wrote: > Fire away. > > On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen wrote: > >> Is anyone out there f

Re: Understanding the SVD recommender

2010-06-04 Thread Sean Owen
compute new user > vectors at any time by multiplying the new users' ratings by V. > > The diagram in figure one is hideously confusing because it looks like a > picture of some kind of multiplication whereas it is really depicting some > odd kind of flow diagram. > > Does this solve the problem? > > On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen wrote:

Re: Understanding the SVD recommender

2010-06-04 Thread Sean Owen
On Fri, Jun 4, 2010 at 8:53 AM, Ted Dunning wrote:on of the vectors, but isn't strictly necessary. > > Note that since this is an SVD, S is diagonal and all elements are real (and > positive, actually).  Thus B* = B. (Yeah either way it's just dotting stuff with that "diagonal vector", or really

Re: Installing Mahout

2010-06-07 Thread Sean Owen
This really isn't enough information to provide any reasonable response. What is the problem? On Fri, Jun 4, 2010 at 4:29 PM, tammuz wrote: > > Hello, > > I am trying to install Mahout on my 2 machines (windows, Linux), I already > tried using mahout wiki but that did not succeeded, can any one h

Re: Big Longs in RecommenderJob

2010-06-07 Thread Sean Owen
Yeah the problem is that signed values are zig-zag encoded into an unsigned value, which loses 1 bit, in addition to losing another bit by mapping to unsigned values. Still there is definitely a way to make it work; the encoding is certainly defined for larger values and there is a need for it. I
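
For readers unfamiliar with the term, the usual zig-zag mapping can be sketched as below. This is the textbook scheme commonly paired with variable-length integers, not a copy of org.apache.mahout.math.Varint; the class and method names are invented for illustration.

```java
// Illustrative only: the standard zig-zag mapping between signed and
// unsigned 64-bit values (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...).
final class ZigZag {

    /** Interleaves negative and non-negative values so small magnitudes stay small. */
    static long encode(long n) {
        // Arithmetic shift copies the sign bit into every position.
        return (n << 1) ^ (n >> 63);
    }

    /** Inverse of encode; the argument is interpreted as an unsigned value. */
    static long decode(long z) {
        // Logical shift drops the sign flag; the XOR restores negative values.
        return (z >>> 1) ^ -(z & 1);
    }
}
```

Because a non-negative n is mapped to 2n, a large non-negative ID needs one more bit after encoding than before, which is why writing already non-negative IDs as signed values wastes space.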

Re: Big Longs in RecommenderJob

2010-06-07 Thread Sean Owen
increase encoding efficiency a little. On Tue, Jun 8, 2010 at 12:36 AM, Ted Dunning wrote: > The other solution would be to be satisfied with 62 bits of id space and > only generate "small" longs. > > On Mon, Jun 7, 2010 at 3:39 PM, Sean Owen wrote: > >> Yeah the pr

Re: Big Longs in RecommenderJob

2010-06-08 Thread Sean Owen
ned when asked to write unsigned and all is well. Obvious right? On Tue, Jun 8, 2010 at 1:46 AM, Sean Owen wrote: > Really, the mistake here (is mine and) is writing these IDs as signed > values. As used in the recommender bit, the IDs are already > nonnegative longs and so can be writte

Re: Big Longs in RecommenderJob

2010-06-08 Thread Sean Owen
nt in raw? If so, would this be any different? return temp ^ (raw & (1<<63)); On Tue, Jun 8, 2010 at 4:42 AM, Sean Owen wrote: > public static long readSig...

Re: Generating a Document Similarity Matrix

2010-06-08 Thread Sean Owen
Sort of, there is a separate job to compute all item-item similarities under a variety of metrics. This is what Sebastian wrote. It's not used in the co-occurrence recommender (but could be -- vaguely a to-do here.) But sure if you're willing to think of a doc as an "item vector" of "preferences"

Re: Generating a Document Similarity Matrix

2010-06-09 Thread Sean Owen
Well I'm not sure they're unique, they're just vectors. Would that not be the best neutral representation for things like this? What was the comment about keying by ints vs longs earlier? If unifying that helps bring things closer together I can look at it, if I can understand the issue. On Wed,

Re: Generating a Document Similarity Matrix

2010-06-09 Thread Sean Owen
On Wed, Jun 9, 2010 at 7:14 PM, Jake Mannix wrote: > The ItemSimilarityJob actually uses implementations of the Vector > class hierarchy?  I think that's the issue - if the on-disk and in-mapper > representations are never Vectors, then they won't interoperate with > any of the matrix operations..

Re: Generating a Document Similarity Matrix

2010-06-09 Thread Sean Owen
't be used? On Wed, Jun 9, 2010 at 7:33 PM, Jake Mannix wrote: > On Wed, Jun 9, 2010 at 11:25 AM, Sean Owen wrote: > >> On Wed, Jun 9, 2010 at 7:14 PM, Jake Mannix wrote: >> > The ItemSimilarityJob actually uses implementations of the Vector >> > class hierarch

Re: Installing Mahout

2010-06-10 Thread Sean Owen
... what folders? this is the kind of specifics that are needed. I don't know of any doc problems. On Wed, Jun 9, 2010 at 1:04 PM, tammuz wrote: > > you might be right Sean and I am sorry for that, I am interested in > recommendation systems and I read about "Taste" which is a Mahout > applicatio

Re: SVD algorithm

2010-06-11 Thread Sean Owen
I have not tried it myself. In principle I do not see a reason you couldn't send in vectors with 0 or 1 values only. You would have to evaluate the result -- would be interesting to hear your results. However I'll also say I don't imagine this is the most efficient recommender for this kind of dat

Re: Recommendations on binary ratings

2010-06-11 Thread Sean Owen
I would map all ratings, of all values, to a 1. Practically speaking this is probably more 'accurate'. Are you using a precision-recall test? they're not terribly informative, though they're about the only thing you can do to evaluate recommendations without ratings. That is, it's testing whether

Re: Recommendations on binary ratings

2010-06-11 Thread Sean Owen
What I mean is that 'boolean' ratings express an association, versus no association. Any item a user has bothered to rate is significantly more associated to that user -- even if they hated it -- than the universe of other items that the user has never even heard of. If you're a classical music fa

Re: SVD algorithm

2010-06-11 Thread Sean Owen
Depending on your data, I'd imagine simple item-based recommenders will be faster, that's all. The SVD takes some time to compute. On Fri, Jun 11, 2010 at 3:28 PM, Nishant Chandra wrote: > AFAIK, SVD is to overcome sparsity in the user - item matrix. How is it > connected to efficiency here? Do y

Re: Installing Mahout

2010-06-11 Thread Sean Owen
On Fri, Jun 11, 2010 at 2:06 PM, tammuz wrote: > So I downloaded Mahout from this link: > http://www.apache.org/dyn/closer.cgi/lucene/mahout/ > as you can see there are 3+2 folders (0.1, 0.2, 0.3) and > (mahout-collection-codegen-plugin-1.0, mahout-collections-1.0), I assume > that (0.1, 0.2, 0.3)

Re: Mahout on EMR?

2010-06-11 Thread Sean Owen
Yes we're about half migrated to the new APIs now and so should focus on completing that. On Fri, Jun 11, 2010 at 4:07 PM, Isabel Drost wrote: > Might be worth having a look again - seems like EMR just got upgraded > to support Hadoop 0.20: > > http://developer.amazonwebservices.com/connect/entry

Re: Installing Mahout

2010-06-11 Thread Sean Owen
That's the issue then. Do you have more information about the error? usually it will say where it dumped log files. are you using Windows? FWIW 0.3 builds fine for me on my machine. On Fri, Jun 11, 2010 at 4:45 PM, tammuz wrote: > > returning back I find that when executing the 5th step i got th

Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs

2010-06-11 Thread Sean Owen
-Dmapred.map.tasks and same for reduce? These should be Hadoop params you set directly to Hadoop. On Fri, Jun 11, 2010 at 5:07 PM, Kris Jack wrote: > Hi everyone, > > I am running code that uses some of the jobs defined in the > DistributedRowMatrix class and would like to know if I can define th

Re: Installing Mahout

2010-06-11 Thread Sean Owen
I don't think that's the conclusion. The big problem is most certainly that it doesn't build -- of course the rest doesn't work if the build doesn't. 0.3 builds for me, and I assume for all of us since we tested it before releasing it. I think the next step is to gather more information from your

Re: My own recommender

2010-06-11 Thread Sean Owen
You don't need a new Maven project, no. You *could* if you wanted to, and depend on Mahout artifacts in your project. You could write a module within Mahout's project if that's more convenient. If you're just playing around, just stick files somewhere in the core/ module, wherever you like. That w

Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs

2010-06-14 Thread Sean Owen
ters that you suggested. > > Please do let me know if I was just not calling them correctly or if you > think that there already exists an alternative way to do this.  I would like > to use Mahout as it was intended and not make lots of little changes myself > if they aren't necessar

Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs

2010-06-14 Thread Sean Owen
Looks right to me. My next question is are you calling getConf() to get Hadoop's configuration object rather than configuring and setting your own? if you did that, you'd lose anything Hadoop parsed from its files and command line -- but would explain why re-setting it yourself in the code works.

Re: Setting Number of Mappers and Reducers in DistributedRowMatrix Jobs

2010-06-15 Thread Sean Owen
The first part looks fine. I dug in, and see that the transpose() method ultimately does not use the configuration that was configured and makes its own. That is the underlying issue. Maybe Jake can comment more. On Tue, Jun 15, 2010 at 10:28 AM, Kris Jack wrote: > Hi Sean, > > I'm calling getCo

Re: Predicting Successor Item

2010-06-15 Thread Sean Owen
I would strongly guess that it's the very last item purchased that makes the most difference to the next item purchased. So it's probably a fairly simple problem -- just look at chains of length 1. Count for each item i, which item j came next. Then just return the highest-count one. You could thr
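
A minimal sketch of the chains-of-length-1 idea, assuming each customer's purchases are available as an ordered list of item IDs. The class and method names are hypothetical, and ties between equally frequent successors are broken arbitrarily.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: count, for each item i, which item j came next,
// then predict the most frequently observed successor.
final class NextItemCounter {

    // counts.get(i).get(j) = times item j was purchased immediately after item i
    private final Map<Long, Map<Long, Integer>> counts = new HashMap<>();

    /** Records every adjacent (item, next item) pair in one purchase history. */
    void addHistory(List<Long> purchases) {
        for (int k = 0; k + 1 < purchases.size(); k++) {
            counts.computeIfAbsent(purchases.get(k), x -> new HashMap<>())
                  .merge(purchases.get(k + 1), 1, Integer::sum);
        }
    }

    /** Returns the highest-count successor of item, or null if none was observed. */
    Long predictNext(long item) {
        Map<Long, Integer> successors = counts.get(item);
        if (successors == null) {
            return null;
        }
        Long best = null;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : successors.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With histories such as [1, 2, 3], [1, 2] and [1, 3], item 2 follows item 1 twice and item 3 follows it once, so the prediction after item 1 is item 2.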

Re: Installing Mahout

2010-06-17 Thread Sean Owen
Yes, as was mentioned, the problem is that it is not building on your particular machine for some reason. The build process outputs many logs that show what happened. I'm sure you will find evidence of the problem there. Without that information, I don't think anyone could help more as it works on

Re: Installing Mahout

2010-06-17 Thread Sean Owen
Yes, again, it's clear the tests are failing because the *build* failed earlier. This log output just says the tests later failed. What we need is log information from the failed *build*, not tests. On Thu, Jun 17, 2010 at 1:09 PM, tammuz wrote: > > Well this is what I note during the installatio

Re: Recommendations on binary ratings

2010-06-17 Thread Sean Owen
Pranay you already sent this message to the mailing list. The forwarded message below to mahout-user-subscribe does not work (and is out of date). You need to view the replies to your first message already on this list. On Thu, Jun 17, 2010 at 1:13 PM, pranay venkata wrote: > -- Forwarded

Re: Installing Mahout

2010-06-17 Thread Sean Owen
Yes we had previously established that the tests are failing because the build failed. To be clear the next step is to retrieve Maven's log files regarding the failed build, instead of the tests themselves. On Thu, Jun 17, 2010 at 1:21 PM, Isabel Drost wrote: > On Thu tammuz wrote: >> Well this

Re: Installing Mahout

2010-06-18 Thread Sean Owen
This does look like something else. This is a Locale problem I think we fixed in SVN a while ago. So you can try the latest code from SVN instead (always a good idea here). Though I don't think it is related to the build failure you were seeing separately. On Fri, Jun 18, 2010 at 10:04 AM, tammuz w

Re: out for memory

2010-06-18 Thread Sean Owen
How big is your input? I am not sure this necessarily scales to tens of millions of data points, no. A distributed implementation is being created which could be more appropriate. On Fri, Jun 18, 2010 at 11:33 AM, Tamas Jambor wrote: > hi, > > i am trying to run an SVD recommender with the netfli

Re: out for memory

2010-06-18 Thread Sean Owen
Memory requirements may be much higher for this algorithm as it builds large intermediate data structures to compute the SVD. Yes I think the simple data fits in 3GB or so. Sounds like you have solved your problem by supplying more memory. On Fri, Jun 18, 2010 at 2:10 PM, Tamas Jambor wrote: > it

Re: out for memory

2010-06-18 Thread Sean Owen
as able to fit in 3GB memory. I guess > GenericDataModel takes up quite a lot of memory, because the data is indexed > by users and by items, which is not necessary for SVD > > On 18/06/2010 14:21, Sean Owen wrote: >> >> Memory requirements may be much higher for this algo

Re: slf4j logger

2010-06-20 Thread Sean Owen
As far as I know, SLF4J is just a means of connecting code to a(nother) logging framework. So yes you would configure the JDK, not SLF4J. If it's not working I expect there's some mismatch in how you are specifying the logger names? or maybe some issue with the logging.properties file. On Sat, Jun

Re: GenericDataModel Serializable

2010-06-20 Thread Sean Owen
Sure, committed. I can imagine wanting to serialize such a model for testing purposes. As a "data model" it's conceptually sound to serialize it. You may find it is not a terribly efficient way to serialize since it uses the default serialization mechanism. You could easily customize the read/writ

Re: Content-based Recommender Implementation

2010-06-22 Thread Sean Owen
This is the part that is more up to you, and outside the framework. Let's say you have movies as items. Let's say you want to use their genre and director (content, attributes) to define some idea of similarity. Maybe you make up the following rule: if genres are the same, add 0.1 to similarity i

Re: Content-based Recommender Implementation

2010-06-22 Thread Sean Owen
ms of content features using a > Recommender > > On Tue, Jun 22, 2010 at 3:11 PM, Sean Owen wrote: > >> This is the part that is more up to you, and outside the framework. >> >> Let's say you have movies as items. Let's say you want to use their >> g

Re: store time in DataModel

2010-06-24 Thread Sean Owen
No, this is not really part of collaborative filtering per se. But nothing about the model precludes you from storing and using time -- it's just not relevant to the core CF algorithm. For example you're welcome to have a timestamp column in your database table alongside other columns, doesn't matt

Re: store time in DataModel

2010-06-25 Thread Sean Owen
OK well if there are several people interested in using this info -- and there are viable algorithms making good use of it, sounds worth integrating. It's not some arbitrary datum -- timestamp is pretty fundamental. I think this change is basically a change to DataModel. Something like getPreferen

Re: Recommendations on binary ratings

2010-06-26 Thread Sean Owen
There's no good guideline here -- it's not actually a great test. It's measuring how many of the recommendations overlap with what the user already knew. But by definition users don't necessarily know about all or even most of the "good" recommendations. So a low score doesn't mean a bad recommende
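
To make concrete what such a test measures, here is a rough sketch of precision and recall against a held-out set. It assumes some of a user's known items were removed before recommending and that the recommendation list has no duplicates; it is not Mahout's evaluator, and all names are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Hypothetical example: overlap between recommendations and held-out items.
final class PrecisionRecall {

    /** Fraction of the recommended items that appear in the held-out set. */
    static double precision(List<Long> recommended, Set<Long> heldOut) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, heldOut) / recommended.size();
    }

    /** Fraction of the held-out items that were recommended. */
    static double recall(List<Long> recommended, Set<Long> heldOut) {
        if (heldOut.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, heldOut) / heldOut.size();
    }

    private static int hits(List<Long> recommended, Set<Long> heldOut) {
        int n = 0;
        for (Long item : recommended) {
            if (heldOut.contains(item)) {
                n++;
            }
        }
        return n;
    }
}
```

A recommended item that is not in the held-out set counts as a miss here even if the user would have liked it, which is exactly why a low score does not necessarily mean a bad recommender.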

Re: Version compatibility of Mahout 0.4-SNAPSHOT with Hadoop release?

2010-07-01 Thread Sean Owen
I am not sure where it's written down but it's fairly safe to assume we're tracking the latest release. In this case yes the recommended version is 0.20.2 On Thu, Jul 1, 2010 at 10:46 PM, Gokul Pillai wrote: > What version of Hadoop release is Mahout 0.4-SNAPSHOT compatible with. Where > can I fi

Re: Precomputation

2010-07-03 Thread Sean Owen
The simplest thing to do is compute on-line, in real time, unless you have a reason to do something harder and more complex. And that reason would probably be scale. How big are we talking? There are some links here: https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation tho

Re: Re: Precomputation

2010-07-03 Thread Sean Owen
That's reasonably large. How many rows are in the ratings table? If it's more than about 100M I think you will need distributed approaches, which are necessarily offline. So that would answer that question. 2010/7/3 Young : > Thanks for your reply. There are about 1 million items in these website

Re: Mahout running on Hadoop

2010-07-05 Thread Sean Owen
In general, the Hadoop-based implementations are completely different creatures. Code from the regular online versions doesn't port and the computation needs to be structured quite differently. They're almost different libraries. There's one hybrid, and that is the pseudo-distributed recommender b

Re: Recommend for anonymous users

2010-07-05 Thread Sean Owen
1) Good question. One answer is to make these "anonymous" users real users in your data model, at least temporarily. That is, they need not be anonymous to the recommender, even if they're not yet a registered user as far as your site is concerned. There's a class called PlusAnonymousUserDataModel

Re: "Taste" GroupLens Example on Binary Release?

2010-07-05 Thread Sean Owen
You don't need the latest source from Subversion, though at this stage, that is always highly recommended. Things change fast. 0.3 is now a bit out of date. For this particular task yes I believe you need the source and build scripts since it is actually constructing a deployable component and tha

Re: "Taste" GroupLens Example on Binary Release?

2010-07-05 Thread Sean Owen
10M should be fine, but yeah you probably need a bit more heap than the default of, what, 64MB? 100M probably works too if you can give it a couple gigabytes. On Mon, Jul 5, 2010 at 4:46 PM, Chantal Ackermann wrote: > Hi Sean, > > thank you for clarifying this. > > Just for the records: > I downl

Re: Recommend for anonymous users

2010-07-05 Thread Sean Owen
f the given implementations. That's all. On Mon, Jul 5, 2010 at 4:32 PM, samsam wrote: > About the second question,I have not the similarity,I want to know is how to > pre-compute the item similarity. > > On Mon, Jul 5, 2010 at 11:20 PM, Sean Owen wrote: > >> 1) Good

Re: Recommend for anonymous users

2010-07-07 Thread Sean Owen
2:07 AM, samsam wrote: > >> I become more clear about that,thanks for your help very much. >> >> >> On Mon, Jul 5, 2010 at 11:52 PM, Sean Owen wrote: >> >>> Pre-compute the similarity based on what information? You mention that >>> you don't

Re: Recommend for anonymous users

2010-07-08 Thread Sean Owen
does not exist > > Who knnow how to import the AnonymousRecommender class? > > Best Regards. > > On Thu, Jul 8, 2010 at 12:48 AM, samsam wrote: >> >> thanks very much! >> >> On Thu, Jul 8, 2010 at 12:46 AM, Sean Owen wrote: >>> >>> That

Re: Recommend for anonymous users

2010-07-09 Thread Sean Owen
Oh I see. It appears you've added your own code like AnonymousRecommenderServlet into the framework code in taste-web/src/. This isn't the intent. You want all your code to be built together into one .jar file, separately. The module of course doesn't depend on your code. It would also work if you

Re: build failure when try to install Mahout 0.2

2010-07-09 Thread Sean Owen
I'm not sure about that particular failure, but, version 0.2 is very old. Version 0.3 is also released, but, it is best to use the latest version of the code, which will become 0.4, from Subversion. I know this does not fail in the latest code. On Fri, Jul 9, 2010 at 9:43 AM, Sonia Ben Ticha wrot

Re: Data Types for Item/Element Ids

2010-07-09 Thread Sean Owen
I believe you are referring to recommenders specifically. It used to allow string IDs but it's a lot of overhead. Most applications already use integer identifiers for these entities -- and those that don't can use the support in the library for building a mapping from strings to ints. Vectors nat

Re: Data Types for Item/Element Ids

2010-07-09 Thread Sean Owen
Look for IDMigrator and subclasses. It's a band-aid solution; for a large-scale solution you want to use integer IDs. 2010/7/9 Matthias Böhmer : > Very good question! Could you please point me to the support of the > library where I can find a mapper form strings to longs for item and > unser IDs?

Re: Data Types for Item/Element Ids

2010-07-09 Thread Sean Owen
> Thanks, > Kris > > > > 2010/7/9 Sean Owen > >> Look for IDMigrator and subclasses. It's a band-aid solution; for a >> large-scale solution you want to use integer IDs. >> >> 2010/7/9 Matthias Böhmer : >> > Very good question! Could you ple

Re: build failure when try to install Mahout 0.3

2010-07-12 Thread Sean Owen
bin version and use the Fuzzy c-mean in my application? > >  I work under Windows 7 32 bits > > Thanks > > -Message d'origine- > De : Sean Owen [mailto:sro...@gmail.com] > Envoyé : vendredi 9 juillet 2010 11:16 > À : user@mahout.apache.org > Objet : Re: b

Re: Recommend for anonymous users

2010-07-12 Thread Sean Owen
Yes, that's the problem. You should not be modifying taste-web at all. That's not a place for your code. It can't "see" the rest of your code. Build and package all of your code together, not only part of it. On Mon, Jul 12, 2010 at 12:03 PM, samsam wrote: > I think build the recommender and rela

Re: Installing Mahout

2010-07-12 Thread Sean Owen
Yep, exactly as I mentioned. This was something we fixed in the code a while ago. You probably want to use the latest code from subversion which would have that change. On Mon, Jul 12, 2010 at 4:17 PM, tammuz wrote: > > Now I know what is the problem but I do not know the solution yet -but I > th

Re: question about the Twenty Newsgroup example

2010-07-12 Thread Sean Owen
OutOfMemoryError from workers, right? You probably need to give more memory to the Hadoop workers. I do this by setting something like this in mapred-site.xml (there are other similar ways of doing this): <property> <name>mapred.child.java.opts</name> <value>-Xmx256m</value> </property> This gives each a 256MB heap. I don't know ho

Re: Recommend for anonymous users

2010-07-13 Thread Sean Owen
mender. > > I tried using the plusmodel on similarity and recommender instead of the > realmodel, but did not work as well. > > > > On Mon, Jul 5, 2010 at 11:52 PM, Sean Owen wrote: > >> Pre-compute the similarity based on what information? You mention that >> yo

Re: Online Recommendation

2010-07-13 Thread Sean Owen
10 million data points is not large, and should not be "slow". Recommendations should take less than 100ms with normal algorithms and data sets. What are you seeing? There are many, many ways to make a recommender run very slowly, and a few ways to do it right. Without any details, it's not possi

Re: Online Recommendation

2010-07-13 Thread Sean Owen
OK, but these are not log messages from producing recommendations. This shows loading the data into memory the first time. It may take a little time, but, producing recommendations after that should be very fast. 2010/7/13 WoodJustin : > > i did not modify anything in the Mahout in Action. First n

Re: Re: Online Recommendation

2010-07-13 Thread Sean Owen
10 requests per second sounds reasonable. You might wish to test but I believe one processor core on one server can handle that. 2010/7/13 Young : > Thank you, Sean. That makes sense. It uses 1 second to generete > recommendations. After intiating the recommender, the recommend() will be > very

Re: Re: Recommend for anonymous users

2010-07-14 Thread Sean Owen
       pref.setUserID(0, PlusAnonymousUserDataModel.TEMP_USER_ID); >>>        for(int i=0;i<10;i++){ >>>            pref.setItemID(i, votes[i][0]); >>>            pref.setValue(i,votes[i][1]); >>>        } >>>        synchronized(pref) { >>>            plusmodel.s

Re: Re: Re: Recommend for anonymous users

2010-07-14 Thread Sean Owen
That looks basically sound. You probably want to wrap the PearsonCorrelationSimilarity in a CachingItemSimilarity. You may also simply wish to try a different algorithm. What's the data like? if it has lots of items, this is not the best choice. Next step here would be to profile to see where the

Re: Re: Re: Re: Recommend for anonymous users

2010-07-14 Thread Sean Owen
How many unique items? That result really doesn't look right to me. I don't have a good guess looking purely at the code. I think you would have to profile it, and figure out where it is spending so much time. With that more information maybe we can figure it out. Sean 2010/7/14 Young : > Hi Sea

Re: Re: Re: Re: Re: Recommend for anonymous users

2010-07-14 Thread Sean Owen
That's strange, since I've run the same data set and never seen behavior like this. Yes I run on my laptop too, which is fairly similar. Yes of course the time is consumed somewhere from recommend(), but where? I think you'd want to get some clue about where within this processing the time is bein

Re: Re: Re: Re: Re: Recommend for anonymous users

2010-07-14 Thread Sean Owen
Are you giving it enough memory? I wonder whether you are nearly running out of heap and this is making it very very slow. Just give it a bunch of heap with "-Xmx2048m" or something like that. (I'd also recommend Java 6 but that's not the issue here.) On Wed, Jul 14, 2010 a
