Re: Recommend for anonymous users

2010-07-15 Thread Sean Owen
I believe I see the issue. The temporary anonymous user has preferences for items that don't otherwise exist in the data model. And the particular data model delegate doesn't like that. It should be fairly simple to deal with; I just need a moment to think through how that should look. Sean On Th

Re: Recommend for anonymous users

2010-07-15 Thread Sean Owen
I committed a possible fix. On Thu, Jul 15, 2010 at 8:57 AM, Sean Owen wrote: > I believe I see the issue. The temporary anonymous user has > preferences for items that don't otherwise exist in the data model. > And the particular data model delegate doesn't like that. It

Re: Recommend for anonymous users

2010-07-15 Thread Sean Owen
You would want to download the latest source code from Subversion and build it locally. https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control On Thu, Jul 15, 2010 at 11:37 AM, samsam wrote: > where can I get the latest jar of mahout? > > On Thu, Jul 15, 2010 at 6:16 PM,

Re: Hadoop with MySQL-based data model and Spring integration

2010-07-15 Thread Sean Owen
No, not really. A relational database is pretty at odds with the nature of Hadoop, which is about distributing and computing only bits of data at once, in ways that are independent of other data. That's not a property of Mahout as much as Hadoop. But that said, if you're doing recommendation, you

Re: Cooccurrence to align different categorization systems (many to many occurrence)

2010-07-16 Thread Sean Owen
Let's clarify your situation. Are you making recommendations, or something else? This shouldn't have anything to do with Lucene per se. You do not need Hadoop for recommendations if you don't want it. ItemSimilarity is not related to Hadoop. Yes, you can define whatever notion of similarity you like this way. It's

Re: Cooccurrence to align different categorization systems (many to many occurrence)

2010-07-17 Thread Sean Owen
> I've also read through the hadoop word count tutorial and installed > hadoop (which was as easy as it can be). > > I just don't know where to start as I have not enough experience to > judge what is relevant for my use case. > > Thanks! > Chantal > >

Re: How to combine boolean datamodel with datamodel

2010-07-19 Thread Sean Owen
No, you need one table (or view if you like) containing all data. If you can't do this, you could write your own copy of a JDBCDataModel that can query multiple tables, or, that changes its SQL queries to use UNION statements. I imagine it will slow down a lot. If you mean, can you use a table wit

Re: Cooccurrence to align different categorization systems (many to many occurrence)

2010-07-19 Thread Sean Owen
You have it right. The easiest way to deal with community 1 vs. community 2 is to pool all of the categories together into one data model, but simply ignore most-similar categories from the wrong community. That is, you're computing similarity between a community 1 "user" and all community 2 "users" on

Re: Re: How to combine boolean datamodel with datamodel

2010-07-19 Thread Sean Owen
Yes you probably want a new, separate table. You have an extra step of computing some notion of similarity anyway, and you probably want to separate this table from your main data table anyhow for reasons of performance and business logic separation. 2010/7/19 Young : > So my prpblem is that I wan

Re: Cooccurrence to align different categorization systems (many to many occurrence)

2010-07-19 Thread Sean Owen
Yeah that's fine. You could do this too. You're not actually making recommendations, just computing most similar items instead of most similar users, so lots of stuff works here. On Mon, Jul 19, 2010 at 2:55 PM, Chantal Ackermann wrote: > Hi, > > mainly for the records: > > I've now mapped my ite

Re: Re: Re: How to combine boolean datamodel with datamodel

2010-07-20 Thread Sean Owen
It still seems strange to observe such a bottleneck, I'm not sure what's going on. You are using an in-memory model like GenericDataModel? We could look at ways to optimize that method, though it looks reasonably tight. Where within that method do you see time spent? 2010/7/20 Young : > Hi again,

Re: How to combine boolean datamodel with datamodel

2010-07-21 Thread Sean Owen
Ah so it really is a function of those particular items. Well we can probably modify this function to be smarter and cap, somehow, the number of items considered. I'm just struggling to figure out how to do so without drawing arbitrary boundaries, like taking the top 100, etc. On Wed, Jul 21, 2010

Re: finding new users

2010-07-23 Thread Sean Owen
Try to recommend, and catch NoSuchUserException. If you have no data at all for a user, there's no way to make recommendations since you know nothing about the user at all. However in this case you typically "recommend" a selection of most popular items to start. 2010/7/23 Matthias Böhmer : > Hell

Re: finding new users

2010-07-27 Thread Sean Owen
There's no direct way to do this, but it's pretty straightforward to loop through a DataModel and pick out the "most popular" items according to whatever definition you like (most ratings, highest average rating, recency, etc.) Of course you can cache that for a long time. 2010/7/26 Matthias Böhme
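A minimal sketch of the "most popular items" fallback from the two replies above, using plain Java collections rather than Mahout's DataModel. The map layout, the class name `PopularItems`, and the ranking rule (most ratings first, ties to the lower item ID) are assumptions for illustration; any of the other definitions Sean lists would slot in the same way.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a DataModel: userID -> (itemID -> preference).
final class PopularItems {

    private PopularItems() {}

    // "Most popular" here means "rated by the most users"; ties go to the lower item ID.
    static List<Long> mostPopular(Map<Long, Map<Long, Float>> prefsByUser, int howMany) {
        Map<Long, Integer> ratingCounts = new HashMap<>();
        for (Map<Long, Float> userPrefs : prefsByUser.values()) {
            for (Long itemID : userPrefs.keySet()) {
                ratingCounts.merge(itemID, 1, Integer::sum);
            }
        }
        return ratingCounts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
                    return byCount != 0 ? byCount : Long.compare(a.getKey(), b.getKey());
                })
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Since the list changes slowly, it can be computed once, cached, and returned from the block that catches NoSuchUserException around the normal recommend() call.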

Re: Search with recommendations

2010-07-27 Thread Sean Owen
I think you could construe this as a search or CF problem and end up with something that works fine. I read into this that you want other users' preferences involved, which suggests it is a bit more of a CF problem. So I can sketch how that would work. It is also the only approach I'd be qualified

Re: Best way to do a recommendation engine based on CLR (Click Through Rate)

2010-07-28 Thread Sean Owen
Yes, preferences are merely an indicator of the strength of an association. They aren't necessarily from explicit ratings; you could base this figure on click through counts. You do not need to scale the values; the particular scale does not matter to any algorithm. However one important lesson i

Re: Could you improve the AbstractJDBCDataModel?

2010-07-29 Thread Sean Owen
That's rather the point of the JDBC-backed models -- they're for the case where you can't load data in memory. If you can, then load them in memory. Yes it's not fast to use JDBC. It only really makes sense with algorithms that, through their nature or caching, don't read a lot of data. You want t

Re: Could you improve the AbstractJDBCDataModel?

2010-07-30 Thread Sean Owen
Also make sure you are using a connection pool! I'd rather not complicate with ehcache or anything like that. The caching can be done in memory with a wrapper. But even I'm not so keen on adding this complication. I don't think JDBC is a great way to access data when it gets big. Put it in memory,
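The "caching can be done in memory with a wrapper" idea can be sketched as a memoizing decorator. `PreferenceSource` is a hypothetical stand-in for whatever slow, JDBC-backed lookup is being wrapped; it is not a Mahout type, and a real wrapper would also need a policy for when to clear the cache.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a slow lookup, e.g. one SQL query per call.
interface PreferenceSource {
    List<Long> itemIDsForUser(long userID);
}

// Decorator that answers repeated lookups from memory instead of the database.
final class CachingPreferenceSource implements PreferenceSource {
    private final PreferenceSource delegate;
    private final Map<Long, List<Long>> cache = new ConcurrentHashMap<>();

    CachingPreferenceSource(PreferenceSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<Long> itemIDsForUser(long userID) {
        // computeIfAbsent calls the delegate only when the key is not already cached.
        return cache.computeIfAbsent(userID, delegate::itemIDsForUser);
    }

    // Drop all cached entries, e.g. after the underlying tables change.
    void clear() {
        cache.clear();
    }
}
```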

Re: Re: Could you improve the AbstractJDBCDataModel?

2010-08-07 Thread Sean Owen
No, you just update the database tables. That's the point of using the JDBC model. 2010/8/7 Young : > If I want to update the datamodel frequently or in real time, what should I > do? Or I have to use two instance and fit the database data into memory > alternatively? >

Re: A question regarding GenericUserBasedRecommender

2010-08-09 Thread Sean Owen
This is expected behavior as far as I understand the algorithm. I don't see how a user-based recommender can estimate a preference by X for Y if nobody who rated Y is connected to X at all. You can use a PreferenceInferrer to fill in a lot of missing data, but I don't really recommend this for mo

Re: Mahout in Action/ Distributed recommandation

2010-08-09 Thread Sean Owen
The input file format looks wrong. It should be of the form "userID,itemID[,preference]". I think that's your problem here? On Mon, Aug 9, 2010 at 9:18 AM, Florent Empis wrote: > Hi, > > I just tried to follow Mahout In Action, 6.4.2 Running recommendations with > Hadoop > > When I launch >  bin/
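To check input against the "userID,itemID[,preference]" format before launching a job, a small local validator like the one below can help. It is a sketch of the format as described in the reply, not Mahout's own parser (which may accept more or fewer variations); the class name `PrefLine` is hypothetical. Blank lines and space-separated lines are rejected with a message that shows the offending text.

```java
import java.util.OptionalDouble;

// One parsed input line of the form userID,itemID[,preference].
final class PrefLine {
    final long userID;
    final long itemID;
    final OptionalDouble preference; // empty when the line has only two fields

    private PrefLine(long userID, long itemID, OptionalDouble preference) {
        this.userID = userID;
        this.itemID = itemID;
        this.preference = preference;
    }

    static PrefLine parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2 || fields.length > 3) {
            throw new IllegalArgumentException(
                    "Expected userID,itemID[,preference] but got: \"" + line + "\"");
        }
        long userID = Long.parseLong(fields[0].trim());
        long itemID = Long.parseLong(fields[1].trim());
        OptionalDouble pref = fields.length == 3
                ? OptionalDouble.of(Double.parseDouble(fields[2].trim()))
                : OptionalDouble.empty();
        return new PrefLine(userID, itemID, pref);
    }
}
```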

Re: Trouble running RecommenderJob with Mahout 0.3 - class not found issues

2010-08-09 Thread Sean Owen
I don't really get it either -- the .job file ought to have everything. That class isn't even in a separate module. But I can tell you the next good step is to use the latest code from Subversion instead of 0.3, since I 99% know that works. On Mon, Aug 9, 2010 at 10:16 AM, Simon Reavely wrote: >

Re: Mahout in Action/ Distributed recommandation

2010-08-09 Thread Sean Owen
Sebastian should we make that default to something? Like a simple co-occurrence count? That would be more consistent with the past behavior. On Mon, Aug 9, 2010 at 10:32 AM, Sebastian Schelter wrote: > It's also necessary to supply the name of a class implementing > org.apache.mahout.math.hadoop.

Re: Trouble running RecommenderJob with Mahout 0.3 - class not found issues

2010-08-09 Thread Sean Owen
I think your input is malformed, what does it look like? (But the error could be better.) On Mon, Aug 9, 2010 at 3:14 PM, Simon Reavely wrote: > I built and hacked together 0.4-snapshot from src > > It now finds the class files - hurrah! > However, I now get an ArrayIndexOutOfBoundsException > >

Re: Mahout in Action/ Distributed recommandation

2010-08-09 Thread Sean Owen
the > splitting step has been omitted in the example, or the author assumed the > job did it but doesn't anymore. > It should be quite straightforward to investigate this tomorrow. > > I turned to the ML in the hope someone already had the issue and found the > problem. I

Re: A question regarding GenericUserBasedRecommender

2010-08-09 Thread Sean Owen
On Mon, Aug 9, 2010 at 6:46 PM, Yanir Seroussi wrote: > As I see it, there are two separate tasks here. The first one is the > recommendation task, where it makes sense to take the N most similar users > and generate recommendations based on their preferences. The second one is > the rating predic

Re: How to create the binary package from source/trunk

2010-08-10 Thread Sean Owen
That's built by 'mvn deploy' if I recall correctly? On Tue, Aug 10, 2010 at 9:37 AM, Simon Reavely wrote: > Hi, > > What I really want to know how to do is build the binary package after I've > done the mvn package. > i.e.something that looks like this: > http://apache.opensourceresources.org/luc

Re: Trouble running RecommenderJob with Mahout 0.3 - class not found issues

2010-08-10 Thread Sean Owen
You sure you don't have a blank line or something in there somewhere? On Tue, Aug 10, 2010 at 8:45 AM, Simon Reavely wrote: > My input is csv, of the form > userid, itemid >

Re: A question regarding GenericUserBasedRecommender

2010-08-12 Thread Sean Owen
I agree with your reading of what the Herlocker paper is saying. The paper is focused on producing one estimated rating, not recommendations. While those tasks are related -- recommendations are those with the highest estimated ratings -- translating what's in Herlocker directly to a recommendation

Re: getAllOtherItems

2010-08-15 Thread Sean Owen
(True, the SVD's real benefit is that it can build more user-user and/or item-item connections by squeezing the data down into many fewer dimensions. It's making more items be co-rated in a sense.) On Sun, Aug 15, 2010 at 1:18 PM, Tamas Jambor wrote: > On 15/08/2010 18:29, Sebastian Schelter wrot

Re: FileDataModel

2010-08-15 Thread Sean Owen
What do you mean by this? I'm not clear yet. On Sun, Aug 15, 2010 at 1:09 PM, Tamas Jambor wrote: > Hi, > > One more possible bug, in FileDataModel, there is nothing to make sure that > the superclass - AbstractDataModel gets the value for maxPreference and > minPreference. > > Tamas >

Re: FileDataModel

2010-08-15 Thread Sean Owen
Ah yeah I see the problem now. I'll fix that. On Sun, Aug 15, 2010 at 5:00 PM, Tamas Jambor wrote: > DataModel model = new FileDataModel(new File("./data/test.txt")); > //just to make sure it loads the model > model.getNumItems(); > System.out.println(model.getMaxPreference()); > > this prints ou

Re: Boolean Recommender evaluator returning score greater than 1

2010-08-16 Thread Sean Owen
RMS is root-mean-square error, which can be arbitrarily large. So, no it's not wrong for it to be above 1. But for boolean data, the evaluation doesn't make sense. You can only use simple IR stats eval -- precision and recall. Those should not be more than 1 as they are percentages. On Mon, Aug 1
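A short sketch of root-mean-square error shows why a value above 1 is normal on a rating scale like 1 to 5: the error is measured in rating units, not as a fraction. The class name `Rmse` is just for illustration.

```java
final class Rmse {
    private Rmse() {}

    // Root-mean-square error: square root of the mean squared difference.
    static double rmse(double[] actual, double[] estimated) {
        if (actual.length == 0 || actual.length != estimated.length) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimated[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / actual.length);
    }
}
```

For example, actual ratings {5, 1} against estimates {3, 2} give sqrt((4 + 1) / 2) = sqrt(2.5), about 1.58.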

Re: combine recommender systems

2010-08-16 Thread Sean Owen
No but you could probably piece that together fairly easily. It would use some weighted average of preference estimates, yes. 2010/8/16 Matthias Böhmer : > Hello, > > is there a way to combine two or more implementations of class > Recommender to a new recommender, e.g. to build a weighted composi
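One way to piece together the weighted composite suggested above, sketched with a hypothetical `PreferenceEstimator` interface rather than Mahout's Recommender (whose real interface has more methods). Estimators that cannot produce an estimate return NaN and are left out of the average; that skip rule is a design choice of this sketch.

```java
import java.util.List;

// Hypothetical minimal interface: estimate a user's preference for an item (NaN if unknown).
interface PreferenceEstimator {
    double estimate(long userID, long itemID);
}

// Weighted average of several estimators' outputs; NaN estimates are skipped.
final class WeightedCompositeEstimator implements PreferenceEstimator {
    private final List<PreferenceEstimator> estimators;
    private final double[] weights;

    WeightedCompositeEstimator(List<PreferenceEstimator> estimators, double[] weights) {
        if (estimators.size() != weights.length) {
            throw new IllegalArgumentException("one weight per estimator");
        }
        this.estimators = List.copyOf(estimators);
        this.weights = weights.clone();
    }

    @Override
    public double estimate(long userID, long itemID) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < weights.length; i++) {
            double e = estimators.get(i).estimate(userID, itemID);
            if (!Double.isNaN(e)) {
                weightedSum += weights[i] * e;
                totalWeight += weights[i];
            }
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }
}
```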

Re: Boolean Recommender evaluator returning score greater than 1

2010-08-16 Thread Sean Owen
Yes, that's what this IR test does. It leaves out several items and sees how many are recommended back. On Mon, Aug 16, 2010 at 2:46 PM, Steven Bourke wrote: > Ah thanks - > > Is there any form of 'leave one out' in the mahout implementation? Precision > and recall are obviously quite useful, bu
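The arithmetic behind that IR-style test, for a single user, can be sketched as follows: some items are held out, the recommender returns N items, and precision and recall count how many held-out items came back. How Mahout chooses the held-out items and N is not shown in the thread; this only illustrates why both figures stay between 0 and 1.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrecisionRecall {
    private PrecisionRecall() {}

    // Returns {precision, recall} for one user's top-N list against the held-out items.
    static double[] atN(List<Long> recommended, Set<Long> heldOut) {
        if (recommended.isEmpty() || heldOut.isEmpty()) {
            throw new IllegalArgumentException("need at least one recommendation and one held-out item");
        }
        Set<Long> hits = new HashSet<>(recommended);
        hits.retainAll(heldOut);
        double precision = (double) hits.size() / recommended.size(); // share of recs that were relevant
        double recall = (double) hits.size() / heldOut.size();        // share of relevant items recovered
        return new double[] {precision, recall};
    }
}
```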

Re: Clustering Questions

2010-08-16 Thread Sean Owen
Hmm, these are all passing for me. Sounds like some quirk in your local setup. Under target/surefire-reports you will find complete logs from tests, which would probably reveal the nature of the problem. On Mon, Aug 16, 2010 at 8:07 PM, Severance, Steve wrote: > I updated to the current revision

Re: How to use RecommenderJob with string userID/itemID

2010-08-16 Thread Sean Owen
For this purpose, yes just do your own translation. Just hash down into a 32-bit int. On Mon, Aug 16, 2010 at 5:51 PM, Jeff Heuer wrote: > Hello, > > I have a dataset of users and items where those objects are identified by > text strings (e.g. "x97wfm"), rather than a numeric ID. What would be t
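A minimal sketch of "hash down into a 32-bit int" for string IDs like "x97wfm". Using `String.hashCode()` with the sign bit cleared is an assumption, as is the class name; the reverse map lets results be translated back, and the check matters because distinct strings can share a hash (for example "Aa" and "BB").

```java
import java.util.HashMap;
import java.util.Map;

// Translates string IDs to non-negative 32-bit ints and remembers the reverse mapping.
final class StringIdTranslator {
    private final Map<Integer, String> reverse = new HashMap<>();

    int toInt(String id) {
        int hashed = id.hashCode() & 0x7FFFFFFF; // String.hashCode with the sign bit cleared
        String previous = reverse.putIfAbsent(hashed, id);
        if (previous != null && !previous.equals(id)) {
            throw new IllegalStateException("hash collision between " + previous + " and " + id);
        }
        return hashed;
    }

    String fromInt(int hashed) {
        String id = reverse.get(hashed);
        if (id == null) {
            throw new IllegalArgumentException("unknown ID " + hashed);
        }
        return id;
    }
}
```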

Re: How to evaluate recommendations from hadoop?

2010-08-18 Thread Sean Owen
There's no direct support, no -- a worthwhile TODO for the future. But the principle is the same and you could implement something similar yourself. On Wed, Aug 18, 2010 at 4:18 PM, Ning wrote: > > Is there an easy way to evaluate recommendations from Hadoop? > -- > View this message in context: > ht

Re: In-memory implementation of co-occurrence algorithm

2010-08-18 Thread Sean Owen
I don't think so, because it is so simplistic. Do you mean a similarity metric based just on co-occurrence count? If that's what you want, please use LogLikelihoodSimilarity. It's for the in-memory code, and is much better. On Wed, Aug 18, 2010 at 4:22 PM, Ning wrote: > > Since our data set is relat
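To show why a log-likelihood test beats a raw co-occurrence count, here is a sketch of Dunning's log-likelihood ratio on the 2x2 table of users who rated both items, only one, or neither. It follows my understanding of the formula used behind LogLikelihoodSimilarity; the `1 - 1/(1 + llr)` mapping into [0, 1) is one simple way to turn the unbounded ratio into a similarity and should be checked against the Mahout source before relying on it.

```java
// Log-likelihood ratio for a 2x2 co-occurrence table, plus a [0, 1) similarity derived from it.
final class LogLikelihood {
    private LogLikelihood() {}

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: xLogX(sum) minus the sum of xLogX(element).
    private static double entropy(long... elements) {
        long sum = 0;
        double result = 0.0;
        for (long element : elements) {
            result += xLogX(element);
            sum += element;
        }
        return xLogX(sum) - result;
    }

    // k11 = both items, k12 = A not B, k21 = B not A, k22 = neither.
    static double ratio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    static double similarity(long usersWithBoth, long usersWithA, long usersWithB, long numUsers) {
        long k11 = usersWithBoth;
        long k12 = usersWithA - usersWithBoth;
        long k21 = usersWithB - usersWithBoth;
        long k22 = numUsers - usersWithA - usersWithB + usersWithBoth;
        double llr = ratio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }
}
```

Unlike a plain count, two items that every user rates (popular, but uninformative) score close to independence here, because the ratio compares the observed overlap with what the items' individual frequencies would predict.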

Re: regarding taste application

2010-08-19 Thread Sean Owen
Let me point you at the new version of that page: https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation There is nothing to know about direct integration. Just write such code, put it in your project, and include Mahout as a dependency. How to run it would be up to you, and

Re: adding feature:skip user's non-interested items when generate recommendation for user.

2010-08-23 Thread Sean Owen
Sebastian is right that in this case, you might well model these as preferences with low value. It's reasonable, but, I also agree that somehow an 'ignored' recommendation does not necessarily mean the same as a low preference. There are some situations where you might want to exclude items from re

Re: adding feature:skip user's non-interested items when generate recommendation for user.

2010-08-23 Thread Sean Owen
(Uncanny, I was researching Grooveshark just minutes before, for unrelated reasons... It's good to hear from any company that does recommendations and is willing to talk about it. I know of a number that can't or won't, unfortunately.) Yeah, sounds like we're all on the same page. One key point in what I thi

Re: 1st MapReduce job in RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item)

2010-08-25 Thread Sean Owen
The issue is that user and item IDs may be longs, but they are used as indexes into a vector, which are ints. This does a hashing and stores that mapping, so it can be reversed at the end. For the reverse mapping, in the case of collision, the lowest long key wins. And, the hash also has the nice p
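The long-to-int mapping described above can be sketched like this. The thread does not give the exact hash, so `Long.hashCode(id)` with the sign bit cleared is an assumption; the part taken from the reply is the reverse map in which, on a collision, the lowest long ID wins.

```java
import java.util.HashMap;
import java.util.Map;

final class IdIndexMapping {
    private final Map<Integer, Long> indexToId = new HashMap<>();

    // Assumed hash: fold the long into an int (as Long.hashCode does) and clear the sign bit.
    static int idToIndex(long id) {
        return Long.hashCode(id) & 0x7FFFFFFF;
    }

    // Records the mapping; if two IDs collide on one index, the smaller long ID is kept.
    int add(long id) {
        int index = idToIndex(id);
        indexToId.merge(index, id, Long::min);
        return index;
    }

    long idFor(int index) {
        Long id = indexToId.get(index);
        if (id == null) {
            throw new IllegalArgumentException("unknown index " + index);
        }
        return id;
    }
}
```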

Re: Initializing log4j for running RecommenderJob on AWS EMR

2010-08-26 Thread Sean Owen
I think it's a question for Hadoop. It's the bit using log4j. Mahout uses SLF4J. I'm not sure how to configure Hadoop's logging. On Thu, Aug 26, 2010 at 7:45 PM, Stanley Ipkiss wrote: > > When I run the RecommenderJob on AWS EMR(Elastic MapReduce), I get the > following message in the stderr file

Re: recommendations

2010-08-29 Thread Sean Owen
These are slightly different from conventional collaborative filtering, but I think solutions are available. "Customers with Similar Searches Purchased" To apply user-based CF you need a notion of user-user similarity. You could think of this as a sub-problem, where users are users and searches a
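For the "users are users and searches are items" sub-problem, a user-user similarity can be computed from the sets of searches two users share. The Tanimoto (Jaccard) coefficient below is one reasonable choice picked for this sketch, not something the reply prescribes; the class name and sample search terms are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class SearchOverlap {
    private SearchOverlap() {}

    // Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|, in [0, 1].
    static double tanimoto(Set<String> searchesA, Set<String> searchesB) {
        if (searchesA.isEmpty() && searchesB.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(searchesA);
        intersection.retainAll(searchesB);
        int unionSize = searchesA.size() + searchesB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```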

Re: recommendations

2010-08-29 Thread Sean Owen
Yes, this is a simpler problem. You just want to find which items are most similar to a given item, for some definition of 'similar'. GenericItemBasedRecommender has a mostSimilarItems() method that just saves you the trouble of computing this by hand, and any ItemSimilarity function you like can
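What a "most similar items" computation does by hand can be sketched with plain collections. Cosine similarity over item rating vectors (missing ratings treated as zero) stands in here for "any ItemSimilarity you like"; in Mahout itself, GenericItemBasedRecommender.mostSimilarItems() does this for you. Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SimilarItems {
    private SimilarItems() {}

    // Cosine similarity of two items' rating vectors (userID -> rating); missing ratings count as 0.
    static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            double va = e.getValue();
            normA += va * va;
            Double vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;
            }
        }
        for (double vb : b.values()) {
            normB += vb * vb;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // The howMany items most similar to itemID, highest similarity first.
    static List<Long> mostSimilar(Map<Long, Map<Long, Double>> ratingsByItem, long itemID, int howMany) {
        Map<Long, Double> target = ratingsByItem.get(itemID);
        if (target == null) {
            throw new IllegalArgumentException("unknown item " + itemID);
        }
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratingsByItem.entrySet()) {
            if (e.getKey() != itemID) {
                scores.put(e.getKey(), cosine(target, e.getValue()));
            }
        }
        return scores.entrySet().stream()
                .sorted((x, y) -> Double.compare(y.getValue(), x.getValue()))
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```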

Re: CPU Time

2010-08-29 Thread Sean Owen
It could vary a lot. How many users? items? which similarity metric? But, that's fairly small. I'd be surprised if you couldn't do recommendations in under 100ms per request per core. 2010/8/29 Young : > Hi all, > > Based on 1M dataset, about how many requests could be expected to be handled > a

Re: recommendations

2010-08-30 Thread Sean Owen
association mining is > needed? > > Pramit > > On Mon, Aug 30, 2010 at 12:07 AM, Sean Owen wrote: > >> Yes, this is a simpler problem. You just want to find which items are >> most similar to a given item, for some definition of 'similar'. >> GenericItem

Re: user-based and item-baesd results

2010-08-30 Thread Sean Owen
That result is quite possible. For example, with a user-based recommender, the only items that can possibly be recommended are those in the user's neighborhood. If the neighborhood is small, it's possible that only 23 unique items exist among users in that neighborhood. You can never get more recom
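The limit described above is easy to see in code: the candidate set is just the union of the neighbors' items minus what the user already has, so asking for more recommendations than that set contains cannot help. A minimal sketch with hypothetical names:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class NeighborhoodCandidates {
    private NeighborhoodCandidates() {}

    // Items a user-based recommender could possibly return: everything the neighbors
    // rated, minus what the user has already rated.
    static Set<Long> candidates(Map<Long, Set<Long>> itemsByUser, long userID, long[] neighbors) {
        Set<Long> result = new TreeSet<>();
        for (long neighbor : neighbors) {
            result.addAll(itemsByUser.getOrDefault(neighbor, Set.of()));
        }
        result.removeAll(itemsByUser.getOrDefault(userID, Set.of()));
        return result;
    }
}
```

With two neighbors who rated {10, 20} and {20, 30}, a user who already rated item 10 can receive at most two recommendations, however many are requested.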

Re: Re: user-based and item-baesd results

2010-08-30 Thread Sean Owen
able and accepted. > > -- Young > > > > > At 2010-08-30 23:55:15,"Sean Owen" wrote: > >>That result is quite possible. For example, with a user-based >>recommender, the only items that can possibly be recommended are those >>in the user's neigh

Re: User/Item symmetry?

2010-08-31 Thread Sean Owen
I don't think they're identical, no. The concepts must both exist in the API or it would be unintelligible. Or, UserSimilarity and ItemSimilarity might be unifiable into ThingSimilarity but I wonder whether that would start to get hard to understand. If that's what you mean, yes you can argue that

Re: SVDRecommender convergence?

2010-08-31 Thread Sean Owen
I don't have any good rules of thumb for you -- maybe the author can chime in. It should be a fairly standard implementation and I would not expect unusual behavior in this regard, but can't say I know either way. But are you asking a question about the recommender or the evaluator? On Tue, Aug 3

Re: About the SVDRecommender

2010-08-31 Thread Sean Owen
Presumably in the result of the evaluation -- average absolute difference in actual/estimated preference. The eval trains with a random subset of the data and tests with the rest. I just realized from your other mail that you are using a data set with 10,000 ratings only. That's fairly small and

Re: Question about data warehousing and mining through Mahout

2010-08-31 Thread Sean Owen
I think you'd have to begin to define what you want to do with the logs? What do you mean when you say "data mining"? On Tue, Aug 31, 2010 at 10:21 PM, hdev ml wrote: > Hi all, > > I am currently trying to find out what frameworks/software/product will > support data warehousing/data mining the b

Re: Question about data warehousing and mining through Mahout

2010-08-31 Thread Sean Owen
On Tue, Aug 31, 2010 at 10:55 PM, hdev ml wrote: > Per my understanding of hive, we can do some statistical reporting, like > frequency of user sessions, which geographical region, which device he is > using the most etc. Yes that's about what Hive is good for, if you're looking for some open-sou

Re: Question about data warehousing and mining through Mahout

2010-09-01 Thread Sean Owen
part be done by Mahout on this hive data? > > -H > > On Tue, Aug 31, 2010 at 3:03 PM, Sean Owen wrote: > >> On Tue, Aug 31, 2010 at 10:55 PM, hdev ml wrote: >> > Per my understanding of hive, we can do some statistical reporting, like >> > frequency of user s

Re: SVDRecommender convergence?

2010-09-01 Thread Sean Owen
All in all that sounds roughly reasonable to me -- that's a reasonable number of features and iterations, and the eval result is varying by just about 3% (scale of 1-5). On Tue, Aug 31, 2010 at 11:55 PM, Lance Norskog wrote: > About the SVDRecommender- 10 features and 50 iterations gave > evaluat

Re: SVDRecommender convergence?

2010-09-02 Thread Sean Owen
Mahout in Action? Yes it's in there. The output is the average absolute difference between the actual and estimated rating. GroupLens ratings are from 1 to 5. 0.7 average error isn't bad on that scale. On Thu, Sep 2, 2010 at 3:26 AM, Lance Norskog wrote: > Ah! What exactly does 0.70 mean as an ev

Re: Install and use just the recommendations module of Mahout

2010-09-02 Thread Sean Owen
That is all in mahout-core. If you like, you can then just use that .jar file / Maven dependency. On Thu, Sep 2, 2010 at 8:34 AM, cristi prodan wrote: > Hello, > > Is it possible to install and use just the recommendations functionality > (taste) from Mahout ? If yes, how is that done ? > > Than

Re: from Arff to Vector

2010-09-02 Thread Sean Owen
When you run, you need all dependent code, not just Mahout. Mahout builds ".job" files under target/ which contain all dependencies. Use this as your JAR file when you run on the command line. On Thu, Sep 2, 2010 at 2:05 AM, Valerio Ceraudo wrote: > hi all, > i'm at my last step for thesi,covert

Re: Mahout svn is empty ?

2010-09-02 Thread Sean Owen
Are you looking at the new, non-Lucene location? https://cwiki.apache.org/confluence/display/MAHOUT/Version+Control On Thu, Sep 2, 2010 at 8:57 AM, Jeff Zhang wrote: > Hi all, > > where's the source code of mahout ? The trunk is empty. > > > -- > Best Regards > > Jeff Zhang >

Re: Build Error - NegativeBinomialTest Fails

2010-09-02 Thread Sean Owen
Yeah I saw this too. I think it's also due to variation in random number generation. I am about to commit many changes including a fix for this. On Thu, Sep 2, 2010 at 12:58 PM, Lahiru Samarakoon wrote: > Dear All, > > While I was building the Mahout, I was stumped by a Build Failure which was >

Re: Mahout svn is empty ?

2010-09-02 Thread Sean Owen
It should have one file which tells you the location of the new repo. No, that is no longer the main repository. On Thu, Sep 2, 2010 at 2:45 PM, Jeff Zhang wrote: > Thanks Sean, but why this link > http://svn.apache.org/repos/asf/lucene/mahout/trunk is empty ? > Isn't it mahout's office site ? >

Re: question about SVD based recommender

2010-09-02 Thread Sean Owen
I think it's stable enough, though I'd surely encourage you to use the latest code in SVN since things change so much. This class hasn't changed much at all though. It's an in-memory implementation, not distributed, note. I'm not the author, but understand that the implementation is based on an e

Re: Failed Tests

2010-09-04 Thread Sean Owen
Yeah, a load changed with the tests, such that I'm sure it has to be rebuilt from scratch to work. On Sat, Sep 4, 2010 at 1:22 AM, Severance, Steve wrote: > I got this fixed. I resynced and cleaned the solution and everything built > fine. >

Re: Map/Reduce algorithm discussion goups?

2010-09-06 Thread Sean Owen
(This is the topic of about half of Mahout in Action -- at least, MapReduce algorithms in the context of machine learning.) On Mon, Sep 6, 2010 at 9:44 AM, Lance Norskog wrote: > Is anyone writing "Algorithms in Map/Reduce"?

Re: Classpath question

2010-09-11 Thread Sean Owen
IIRC this packaging mechanism only works if you add "Class-Path" entries to META-INF/MANIFEST.MF that specify the location of the .jar within the archive. I prefer to simply re-package all dependencies together, like Mahout does. On Sat, Sep 11, 2010 at 2:44 AM, Mark wrote: > Perhaps this a be

Re: Running example

2010-09-11 Thread Sean Owen
It would be great if anyone who finds corrections to that page could post a definitive change that's needed. But yes like Sebastian I don't have the issue -- is it a matter of working directory indeed that fixes it? On Fri, Sep 10, 2010 at 11:05 PM, Sebastian Schelter wrote: > Hi, > > You have t

Re: Adding UID's to nearest neighbor

2010-09-12 Thread Sean Owen
Sure, just implement UserNeighborhood. Inside, delegate all the logic to another implementation like NearestNUserNeighborhood. But tack on your chosen user ID before you return the neighborhood to the caller. On Sun, Sep 12, 2010 at 11:57 PM, Steven Bourke wrote: > Hi - > > Is there a straight fo
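The delegation approach can be sketched as a decorator. `Neighborhood` below is a hypothetical, simplified stand-in for Mahout's UserNeighborhood (the real interface also has refresh() and declares TasteException); the checks that skip the chosen user's own neighborhood and avoid duplicates are design choices of this sketch.

```java
import java.util.Arrays;

// Hypothetical simplified stand-in: returns the IDs of a user's neighbors.
interface Neighborhood {
    long[] neighborsOf(long userID);
}

// Delegates to another neighborhood, then appends one chosen user ID.
final class PlusChosenUserNeighborhood implements Neighborhood {
    private final Neighborhood delegate;
    private final long chosenUserID;

    PlusChosenUserNeighborhood(Neighborhood delegate, long chosenUserID) {
        this.delegate = delegate;
        this.chosenUserID = chosenUserID;
    }

    @Override
    public long[] neighborsOf(long userID) {
        long[] base = delegate.neighborsOf(userID);
        if (userID == chosenUserID) {
            return base; // a user should not be its own neighbor
        }
        for (long id : base) {
            if (id == chosenUserID) {
                return base; // already present
            }
        }
        long[] result = Arrays.copyOf(base, base.length + 1);
        result[base.length] = chosenUserID;
        return result;
    }
}
```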

Re: (Not) ignoring NaN predicted preference values

2010-09-14 Thread Sean Owen
It's a fair point. I don't see a clear way to include this information in the average absolute difference figure. You could add in a 'penalty' datum of (max rating - min rating) or something to the average, but that's a little artificial. But it could be separately reported, as at least a log me
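Both options from the reply can be computed side by side: the average absolute difference over the estimates that exist, a separately reported count of NaN estimates, and a penalized average that charges each NaN the full (max - min) rating range. This is a sketch with hypothetical names, not Mahout's evaluator.

```java
// Average absolute difference that reports, rather than silently drops, NaN estimates.
final class MaeWithNaN {
    final double mae;          // over non-NaN estimates only (NaN if there are none)
    final int nanCount;        // estimates that could not be made
    final double penalizedMae; // each NaN counted as an error of (maxPref - minPref)

    private MaeWithNaN(double mae, int nanCount, double penalizedMae) {
        this.mae = mae;
        this.nanCount = nanCount;
        this.penalizedMae = penalizedMae;
    }

    static MaeWithNaN evaluate(double[] actual, double[] estimated, double minPref, double maxPref) {
        if (actual.length == 0 || actual.length != estimated.length) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
        double sum = 0.0;
        int counted = 0;
        int nan = 0;
        for (int i = 0; i < actual.length; i++) {
            if (Double.isNaN(estimated[i])) {
                nan++;
                continue;
            }
            sum += Math.abs(actual[i] - estimated[i]);
            counted++;
        }
        double mae = counted == 0 ? Double.NaN : sum / counted;
        double penalized = (sum + nan * (maxPref - minPref)) / (counted + nan);
        return new MaeWithNaN(mae, nan, penalized);
    }
}
```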

Re: java.lang.NoSuchMethodError while running the K-means example

2010-09-16 Thread Sean Owen
Use Hadoop 0.20.2, which is the current stable release and the one Mahout needs. It looks like you are using 0.21.0. It sounds like they changed some method signatures in 0.21 that the current code uses. On Thu, Sep 16, 2010 at 12:26 PM, Lahiru Samarakoon wrote: > Hi all, > > I tried to run the "

Re: Evaluator for RecommenderJob (hadoop implementation)?

2010-09-17 Thread Sean Owen
No, I don't know of such a thing. It'd be great if you implemented one and were in a position to contribute it. On Fri, Sep 17, 2010 at 1:59 AM, Stanley Ipkiss wrote: > > Has someone already written an evaluator for the hadoop implementation of cf? > I was looking for something like the RecommenderEvalua

Re: TreeClusteringRecommender, clustering, and multiple processors

2010-09-17 Thread Sean Owen
I checked in what I think is a slightly better solution. The threads will block until construction but won't cause it to re-build each time. I just added a double-checked-locking pattern here, which is 99.% bulletproof in Java, and that's sufficient for this context. On Fri, Sep 17, 2010 at 6:
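For reference, this is the general shape of double-checked locking in Java. It is a generic sketch of the pattern, not the TreeClusteringRecommender code. Declaring the field `volatile` is what makes the idiom correct under the Java memory model since Java 5; without it, another thread could observe a partially constructed object.

```java
import java.util.function.Supplier;

// Builds an expensive value at most once, even when many threads ask for it concurrently.
final class LazyOnce<T> {
    private final Supplier<T> builder; // must not return null, or it will be called again
    private volatile T value;

    LazyOnce(Supplier<T> builder) {
        this.builder = builder;
    }

    T get() {
        T result = value;              // first check, without locking
        if (result == null) {
            synchronized (this) {
                result = value;        // second check, under the lock
                if (result == null) {
                    result = builder.get();
                    value = result;    // publish through the volatile field
                }
            }
        }
        return result;
    }
}
```

Threads that arrive while the value is being built block on the lock, then find it already set on the second check, which matches the behavior described above.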

Re: PlusAnonymousUserDataModel usage?

2010-09-17 Thread Sean Owen
I have used it in a production system. What issues do you see? On Fri, Sep 17, 2010 at 10:33 AM, Lance Norskog wrote: > Does anyone have an example of successfully using > PlusAnonymousUserDataModel? There are no unit tests for it. I'm playing > around and finding one problem after another in var

Re: Evaluator for RecommenderJob (hadoop implementation)?

2010-09-17 Thread Sean Owen
It would be neither, I'd imagine. You need a stand-alone class that can read both the input and the output from HDFS and do the eval math. There also needs to be some pre-processing stage to segregate test prefs from training prefs, and run recs appropriately. It's all straightforward but will take

Re: PlusAnonymousUserDataModel usage?

2010-09-18 Thread Sean Owen
Sounds like a clean fix. Really, the semantics should be tightened up. Until temp prefs are set, the "temp user" should not exist and the facade should behave accordingly. The temp prefs can't be null or empty, and there should be a way to un-set them too. I can add that for your consideration. Wh

Re: PlusAnonymousUserDataModel usage?

2010-09-19 Thread Sean Owen
Well, you could broadly call all machine learning "analysis and optimization" of a sort! What do you mean, specifically? If you mean you expect this to compute online in real-time rather than off-line, in batch, as the output of some standalone tool -- it is online. You're supposed to query these in

Re: item-based recommendation for different item

2010-09-20 Thread Sean Owen
Yes, you would need to explain more about what you are doing, but I can guess: When viewing an item, you are computing most-similar items and showing those items. That process does not have anything to do with the user, so, yes you will always get the same result. How do you want the user to be i

Re: PlusAnonymousUserDataModel usage?

2010-09-21 Thread Sean Owen
contract of continuous > mutability? Are they allowed to require a batch data rebuild for any change > in a user's prefs? > > > Sean Owen wrote: > >> Well, you could broadly call all machine learning "analysis and >> optimization" of a sort! What do you mean

Re: item-based recommendation for different item

2010-09-21 Thread Sean Owen
What you need is an IDRescorer. Use any Recommender you like, but pass in an IDRescorer object too, which will boost a recommended item's score if it is more similar to some target item. For example, I might simply multiply by that similarity score. This should achieve your desired effect. On Tue, Se
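A sketch of the rescoring idea. Mahout's IDRescorer has a rescore(id, originalScore) method and an isFiltered(id) method; the local `ItemRescorer` interface below only mimics that shape so the example is self-contained. The similarity lookup is a hypothetical function, and multiplying assumes similarities in [0, 1] (a negative similarity, e.g. from Pearson correlation, would flip the sign of the score). Filtering out the target item itself is an extra choice of this sketch.

```java
import java.util.function.ToDoubleBiFunction;

// Local stand-in with the same two-method shape as Mahout's IDRescorer.
interface ItemRescorer {
    double rescore(long itemID, double originalScore);

    boolean isFiltered(long itemID);
}

// Boosts recommended items by their similarity to a target item (e.g. the one being viewed).
final class SimilarToTargetRescorer implements ItemRescorer {
    private final long targetItemID;
    private final ToDoubleBiFunction<Long, Long> itemSimilarity; // hypothetical lookup, assumed in [0, 1]

    SimilarToTargetRescorer(long targetItemID, ToDoubleBiFunction<Long, Long> itemSimilarity) {
        this.targetItemID = targetItemID;
        this.itemSimilarity = itemSimilarity;
    }

    @Override
    public double rescore(long itemID, double originalScore) {
        // Multiply the recommender's score by the similarity to the target item.
        return originalScore * itemSimilarity.applyAsDouble(itemID, targetItemID);
    }

    @Override
    public boolean isFiltered(long itemID) {
        return itemID == targetItemID; // don't recommend the item being viewed
    }
}
```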

Re: SVDRecommender

2010-09-21 Thread Sean Owen
On Tue, Sep 21, 2010 at 3:53 PM, James James wrote: > > Hi, > > I was looking at the implementation of the SVDRecommender, and was wondering > if > anyone could point me to a paper or an algorithm on which the implementation > is > based. This was asked recently on the mailing list, and all I k

Re: PlusAnonymousUserDataModel usage?

2010-09-21 Thread Sean Owen
Yes, though I think there aren't cases where you'd only find that out at runtime. The caller will have planned this already. There could be an isMutable() method, though I am wondering whether there's a case where the caller can then meaningfully do something else. On Wed, Sep 22, 2010 at 2:22 AM,

Re: [jira] Created: (MAHOUT-510) Standardize serialization mechanisms

2010-09-22 Thread Sean Owen
Done (moved to user@). Is anyone using any JSON-related serialization in the code, in a way that's painful to remove? Just gauging whether it is used now. On Wed, Sep 22, 2010 at 4:13 PM, Jeff Eastman wrote: > Perhaps we should solicit input on removing Json from our user community > too? I'd ha

Re: GenericUserBasedRecommender vs GenericItemBasedRecommender

2010-09-22 Thread Sean Owen
Quite an interesting question indeed. On Thu, Sep 23, 2010 at 2:05 AM, Stanley Ipkiss wrote: > On the other hand when using user based collaborative filtering, you always > use the same neighborhood set, irrespective of whether that user has rated > the particular item or not. You check for this

Re: GenericUserBasedRecommender vs GenericItemBasedRecommender

2010-09-22 Thread Sean Owen
On Thu, Sep 23, 2010 at 2:35 AM, gabeweb wrote: > I think the simple point is that the primary use case of a recommender is to > return the n-best recommended items, rather than return the predicted rating > for a single item.  In that case, if an item can't get a predicted rating > because no use

Re: cannot build the project

2010-09-23 Thread Sean Owen
I don't see test failures locally, or on Hudson. Can you dig out the logs from these failures? They're in files in target/surefire-reports. That would be much more helpful and specific as you'll see a stack trace probably. On Thu, Sep 23, 2010 at 3:56 PM, Tamas Jambor wrote: > Hi, > > I was tryi

Re: get similar items

2010-09-23 Thread Sean Owen
Yes, try GenericItemBasedRecommender.mostSimilarItems() On Thu, Sep 23, 2010 at 4:43 PM, Sam Yang wrote: > All: >  Can I get similar items of one item? I know ItemSimilarity is used to > compute the similarity between items,but seems can't get similar items of > one item. > > Best Regards. > > --

Re: Evaluation approach in AbstractDifferenceRecommenderEvaluator

2010-09-23 Thread Sean Owen
I agree with that. But that is not what these figures mean. Evaluation percentage is purely a lever to reduce the size of the input for speed. If evaluation percentage is 0.15 (15%), then 85% of all data is thrown out upfront. Training percentage is what you're talking about. If it is 90%, then 90
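The difference between the two knobs can be sketched as follows. This is a hypothetical, stdlib-only illustration rather than AbstractDifferenceRecommenderEvaluator's code; in particular it assumes the evaluation percentage is applied per user (a skipped user contributes nothing) and the training percentage is then applied to each kept user's preferences.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of the two evaluation knobs; NOT Mahout's evaluator.
class EvaluationSplitSketch {

  static final class Split {
    final Map<Long, List<Long>> training = new HashMap<>();
    final Map<Long, List<Long>> test = new HashMap<>();
  }

  static Split split(Map<Long, List<Long>> itemsByUser,
                     double trainingPercentage,
                     double evaluationPercentage,
                     long seed) {
    Random random = new Random(seed);
    Split result = new Split();
    for (Map.Entry<Long, List<Long>> e : itemsByUser.entrySet()) {
      // evaluationPercentage: drop the whole user with probability 1 - evaluationPercentage.
      // This only shrinks the input for speed; it says nothing about train vs. test.
      if (random.nextDouble() >= evaluationPercentage) {
        continue;
      }
      List<Long> train = new ArrayList<>();
      List<Long> heldOut = new ArrayList<>();
      // trainingPercentage: each kept preference goes to training with this probability,
      // otherwise it is held out for testing.
      for (Long item : e.getValue()) {
        if (random.nextDouble() < trainingPercentage) {
          train.add(item);
        } else {
          heldOut.add(item);
        }
      }
      result.training.put(e.getKey(), train);
      result.test.put(e.getKey(), heldOut);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Long, List<Long>> data = Map.of(1L, List.of(10L, 11L, 12L), 2L, List.of(10L, 13L));
    Split s = split(data, 0.9, 1.0, 42L);
    System.out.println("training=" + s.training + " test=" + s.test);
  }
}
```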

Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator

2010-09-23 Thread Sean Owen
The logic seems OK to me. invokeAll() kicks off all the Callables. It waits for all to finish on account of the calls to get(), which block until a result is ready. I think the isDone() call is indeed redundant but shouldn't hurt; get() isn't called in the case that it's already done. Neverthele
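For reference, the standard java.util.concurrent pattern being discussed looks roughly like this: build a list of Callables, hand them to ExecutorService.invokeAll(), then call get() on each returned Future, which also rethrows any task failure wrapped in an ExecutionException. The class name and the sum-of-squares work are made up for illustration; this is not the evaluator's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative invokeAll()/get() pattern; the workload is hypothetical.
class InvokeAllSketch {

  static long sumOfSquares(int n, int threads) {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Long>> tasks = new ArrayList<>();
      for (int i = 1; i <= n; i++) {
        final long x = i;
        tasks.add(() -> x * x);
      }
      // invokeAll() submits every task and returns once all of them have completed.
      List<Future<Long>> futures = executor.invokeAll(tasks);
      long total = 0L;
      for (Future<Long> future : futures) {
        // get() returns the result, or throws ExecutionException if the task threw.
        total += future.get();
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for tasks", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("A task failed", e.getCause());
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(sumOfSquares(10, 4)); // 385
  }
}
```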

Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator

2010-09-23 Thread Sean Owen
exception occurs. I used counters to double-check the number of results was as expected. On Fri, Sep 24, 2010 at 7:15 AM, Sean Owen wrote: > The logic seems OK to me. invokeAll() kicks off all the Callables. It > waits for all to finish on account of the calls to get(), which block > un

Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator

2010-09-23 Thread Sean Owen
Ah you're right. It says: "Executes the given tasks, returning a list of Futures holding their status and results when all complete. Future.isDone() is true for each element of the returned list." ... which reads to me like "when all complete" modifies "holding" rather than "returning". Quite cha
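The documented guarantee quoted above (every Future in the list returned by invokeAll() reports isDone()) can be checked with a small experiment. This sketch is hypothetical test code, not anything from Mahout; it also uses an AtomicInteger counter, in the spirit of the counters mentioned earlier in the thread, to confirm every task actually ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical experiment around ExecutorService.invokeAll()'s documented contract.
class InvokeAllDoneSketch {

  /** Returns {number of futures reporting isDone(), number of tasks that ran}. */
  static int[] run(int taskCount) {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    AtomicInteger ran = new AtomicInteger();
    try {
      List<Callable<Integer>> tasks = new ArrayList<>();
      for (int i = 0; i < taskCount; i++) {
        final int id = i;
        tasks.add(() -> {
          Thread.sleep(5L);     // simulate a little work
          ran.incrementAndGet(); // count tasks that actually executed
          return id;
        });
      }
      List<Future<Integer>> futures = executor.invokeAll(tasks);
      int done = 0;
      for (Future<Integer> future : futures) {
        if (future.isDone()) {
          done++;
        }
      }
      return new int[] {done, ran.get()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] r = run(8);
    System.out.println("done=" + r[0] + " ran=" + r[1]); // done=8 ran=8
  }
}
```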

Re: PlusAnonymousUserDataModel java.sql.SQLException: No database selected

2010-09-24 Thread Sean Owen
The issue isn't with any of this code. It's in how you are configuring the DataSource in your container. It sounds like you are specifying host and port, but you typically must also specify a database (schema) within that server. That should be the one containing your data table. On Fri, Sep 24, 2
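With MySQL Connector/J the database is normally given as the path part of the JDBC URL (jdbc:mysql://host:port/database); a URL that stops after the port is one easy way to end up with "No database selected". The helper below is a hypothetical, stdlib-only check for that, not part of Mahout or the driver, and it assumes a simple single-host URL.

```java
import java.net.URI;

// Hypothetical helper; assumes a single-host jdbc:mysql URL such as
// jdbc:mysql://host:3306/dbname?params
class MySqlUrlSketch {

  /** Returns the database named in the URL, or null when the URL names none. */
  static String databaseName(String jdbcUrl) {
    String prefix = "jdbc:mysql://";
    if (!jdbcUrl.startsWith(prefix)) {
      throw new IllegalArgumentException("Not a jdbc:mysql URL: " + jdbcUrl);
    }
    // Strip the "jdbc:" scheme so java.net.URI can parse host, port and path.
    URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
    String path = uri.getPath(); // "/dbname", "/" or ""
    return (path == null || path.length() <= 1) ? null : path.substring(1);
  }

  public static void main(String[] args) {
    System.out.println(databaseName("jdbc:mysql://localhost:3306/mahout?autoReconnect=true")); // mahout
    System.out.println(databaseName("jdbc:mysql://localhost:3306"));                           // null
  }
}
```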

Re: running examples for mahout.

2010-09-24 Thread Sean Owen
Looks like the input file's format isn't right. It's expecting that funky "::"-delimited file. What is the input like? The error message sure could be better; I can fix that. On Fri, Sep 24, 2010 at 4:54 PM, web service wrote: > How do I run examples listed at > https://cwiki.apache.org/MAHOUT/r
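The GroupLens "1 million" ratings file uses lines like UserID::MovieID::Rating::Timestamp, while the 100K u.data file is tab-separated (for example 196, 242, 3, 881250949 separated by tabs). If you only have the 100K data, one option is to rewrite each line into the "::" form. The converter below is a hypothetical sketch of that, not a Mahout utility.

```java
// Hypothetical converter from the GroupLens 100K line format to the "::" 1M format.
class GroupLensFormatSketch {

  /** Turns "user<TAB>item<TAB>rating<TAB>timestamp" into "user::item::rating::timestamp". */
  static String toDoubleColonFormat(String line) {
    String[] fields = line.trim().split("\\s+");
    if (fields.length != 4) {
      throw new IllegalArgumentException("Expected 4 fields but got " + fields.length + ": " + line);
    }
    return String.join("::", fields);
  }

  public static void main(String[] args) {
    System.out.println(toDoubleColonFormat("196\t242\t3\t881250949")); // 196::242::3::881250949
  }
}
```

To convert a whole file you could map this over java.nio.file.Files.lines(...) and write the result with Files.write(...).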

Re: Getting error in Training the classifier as in TwentyNewsgroup

2010-09-24 Thread Sean Owen
"export MAVEN_OPTS=-Xmx1g" or similar will give the JVM more memory. That should be on the wiki if it's not. The other is some weird error in Maven (not the project?). Guys, did we ever figure that out? I remember conversations about this but no definite resolution. On Fri, Sep 24, 2010 at 6:35 PM, Bhaska
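To confirm that a heap setting such as -Xmx1g actually reached a given JVM, you can print the maximum heap from inside it. This is a generic stdlib check (the class name is made up); note that MAVEN_OPTS configures Maven's own JVM, and whether a separately forked process such as a test JVM gets the same setting depends on the build configuration.

```java
// Generic check of the maximum heap available to the current JVM.
class HeapCheckSketch {

  /** Maximum heap the JVM will attempt to use, in MiB. */
  static long maxHeapMiB() {
    return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
  }

  public static void main(String[] args) {
    System.out.println("Max heap: " + maxHeapMiB() + " MiB");
  }
}
```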

Re: running examples for mahout.

2010-09-24 Thread Sean Owen
> 196 242 3 881250949 > [/code] > > I just want to run example. That's it. > > -Mac > > On Fri, Sep 24, 2010 at 9:28 AM, Sean Owen wrote: > >> Looks like the input file's format isn't right. It's expecting that >> funky "::"-d

Re: running examples for mahout.

2010-09-25 Thread Sean Owen
orresponding ratings ? A bit confused about it. > > Exactly what does output mean or what does it look like ? > > > -Mac > > On Fri, Sep 24, 2010 at 12:25 PM, Sean Owen wrote: > >> The GroupLensDataModel expects the input from the "1 million" data set >>

Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator

2010-09-25 Thread Sean Owen
This is a different issue. I see it and will put in a fix today. On Fri, Sep 24, 2010 at 9:07 PM, Stanley Ipkiss wrote: > > I did that change yesterday in my code, but forgot to post the update here. > The error that I get sometimes is - > > Caused by: java.lang.NullPointerException >        at >

Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator

2010-09-25 Thread Sean Owen
, Sean Owen wrote: > This is a different issue. I see it and will put in a fix today. > > On Fri, Sep 24, 2010 at 9:07 PM, Stanley Ipkiss > wrote: >> >> I did that change yesterday in my code, but forgot to post the update here. >> The error that I get

Re: get similar items

2010-09-26 Thread Sean Owen
No, it does not need preferences. Use a notion of similarity that does not need preference values, like LogLikelihoodSimilarity. But it sounds like you mean something different: you don't have any data to start with, so you can't determine similar items? Well yes, with no data at all, algorithms won't h
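As an illustration of why log-likelihood needs no preference values: it only uses counts of which users are associated with each item. The stdlib-only sketch below computes Dunning's log-likelihood ratio from the 2x2 co-occurrence table. The mapping of counts into k11..k22 and the final transform 1 - 1/(1 + LLR) follow my understanding of Mahout's LogLikelihoodSimilarity, but treat both as assumptions; the class is hypothetical.

```java
// Hypothetical sketch of a log-likelihood item similarity from co-occurrence counts only.
class LogLikelihoodSketch {

  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized Shannon entropy of a set of counts.
  private static double entropy(long... counts) {
    long sum = 0;
    double result = 0.0;
    for (long c : counts) {
      result += xLogX(c);
      sum += c;
    }
    return xLogX(sum) - result;
  }

  /** Dunning's log-likelihood ratio for the 2x2 table [[k11, k12], [k21, k22]]. */
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    if (rowEntropy + columnEntropy < matrixEntropy) {
      return 0.0; // guard against tiny negative values from rounding
    }
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  /** Similarity in [0, 1) from how many users touched item 1, item 2, both, and in total. */
  static double similarity(long bothCount, long item1Count, long item2Count, long numUsers) {
    double llr = logLikelihoodRatio(
        bothCount,
        item2Count - bothCount,
        item1Count - bothCount,
        numUsers - item1Count - item2Count + bothCount);
    return 1.0 - 1.0 / (1.0 + llr);
  }

  public static void main(String[] args) {
    // 20 users; the same 10 users touched both items: strong association.
    System.out.println(similarity(10, 10, 10, 20));
  }
}
```

With perfectly independent counts such as (10, 10, 10, 10) the ratio is 0, so the similarity is 0 as well.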

Re: get similar items

2010-09-26 Thread Sean Owen
GenericItemBasedRecommender.mostSimilarItems() does not care. It just uses your ItemSimilarity to do its work. You'd have to be more specific to get more feedback. Are you sure your ItemSimilarity is working, not returning NaN? On Sun, Sep 26, 2010 at 2:40 PM, Sam Yang wrote: > I have custom impl
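One quick way to answer the NaN question for a custom similarity is to evaluate it over all pairs of a small sample of item IDs and list the pairs that come back NaN (correlation-style measures, for example, are typically undefined when two items share too few users). The interface below is a hypothetical stand-in with the same shape as an item-item similarity method; wiring it to your real ItemSimilarity is left as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check for a custom item-item similarity.
class SimilaritySanityCheckSketch {

  /** Stand-in for an item-item similarity; not Mahout's ItemSimilarity interface. */
  interface PairSimilarity {
    double similarity(long itemA, long itemB);
  }

  /** Returns every distinct pair of itemIDs for which the similarity is NaN. */
  static List<long[]> nanPairs(long[] itemIDs, PairSimilarity sim) {
    List<long[]> bad = new ArrayList<>();
    for (int i = 0; i < itemIDs.length; i++) {
      for (int j = i + 1; j < itemIDs.length; j++) {
        double s = sim.similarity(itemIDs[i], itemIDs[j]);
        if (Double.isNaN(s)) {
          bad.add(new long[] {itemIDs[i], itemIDs[j]});
        }
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    // Toy similarity that is undefined for any pair involving item 3.
    PairSimilarity toy = (a, b) -> (a == 3 || b == 3) ? Double.NaN : 0.5;
    for (long[] pair : nanPairs(new long[] {1, 2, 3}, toy)) {
      System.out.println("NaN for items " + pair[0] + " and " + pair[1]);
    }
  }
}
```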
