Re: get similar items

2010-09-26 Thread Sean Owen
OK, so that issue was fixed. What is your current issue then? On Mon, Sep 27, 2010 at 3:37 AM, Sam Yang wrote: > And it's like this issue:   https://issues.apache.org/jira/browse/MAHOUT-367

Re: get similar items

2010-09-27 Thread Sean Owen
orative filtering as your user-item interaction data grows." > Do I get the similar items with another tool, like Lucene? Can Mahout do it > without preference data at the start? > > On Mon, Sep 27, 2010 at 2:59 PM, Sean Owen wrote: > >> OK, so that issue was fixed. What is y

Re: get similar items

2010-09-27 Thread Sean Owen
Yeah but that doesn't depend on having preference values right? "Boolean" data works fine. On Mon, Sep 27, 2010 at 8:23 AM, Sebastian Schelter wrote: > But the initial set of "candidate" items that will be given to the > estimators in GenericItemBasedRecommender.doMostSimilarItems() is > fetched

Re: get similar items

2010-09-27 Thread Sean Owen
(+user) Yes, I think that's the question to clarify. But if an item is brand new and has no data whatsoever, nothing here helps. You just want to return some generic default list of items. On Mon, Sep 27, 2010 at 8:30 AM, Sebastian Schelter wrote: > Maybe we misunderstand each other here. I unde

Re: Possible multi thread issue in AbstractDifferenceRecommenderEvaluator

2010-09-28 Thread Sean Owen
Yes, try the latest code from HEAD please. On Tue, Sep 28, 2010 at 8:02 PM, Stanley Ipkiss wrote: > > Here is the complete stack -

Re: Text Classification using Mahout

2010-09-30 Thread Sean Owen
Ignore it, it's just Maven doing its thing in the background. It should work fine without internet connectivity. On Thu, Sep 30, 2010 at 1:54 PM, Neil Ghosh wrote: > Hi, > > I am running the twenty-newsgroups example without hadoop , with the > following command > > $ mvn -e exec:java \ > -Dexec.

Re: Computing userSimilarity in Taste AbstractSimilarity

2010-09-30 Thread Sean Owen
I think it does work, but this code is definitely hard to grok. In my defense it is complex for a reason at least -- performance. When the end of one list of prefs is reached (the line "if (++xPrefIndex >= xLength)") it does check for an inferrer in the next line. If there is one, it sets "xIndex

Re: Computing userSimilarity in Taste AbstractSimilarity

2010-09-30 Thread Sean Owen
Yep looks like this was added since 0.3. You should definitely follow SVN HEAD in general as things change fast. On Thu, Sep 30, 2010 at 10:53 PM, Abigail Gertner wrote: > I think I must be looking at an older version of the file. I have > mahout-0.3 (the most recent one) downloaded from source

Re: recommendation mechanism

2010-10-01 Thread Sean Owen
Sure. I would suggest you create an ItemSimilarity implementation which loads this additional information, and constructs some formula for similarity based on whether movies share genre, actors, etc. For example maybe being in the same genre is worth +0.1 similarity. Maybe same actor is worth +0.2.
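
A minimal sketch of the additive formula suggested above, written in plain Python rather than against Mahout's ItemSimilarity interface. The `Movie` record, the choice to award each bonus at most once per pair, and the cap at 1.0 are assumptions made for illustration; only the +0.1 and +0.2 weights come from the message.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Movie:
    # Hypothetical metadata record; not a Mahout class.
    genres: frozenset = field(default_factory=frozenset)
    actors: frozenset = field(default_factory=frozenset)


GENRE_BONUS = 0.1  # from the message: same genre is worth +0.1
ACTOR_BONUS = 0.2  # from the message: same actor is worth +0.2


def metadata_similarity(a: Movie, b: Movie) -> float:
    """Add a fixed bonus for each kind of shared attribute, capped at 1.0."""
    score = 0.0
    if a.genres & b.genres:   # at least one genre in common
        score += GENRE_BONUS
    if a.actors & b.actors:   # at least one actor in common
        score += ACTOR_BONUS
    return min(score, 1.0)
```

In a real ItemSimilarity implementation the same arithmetic would sit behind the method that compares two item IDs, with the metadata loaded once up front.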

Re: Mahout usage

2010-10-01 Thread Sean Owen
Yes I know (directly) of 5 companies using Mahout for recommenders, and only 1 allows it to be mentioned -- Mippin. There are of course more that aren't known to me. In several cases, the people who built the system don't work directly for the company. They're fine with mentioning it, but it's not

Re: Mahout usage

2010-10-02 Thread Sean Owen
I'm also aware of a number of papers which at least used the code to crank out some results for other research: http://scholar.google.com/scholar?hl=en&q=mahout+'machine+learning' On Sat, Oct 2, 2010 at 4:12 AM, Lance Norskog wrote: > One of the northern European govt. studios (I think Finland)

Hadoop Taste

2010-10-02 Thread Sean Owen
(Copying and pasting a message that got lost, on behalf of Chris --) It's really a matter of running RecommenderJob locally. It is a Java program that kicks off the Hadoop jobs. You need to have a Hadoop cluster running locally and configured as described at hadoop.apache.org, but beyond that shou

Re: Hadoop Taste

2010-10-02 Thread Sean Owen
st via email from now on. > > Thanks for your help > Chris > > > > On Oct 2, 2010, at 1:00 AM, Sean Owen wrote: > > (Copying and pasting a message that got lost, on behalf of Chris --) > > It's really a matter of running RecommenderJob locally. It is a Java

Re: Mahout & Hadoop

2010-10-02 Thread Sean Owen
Can you define "data mining"? that would help answer about what options you have much better. On Sat, Oct 2, 2010 at 4:46 PM, Latency Buster wrote: > > What did you want to do with Mahout? How much data do you have? > > > > There are many capabilities that don't use Hadoop, some that require it.

Re: Query

2010-10-03 Thread Sean Owen
You probably want to look at Shannon's spectral clustering code? That's the closest thing I can think of in Mahout. It doesn't have much of anything for image processing. On Sun, Oct 3, 2010 at 5:02 PM, gagan chhabra wrote: > Hello all, > > I am a Engineering candidate and took a project which

Re: Loggin question

2010-10-04 Thread Sean Owen
Mahout logs via SLF4J, which is a sort of meta-logger. It logs to whatever log system you have in your classpath (by adding certain SLF4J bindings). So you configure *that* logging system. By default I think it does nothing, so if you see the messages, you've got some logging bindings going, maybe

Re: Recommenders and DataModels

2010-10-06 Thread Sean Owen
Interesting question. So the preferences are synthetic in some cases -- you have a pref for every user-item combination? (Then what do you recommend? But I can imagine some answers.) By "not work well" do you mean performance or accuracy? For performance, yes, having very dense input will really

Re: DataModel and recommendation

2010-10-06 Thread Sean Owen
For the JDBCDataModel implementations, there is no concept of reloading. The database is always queried for the latest data; it's always up-to-date. The UserSimilarity implementations likewise always compute from current data in the DataModel. You're not using a CachingUserSimilarity wrapper, so i

Re: DataModel

2010-10-06 Thread Sean Owen
In general, if you want real-time recommendations, you want the data in memory. Otherwise it's too slow. The JDBC-backed model works for, roughly, small problems up to a couple million ratings. Beyond that, stick it in memory. (And past about 100M ratings, you need to consider distributing the comp

Re: Oracle DataModel

2010-10-07 Thread Sean Owen
It probably also works with an Oracle database. I tried to use standard SQL. If it does I'd be interested to hear. Or if it needs slight changes you can help create OracleJDBCDataModel. On Thu, Oct 7, 2010 at 3:04 AM, Sam Yang wrote: > When I use Oracle to provide the DataModel, which class can I use

Re: Recommenders and DataModels

2010-10-07 Thread Sean Owen
Yeah that's a good question ! Most algorithms answer the question of "what are the top N recommendations" by estimating unknown preferences. If you already estimate all unknown preferences then there would be nothing left to recommend. On Wed, Oct 6, 2010 at 8:44 PM, Lance Norskog wrote: > Since
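
The point above can be illustrated with a small sketch: a top-N routine only ranks items whose preference is unknown, so a user with a value for every item has nothing left to recommend. The function and argument names are hypothetical and this is not Mahout's implementation.

```python
def top_n(user_prefs, all_items, estimate, n):
    """Rank only the items the user has no preference for.

    user_prefs: dict of item ID -> known preference value
    estimate:   callable giving an estimated preference for an item ID
    """
    unknown = [item for item in all_items if item not in user_prefs]
    ranked = sorted(unknown, key=estimate, reverse=True)
    return ranked[:n]
```

If `user_prefs` already covers every entry in `all_items`, `unknown` is empty and so is the result.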

Re: Libimseti Example in new eclipse project

2010-10-07 Thread Sean Owen
It's in the .war file, yes, but is more importantly in the mahout-taste-web module. Just depend on that. On Wed, Oct 6, 2010 at 7:04 PM, Chris Schilling wrote: > Hello, > > I am working through the Libimseti example in MIA.  I created a new project > in eclipse, ... etc...  Anyway, Everything is

Re: Share your experience using Mahout in the real-world?

2010-10-07 Thread Sean Owen
Joseph did you see some threads on this topic on user@mahout.apache.org recently? I recall people summarized what they know about users in the field. The bad news is most of the usage I (we) seem to know about aren't cleared to be talked about publicly. It seems like 2/3 of the time the work is do

Re: DataModel.getItemIDs

2010-10-07 Thread Sean Owen
Hmm, haven't heard of that. Can you provide more details, about the database and such? What you describe doesn't sound like correct behavior. Yes a patch would be great, in describing what the apparent issue is. On Thu, Oct 7, 2010 at 7:55 PM, Alec Feuerstein wrote: > Hello, > > I don't suppose a

Re: DataModel.getItemIDs

2010-10-07 Thread Sean Owen
> data... > > Setting the result set as ResultSet.TYPE_SCROLL_INSENSITIVE clears > things up as it allows for scrolling. > > -Alec > > On Thu, Oct 7, 2010 at 2:13 PM, Sean Owen wrote: >> Hmm, haven't heard of that. Can you provide more details, about the >> data

Re: Running example

2010-10-07 Thread Sean Owen
Update to head from SVN and try again. I believe I fixed that. On Thu, Oct 7, 2010 at 10:41 PM, Defenestrator wrote: > I've been busy, so I hadn't had a chance to continue this until today. > > Here is the current error when following the steps for running an example: >

Re: Selection Criteria in FileDataModel

2010-10-08 Thread Sean Owen
There's nothing built-in. Yeah I'd view that as a step outside the core library. On Fri, Oct 8, 2010 at 5:56 PM, Chris Schilling wrote: > Hello, > > I am wondering if it is possible to place selection criteria when reading in > the preference data used by taste to make predictions.  For instance

Re: Selection Criteria in FileDataModel

2010-10-08 Thread Sean Owen
: > It would be a nice feature to have build into the api for sure. You could > use the getPreferencesFromUser to determine which users have the appropriate > level of options. > > On Fri, Oct 8, 2010 at 6:27 PM, Sean Owen wrote: > >> There's nothing built-in. Y

Re: Recommender Evaluation returns NaN

2010-10-09 Thread Sean Owen
That's odd since, yes, the cause is almost always that too many estimates are NaN. Can you point a debugger at it and try to figure out where it goes wrong with a watchpoint? AFAIK the only reason this happens is if none of the test preferences can be estimated. So might make sure you are not some
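
A simplified model of why an evaluation can come back as NaN, assuming an evaluator that averages absolute differences and skips any test preference whose estimate is NaN. This is a sketch of the idea only, not Mahout's evaluator code.

```python
import math


def average_absolute_difference(test_prefs, estimate):
    """test_prefs: iterable of (user, item, actual) tuples.
    estimate:   callable (user, item) -> float, NaN when no estimate is possible.
    """
    total, count = 0.0, 0
    for user, item, actual in test_prefs:
        predicted = estimate(user, item)
        if math.isnan(predicted):
            continue  # this preference could not be estimated; skip it
        total += abs(predicted - actual)
        count += 1
    # With nothing left to average, the score itself is undefined.
    return total / count if count else float("nan")
```

Counting how many test preferences were skipped is usually the quickest way to confirm this is what happened.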

Re: GenericJDBCDataModel constructor bug

2010-10-10 Thread Sean Owen
Bleh, what a dumb typo. Good one, I've fixed it. On Sun, Oct 10, 2010 at 11:56 PM, Alec Feuerstein wrote: > Okay, > > I think I really got one this time -- again... > > org.apache.mahout.cf.taste.impl.model.jdbc.GenericJDBCDataModel > > The constructor that takes properties calls the parent's con

Re: Modelling typed vectors?

2010-10-11 Thread Sean Owen
It is a handy trick, and there's a lot of custom Writables involved in a good MapReduce pipeline, it seems to me. Yes, inside your Writable, use VectorWritable to manage the Vector part. On Tue, Oct 12, 2010 at 4:23 AM, Lance Norskog wrote: > I have a M/R project where vectors of two different ty

Re: "Context-aware" recommendations

2010-10-12 Thread Sean Owen
Yeah this is embodied in the "IDRescorer" class which lets you influence the final recommendations however you want, for just this sort of reason. On Tue, Oct 12, 2010 at 11:55 AM, Sebastian Schelter wrote: > Hi everyone, > > I have some non-release-related offtopic questions ;) > > I've attended
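
As a rough sketch of what a rescorer does to a candidate list: each item can be dropped outright or have its score adjusted before the final ranking. The two callables loosely mirror the filter and rescore roles of IDRescorer; the function itself and its names are hypothetical, written in plain Python.

```python
def rescore_and_rank(candidates, rescore, is_filtered, how_many):
    """candidates: list of (item_id, score) pairs.
    rescore:     callable (item_id, score) -> adjusted score
    is_filtered: callable (item_id) -> True to drop the item entirely
    """
    kept = [(item, rescore(item, score))
            for item, score in candidates
            if not is_filtered(item)]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best adjusted score first
    return kept[:how_many]
```

A context-aware rule (for example, boosting items that match the current session) would live inside `rescore`.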

Re: Modelling typed vectors?

2010-10-12 Thread Sean Owen
ue, Oct 12, 2010 at 10:28 PM, Lance Norskog wrote: > Ok. Now, how would one save payloads with the Vector I/O tools? > > On Mon, Oct 11, 2010 at 11:30 PM, Sean Owen wrote: > > It is a handy trick, and there's a lot of custom Writables involved in > > a good MapReduce p

Re: Modelling typed vectors?

2010-10-12 Thread Sean Owen
If that's all that's meant -- seems like you just want to write VectorAndThingWritable rather than inject an optional Thing into VectorWritable. It'd work either way but seems cleaner to compose it that way. VectorAndThingWritable might belong in core depending on how general "Thing" is. On Tue, O

Re: How to prepare movie lense data for Mahout recommendation job

2010-10-14 Thread Sean Owen
Well, it needs to be in comma-separated or tab-separated format, with fields "user ID" then "item ID" (optionally followed by preference value) on each line. Is there something more? On Thu, Oct 14, 2010 at 1:47 PM, JAGANADH G wrote: > Dear All > Can somebody tell how to prepare the movie lenses

Re: Caching a DataModel

2010-10-15 Thread Sean Owen
There is not. The theory is that the DataModel's job is to always give fresh data, and let it be cached from there. The weak argument is that if it's already in memory, then caching doesn't help, and if it's not, it's probably too big to meaningfully cache. And in fact all the DataModels are in mem

Re: Why does evaluating a recommender take far less time than actually generating results?

2010-10-18 Thread Sean Owen
No the test data can't be included in the training data, or else it would be like giving a student the answers to the exam before-hand. You're doing much less work for other reasons. Recommendation is a bigger problem. It may require computing many estimated preferences to get one set of recommend

Re: 0.0 as null versus number in recommender

2010-10-18 Thread Sean Owen
The simplest and best answer is that 0 is not the same as null. The framework does not treat them as the same, as a rule. A preference of 0 has some effect on computations; a preference that does not exist has none. The twist here is that there is no such thing as "null" for the mathematical entit
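
A tiny illustration of the difference, using a plain dictionary where a missing key stands for "no preference". The data and function name are made up for the example.

```python
def mean_preference(prefs):
    """prefs maps item ID -> value; an absent key means 'no preference'."""
    return sum(prefs.values()) / len(prefs) if prefs else float("nan")


with_zero = {101: 4.0, 102: 0.0}  # item 102 explicitly rated 0
without = {101: 4.0}              # item 102 not rated at all
```

The explicit 0 pulls the first user's mean down to 2.0, while the absent preference leaves the second user's mean at 4.0.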

Re: Fastest way to compute compute correlation between users and generate new recommendations

2010-10-19 Thread Sean Owen
If you specifically want a correlation, meaning the Pearson correlation, then you want to use PearsonCorrelationSimilarity. If you just mean you want some notion of similarity, then any implementation of UserSimilarity could be used. If speed is your concern, then I would try LogLikelihoodSimilarit

Re: Fastest way to compute compute correlation between users and generate new recommendations

2010-10-19 Thread Sean Owen
how I avoid to recompute user correlation and then recommendation when > I > don't have any modification of ratings? > > Thank you. > > > 2010/10/19 Sean Owen > > > If you specifically want a correlation, meaning the Pearson correlation, > > then you want to us

Re: Recommender system implementations

2010-10-20 Thread Sean Owen
Yes I think it's a good idea for the reason Gabriel gave. It's the best answer to give. I'm reluctant to change this behavior at this point, as this part of the code is more mature-ish and in use than others. In the use case you reference, evaluation, there's already support for doing this kind of

Re: FullRunningAverage possibly inefficient and (very slightly) inaccurate?

2010-10-21 Thread Sean Owen
This class is used mostly in slope-one. The priorities are memory usage, then speed, then accuracy in that context. Because of memory concerns, I don't want to store more than an int and double. So yes you can store a total and count. So we're left trading speed for accuracy. Your change is a bit
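
To make the trade-off concrete, here are two ways to keep a running average in one integer and one floating-point field. Both are illustrative sketches; neither is claimed to be the actual FullRunningAverage code or the change proposed in the thread.

```python
class IncrementalAverage:
    """Keeps a count and the current average; adjusts the average on each add."""

    def __init__(self):
        self.count = 0
        self.average = float("nan")

    def add(self, x):
        self.count += 1
        if self.count == 1:
            self.average = x
        else:
            # Move the average toward x by 1/count of the gap.
            self.average += (x - self.average) / self.count


class TotalAverage:
    """Keeps a count and a running total; divides only when asked."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, x):
        self.count += 1
        self.total += x

    @property
    def average(self):
        return self.total / self.count if self.count else float("nan")
```

The first does a division on every update but makes reads free; the second makes updates a single addition and pays the division on read.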

Re: Recommender system implementations

2010-10-22 Thread Sean Owen
Yah I still think held-out data is the best thing, if you want to use this built-in evaluation mechanism. Hold out the same data from both models and run the same test. There is another approach which doesn't necessarily require held-out data. On the original, full model, just compute recommendati

Re: save Mahout result in the memcached

2010-10-25 Thread Sean Owen
Perhaps I'm naive but wouldn't virtual memory be a better way to expand memory by using storage? Or, I think you'd have to know there were particular access patterns, and design your use of memcached to exploit those well, to do better than swap. On Mon, Oct 25, 2010 at 10:26 PM, Hank Li wrote:

Re: Running Recommender samples

2010-10-26 Thread Sean Owen
All you should need is "-i " there. I'll re-add this info which was lost in my last edit. On Tue, Oct 26, 2010 at 8:47 PM, Robert Stewart wrote: > I'm having trouble running the recommender samples. I installed and built > mahout with the following comments, on Mac OS X. > >

Re: Anonymous user and CachingRecommender

2010-10-27 Thread Sean Owen
That's fine, though user@ is also fine since that way other users see the question. Yes that's actually important, as you don't want it to cache the recommendation for the anonymous user, since it could then be reused for another totally different user. Your proposal would fix it, though kinda co
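
A minimal sketch of the behaviour discussed here: a caching wrapper that never caches results for the anonymous user, so one anonymous visitor's recommendations cannot be served to another. The sentinel ID and class name are assumptions for illustration, not Mahout's API.

```python
ANONYMOUS_USER_ID = -1  # hypothetical sentinel for "the anonymous user"


class SketchCachingRecommender:
    def __init__(self, delegate):
        self._delegate = delegate  # callable: (user_id, how_many) -> list
        self._cache = {}

    def recommend(self, user_id, how_many):
        if user_id == ANONYMOUS_USER_ID:
            # Always recompute: the anonymous ID stands for many different people.
            return self._delegate(user_id, how_many)
        key = (user_id, how_many)
        if key not in self._cache:
            self._cache[key] = self._delegate(user_id, how_many)
        return self._cache[key]
```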

Re: Anonymous user and CachingRecommender

2010-10-27 Thread Sean Owen
ame (the ones of the first cached user > of that kind, until its cache is removed) and thus wrong for almost all the > anonymous users. > Is that correct? > > > On Wed, Oct 27, 2010 at 10:40 AM, Sean Owen wrote: > > > That's fine, though user@ is also fine since that

Re: Ease of recommendation for a user

2010-10-28 Thread Sean Owen
What's your intuition -- what would you do with this figure? Users with higher variance are easier or harder to recommend well for? I don't know if that directly affects the quality... probably the diversity or quantity of prefs is more directly relevant. On Thu, Oct 28, 2010 at 7:31 AM, Lance Nor

Re: Ease of recommendation for a user

2010-10-28 Thread Sean Owen
ions are probably not going to be great given the sparsity of > > overlap. > > > > Couple of good starting points here on scholar. > > > http://scholar.google.com/scholar?hl=en&client=safari&rls=en&q=recommendation+sparse+data&um=1&ie=UTF-8&sa=N&tab=ws > >

Re: Why can't i train using the entire dataset while RMSE evaluation?

2010-10-29 Thread Sean Owen
It's true that the recommenders will give you a score of 0 when using 100% of the input for training for the reasons given. That should be the case: it doesn't need to estimate any answers, it knows them already. But yes I see your question now. No there is not a direct way to do it, but, I think

Re: Why can't i train using the entire dataset while RMSE evaluation?

2010-10-30 Thread Sean Owen
You're right, this implementation is exceptional. It does not check to see if it already "knows the answer" and return a known preference. I'd regard it as a small deficiency. On Fri, Oct 29, 2010 at 8:40 PM, Sanjib Kumar Das wrote: > No it won't give an RMSE of 0. > >

Re: Unifying different recommendations

2010-10-30 Thread Sean Owen
That's an interesting idea, would be curious to hear your results. No there's no particular support for that already, you'd have to roll your own. On Fri, Oct 29, 2010 at 9:21 PM, Steven Bourke wrote: > Hi - > > I've written a recommender algorithm in mahout that unifies recommendations > made i

Re: Exploiting the last visited items

2010-10-30 Thread Sean Owen
Recent item associations aren't any different than others -- if you want to use them as data, they need to go in the DataModel. And then recommendations need to be re-computed to take them into account. That's the simplistic answer. In practice this can be inefficient for some algorithms. Slope on

0.4 released

2010-10-31 Thread Sean Owen
We're pleased to announce we've finally completed the 0.4 release. It will begin showing up on mirrors shortly, so check back if you can't find it just yet from the usual spot: http://www.apache.org/dyn/closer.cgi/mahout/ The complete news item is as follows: We are pleased to announce release 0.

Re: Exploiting the last visited items

2010-11-01 Thread Sean Owen
ince it is computed for > every item. For me it seems reasonable, to use my standard recommender > besides the anonymous user recommender for recent items and combine > their results. Or are there any better ideas? > > 2010/10/30 Sean Owen : > > Recent item associations aren

Re: Problem encountered while running RecommenderJob

2010-11-06 Thread Sean Owen
I'm not sure what the context is -- GroupLensRecommender has nothing to do with Hadoop. So I'm not surprised this doesn't work. On Fri, Nov 5, 2010 at 8:57 PM, Sanjib Kumar Das wrote: > Hi all, > > I am trying to run the GroupLensRecommender on hadoop. > I am using the pseudo.RecommenderJob for t

Re: Problem encountered while running RecommenderJob

2010-11-06 Thread Sean Owen
ender > --numRecommendations 10 > > > On Sat, Nov 6, 2010 at 4:55 AM, Sean Owen wrote: > > > I'm not sure what the context is -- GroupLensRecommender has nothing to > do > > with Hadoop. So I'm not surprised this doesn't work. > > > > On Fr

Re: Jetty Http error 404

2010-11-07 Thread Sean Owen
 1 >   > >   >    axis >    Apache-Axis Servlet > >  org.apache.axis.transport.http.AxisServlet >   > >   >    taste-recommender >    /RecommenderServlet >   >   >    axis >    *.jws >   > >   > >   >    5 >   > >   >   >    wsdl >  

Re: Usage of MySQL as DataSource with connection pooling

2010-11-08 Thread Sean Owen
On Mon, Nov 8, 2010 at 9:29 PM, Karl Eigengrund wrote: > I am new to Mahout. I am using it to get recommendations out of my data > stored in a MySQL database. I have started by configuring the DataSource > programmatically, and followed the performance hints in the JavaDoc for > MySQLJDBCDataMod

Re: Usage of MySQL as DataSource with connection pooling

2010-11-09 Thread Sean Owen
This really isn't a question about Mahout itself, but I believe I can still help. I don't see anywhere that you've actually configured the DataSource! You need something like this in server.xml; this is my standard "recipe":

Re: Usage of MySQL as DataSource with connection pooling

2010-11-09 Thread Sean Owen
I take it back -- I think the way you are configuring it is also valid. However the problem is that for whatever reason it is not configuring and deploying JNDI correctly. You should look in the server logs for a clue to the problem. On Tue, Nov 9, 2010 at 11:03 AM, Sean Owen wrote: > T

Re: Usage of MySQL as DataSource with connection pooling

2010-11-09 Thread Sean Owen
It is not being configured -- class and URL are null. You might try my style of configuring it as I know it works. On Tue, Nov 9, 2010 at 2:46 PM, Karl Eigengrund wrote: > Thank you for your help!! > I have tried it as you suggested and also on a different system (Ubuntu > instead of Windows), b

Re: Mahout - Help needed - files with no preferences and integarting mahout with Hadoop

2010-11-12 Thread Sean Owen
I think you'd have to debug to get more insight. To me it looks OK. Do you have enough data? Maybe your user has few or no prefs, which means nothing can be recommended. On Fri, Nov 12, 2010 at 12:02 PM, bejoy ks wrote: > > Hi Steven > I tried my User Similarity recommendation using > GenericB

Re: Mahout - Help needed - files with no preferences and integarting mahout with Hadoop

2010-11-12 Thread Sean Owen
Yes, if you have no data for a user, you can't make recommendations. If you have a little data, you can make only a few, if any, weak recommendations. The framework won't return very weak recommendations. For RecommenderJob -- just read the javadoc, which will tell you how to run it. I'll also po

Re: Mahout - Help needed - files with no preferences and integarting mahout with Hadoop

2010-11-12 Thread Sean Owen
http://manning.com/owen On Fri, Nov 12, 2010 at 3:16 PM, bejoy ks wrote: > > Ok that'd be great Owen , if you could point me to the book 'Mahout in > Action' . I'm bit interested to know more on the possibilities available > with mahout and also the right usage of similarities, recommenders

Re: Mahout - Help needed - files with no preferences and integarting mahout with Hadoop

2010-11-12 Thread Sean Owen
It looks reasonable to me; what works best for your data will only be revealed with some testing of different algorithms. For example try LogLikelihoodSimilarity. On Fri, Nov 12, 2010 at 3:41 PM, bejoy ks wrote: > > Thanks a lot Owen. One more small favor,hope its fine for you. > I'd like to ge

Re: FileDataModel question: loading incremental files

2010-11-15 Thread Sean Owen
Yes I think you could make this change -- skip update files unless the modified date is after the *latest* of all the data file's and update files' last-modified date. Would you be interested in trying out a change like this locally to verify it works and posting the patch? I think it's just a few
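
One reading of the proposed check, sketched as a pure function over file names and modification times: on refresh, reprocess only update files modified after the newest timestamp seen at the previous load, then advance that high-water mark. The function name and tuple layout are hypothetical, and this is not FileDataModel's code.

```python
def files_to_reload(update_files, last_loaded_mtime):
    """update_files: list of (name, mtime) pairs.

    Returns (names modified since the last load, new high-water mark).
    """
    changed = [name for name, mtime in update_files if mtime > last_loaded_mtime]
    newest = max([last_loaded_mtime] + [mtime for _, mtime in update_files])
    return changed, newest
```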

Re: Standard way of getting precision for specific users?

2010-11-16 Thread Sean Owen
No, there's no way to force the test data that is held out in these tests, but, it's pretty simple to modify the code to hold out whatever test data you want. In this way you could conduct a test over just particular users. On Wed, Nov 17, 2010 at 12:18 AM, Steven Bourke wrote: > Hi - I'd like t

Re: Can Mahout make recommendations while the recommender is being refreshed?

2010-11-19 Thread Sean Owen
In short -- you're mostly right. There's a tradeoff between fill-and-swap (no service interruption, but needs 2x memory), and a stop-the-world approach. FileDataModel usually does an incremental update but will fill-and-swap as you call it when the main file is updated. SlopeOneRecommender does a
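
The fill-and-swap side of that trade-off can be sketched as follows: the replacement model is built completely while the old one keeps serving reads, then a single reference assignment swaps them. The class and its `loader` argument are illustrative assumptions, not Mahout classes.

```python
import threading


class SwappableModel:
    def __init__(self, loader):
        self._loader = loader          # callable that builds a fresh model
        self._model = loader()
        self._lock = threading.Lock()  # serializes refreshes, not reads

    def get(self):
        return self._model             # readers always see a complete model

    def refresh(self):
        with self._lock:
            fresh = self._loader()     # old and new coexist here: the 2x memory cost
            self._model = fresh        # one reference assignment swaps them
```

A stop-the-world refresh would instead mutate the existing model in place, saving the extra memory but blocking readers while it runs.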

Re: Need for a distributed SVDRecommender

2010-11-19 Thread Sean Owen
That result sounds confusing. It should take about the same number of wall-clock hours either way. I don't see why it would take 14 hours -- that sounds wrong. If anything it should take 38 / N minutes where N is the number of recommenders you ran. SVDRecommender is not distributed at all, no. On

Re: Need for a distributed SVDRecommender

2010-11-20 Thread Sean Owen
y different from pseudo.RecommenderJob (in terms > of the distributed implementation) hence the difference in timings, i > guess. > > > On Fri, Nov 19, 2010 at 4:04 PM, Sean Owen wrote: > > > That result sounds confusing. It should take about the same number of > > wall-clock

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-20 Thread Sean Owen
It's the exact same process -- what does "doesn't work" mean? what error? The process of converting the data to CSV is of course entirely different. You would not apply that part to such a different input. Just use a text processing tool to convert GroupLens's file to replace "::" with "," and rem
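
The delimiter replacement described above takes only a few lines in any text tool; here is one way to do it in Python. The helper name is hypothetical, and the sketch only swaps the delimiter without touching any other part of the line.

```python
def grouplens_to_csv(lines):
    """Yield each '::'-delimited line with ',' as the delimiter instead."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield line.replace("::", ",")
```

For a file on disk, something like `out.writelines(row + "\n" for row in grouplens_to_csv(src))` with both files opened in text mode would apply it line by line.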

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-20 Thread Sean Owen
t, Nov 20, 2010 at 10:14 AM, Stefano Bellasio < stefanobella...@gmail.com> wrote: > Thanks for the answer :) Well right now i have a ratings.dat file, what i > have to do? convert it as you said in CSV with :: instead of , ? Thanks > Il giorno 20/nov/2010, alle ore 11.12, Sean Owen ha

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-20 Thread Sean Owen
Sorry but that's something to do with Hadoop, not Mahout. It seems like an error in your HDFS cluster. On Sat, Nov 20, 2010 at 10:41 AM, Stefano Bellasio < stefanobella...@gmail.com> wrote: > Ok Sean, thank you, works very well :) Now when i run hadoop with that > before finish it say this: > >

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-20 Thread Sean Owen
in/hadoop jar core/target/mahout-core-0.5-SNAPSHOT.job.jar > org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob -i input/ratings.txt > -o data/ratings4 --recommenderClassName > org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender > > Thanks again > > Il

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-20 Thread Sean Owen
Look at Listing 5.5. This is the kind of wrapper you need. You can skip the no-arg constructor actually and just write the one that takes a DataModel, if I recall correctly. Just delegate the rest of the methods. Inside the constructor, build whatever recommender you want. On Sat, Nov 20, 2010 at

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-20 Thread Sean Owen
The first part looks right, but you have not delegated any methods. Please see the example I referred you to. On Sat, Nov 20, 2010 at 3:06 PM, Stefano Bellasio wrote: > Something like this? And then i have to package it and use instead of > Mahout-core..job right? Thanks again > >

Re: Can Mahout make recommendations while the recommender is being refreshed?

2010-11-21 Thread Sean Owen
Agree with Ted's solution in practice. Stuff should already be using floats, or have the option to use floats, for this reason. The precision is usually not helpful in this context. On Sun, Nov 21, 2010 at 4:44 AM, Lance Norskog wrote: > Another option is to change the Double arrays to Floats. >

Re: NullPointerException in MySQL driver when using MySQLJDBCDataModel

2010-11-21 Thread Sean Owen
It is nothing to do with your code, I think. As far as I can tell, it is at least a minor bug in the driver. It should not throw an NPE in any event. If it's being triggered by some wrong usage pattern, I don't know what it is. The code looks fine. You could try the latest MySQL driver, and/or st

Re: NullPointerException in MySQL driver when using MySQLJDBCDataModel

2010-11-21 Thread Sean Owen
Should be fine. INTEGER is a smaller type. You'd get a different error. Well, you could sure try switching the data type just to see but I'd be really surprised if that's it. On Sun, Nov 21, 2010 at 3:29 PM, gustavo salazar < guga.salazar.l...@gmail.com> wrote: > Maybe the problem is the mysql ty

Re: Interpreting the output of SVD

2010-11-22 Thread Sean Owen
Are you asking what the left and right vectors mean in general in the SVD? S is a re-expression of the original matrix's transformation, but in a different and more natural basis. (Actually it's an approximation, since small singular values are tossed out, and the rank of S is therefore much small

Re: Interpreting the output of SVD

2010-11-22 Thread Sean Owen
e "scaling factors" ... but I actually struggle to come up with a good intuitive explanation of what S itself is (or really, U and V by themselves). Anyone smarter have a nice pithy analogy? On Mon, Nov 22, 2010 at 11:06 AM, Sean Owen wrote: > > In more CF-oriented terms, S is an

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-22 Thread Sean Owen
> > problem I encountered (like exceptions of yours) was packing the libs, > > creating jar and configuring input paths. As Sean said, look at > > GroupLensRecommender implementation in > > o.a.m.c.t.example.grouplens.GroupLensRecommender. > > > > HTH > > >

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-22 Thread Sean Owen
Mahout in Action (We now interrupt for an ad break: http://manning.com/owen) On Mon, Nov 22, 2010 at 7:17 PM, Thomas De Vos wrote: > Sean, > > Which book are you referring to? > > Thanks > > Thomas

Re: Grouplens dataset Recommenderjob with Hadoop

2010-11-22 Thread Sean Owen
e-0.5-SNAPSHOT.job.jar > org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i input/ratings.txt -o > data/output --recommenderClassName > org.apache.mahout.cf.taste.hadoop.item.ItemBased > > thanks > > Il giorno 22/nov/2010, alle ore 20.22, Sean Owen ha scritto: > >>

Re: Matrix-based recommendation analysis

2010-11-22 Thread Sean Owen
(PS I don't think that link from Ted is publicly visible but try http://www.slideshare.net/tdunning ) Maybe I'm walking into half of a another conversation but what's the question or goal here? I don't think the matrix product contains quite what you're saying. For example U1 records only 2 ratin

Re: Playing with the last.fm dataset

2010-11-23 Thread Sean Owen
Yes I think the logarithm is a fine choice. The base doesn't matter as the scale of ratings doesn't make a difference. On Tue, Nov 23, 2010 at 2:07 PM, Sebastian Schelter wrote: > Hi, > > I'm currently looking into the last.fm dataset (from > http://denoiserthebetter.posterous.com/music-recommend
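
A short check of why the base is irrelevant, assuming a scale-invariant similarity such as Pearson correlation: changing the base multiplies every log value by the same positive constant, which leaves the correlation unchanged. The play counts below are made up for the example.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def log_prefs(counts, base):
    """Turn raw play counts into log-scaled preference values."""
    return [math.log(c, base) for c in counts]


plays_a = [1, 10, 100, 1000]  # hypothetical play counts for user A
plays_b = [3, 20, 50, 4000]   # hypothetical play counts for user B
```

Comparing `pearson(log_prefs(plays_a, 2), log_prefs(plays_b, 2))` with the base-10 equivalent gives the same value up to floating-point error.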

Re: error in itemsimilarity

2010-11-26 Thread Sean Owen
It says it right there -- text files *with the preference data*. This is a collaborative filtering tool, which is quite different from computing document similarity. On Fri, Nov 26, 2010 at 8:25 AM, Divya wrote: > Hi, > > But in java doc of ItemSimilarityJob its written that > "Dmapred.input.di

Re: RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation

2010-11-26 Thread Sean Owen
This is because all the ratings are implicitly 1.0 when there are no ratings. But I actually think this is symptomatic of a problem, since I note that those recommendations are quite suspiciously in order by item ID. I am not sure the current state of the distributed recommender is compatible with

Re: RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation

2010-11-26 Thread Sean Owen
But is it then ranking the recommendations by the estimated pref? If it's always 1, then the ordering is not meaningful. Maybe it is, I just haven't looked at your changes in much detail since you made them although it looked broadly correct and proper. On Fri, Nov 26, 2010 at 6:33 PM, Sebastian

Re: RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation

2010-11-26 Thread Sean Owen
after that, with n being the number > specified in the parameter --numRecommendations given to RecommenderJob. > > Can you point me to the code where the non-distributed code handles the > problem of ranking them? We could certainly emulate that behaviour in > the distributed code too. >

Re: is it necessary set mapred.map.tasks running mahout on a cluster?

2010-11-26 Thread Sean Owen
I tend to let the cluster decide these things based on the input size and splits. But yes if you're not getting enough CPU utilization you can try running more mappers. If you're I/O bound, it won't necessarily help, but if not, it should increase throughput. On Fri, Nov 26, 2010 at 10:45 PM, rmx

Re: grouplens example build in eclipse

2010-11-30 Thread Sean Owen
Yes that page is years old, and very out of date. As the "UPDATE" message says there you need to refer to Mahout. Try: https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation On Tue, Nov 30, 2010 at 5:58 PM, Alessandro Binhara wrote: > hello all .. > > I'm trying to compile t

Re: Group lens example

2010-12-02 Thread Sean Owen
I'm not sure what file you are feeding it but it is not the right data file. Check your input. On Thu, Dec 2, 2010 at 6:43 PM, Alessandro Binhara wrote: > Hello all .. > > I have a sucess to deploy grouplens example in tomcat.. > > but..i got this error : > > I checked the data file and it is not

Re: Group lens example

2010-12-02 Thread Sean Owen
That doesn't sound right to me. A million ratings is very small. I don't know that this value is actually taking hold for you if you're running out of memory. You would want to debug to see what the heap size really is. On Thu, Dec 2, 2010 at 8:36 PM, Alessandro Binhara wrote: > ok.. > > i try a
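
One quick way to see what heap size the JVM actually picked up is to print it from inside the running process. This is a minimal sketch using the standard `Runtime` API; the class name is hypothetical, and where to call it from (for example a servlet's startup code under Tomcat) is an assumption about the setup.

```java
final class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // If this prints far less than the -Xmx value you configured,
        // the setting is not reaching this JVM.
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```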

Re: Group lens example

2010-12-03 Thread Sean Owen
Yes it reads from files as a stream, whether compressed or not. On Fri, Dec 3, 2010 at 4:41 AM, Lance Norskog wrote: > Should the .gz file unpack in streaming mode? Does it unpack > everything first? Or is it just reading the .gz file as a text file? > =
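
For readers unsure what "as a stream" means here, the following is a sketch of the general technique using only the JDK. It is not Mahout's own file-reading code, and the class and method names are hypothetical. A gzipped input is wrapped in a `GZIPInputStream` and read line by line, so nothing is unpacked to disk or fully loaded first.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class StreamingLineCounter {

    /** Counts lines one at a time; decompression happens on the fly as bytes are read. */
    static long countLines(InputStream raw, boolean gzipped) {
        try (InputStream in = gzipped ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(
                 new InputStreamReader(in, StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper only: gzips a byte array in memory. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "1,10,3.0\n1,11,4.5\n2,10\n".getBytes(StandardCharsets.UTF_8);
        // Same line count whether the bytes are compressed or not.
        System.out.println(countLines(new ByteArrayInputStream(plain), false));
        System.out.println(countLines(new ByteArrayInputStream(gzip(plain)), true));
    }
}
```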

Re: Release notes for 0.4

2010-12-03 Thread Sean Owen
I'll fix that, thanks. On Fri, Dec 3, 2010 at 4:46 AM, Lance Norskog wrote: > The link to the 0.4 release notes on http://mahout.apache.org/ > unfortunately point to the 0.3 release notes: > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310751&styleName=Html&version=123142

Re: RecommenderJob - ArrayIndexOutOfBoundsException

2010-12-10 Thread Sean Owen
Yes it is a problem with the input -- would be helpful to see (part of) it. On Fri, Dec 10, 2010 at 3:50 PM, Niall Riddell wrote: > Hi, > > I've been studiously working through Mahout In Action and I'm currently > trying to execute the RecommenderJob on my local Hadoop instance. > > Hadoop is up

Re: RecommenderJob - ArrayIndexOutOfBoundsException

2010-12-10 Thread Sean Owen
Yes it needs to be in "user,item[,rating]" format instead to use the regular implementation. See the discussion in 6.3.2. The listing under it shows a different Mapper called WikipediaToItemPrefsMapper, which will read this input format though. You can swap that in. Or you can externally convert th
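
To make the expected "user,item[,rating]" shape concrete, here is a small sketch of a parser for one such line. The class is hypothetical and is not part of Mahout. Treating a missing rating as 1.0 is an assumption, chosen to match the remark elsewhere in this archive that ratings are implicitly 1.0 when none are given.

```java
final class PrefLine {
    final long userID;
    final long itemID;
    final float value;

    private PrefLine(long userID, long itemID, float value) {
        this.userID = userID;
        this.itemID = itemID;
        this.value = value;
    }

    /** Parses "user,item" or "user,item,rating"; anything else is rejected. */
    static PrefLine parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2 || fields.length > 3) {
            throw new IllegalArgumentException(
                "Expected user,item[,rating] but got: " + line);
        }
        long user = Long.parseLong(fields[0].trim());
        long item = Long.parseLong(fields[1].trim());
        // Assumption: a line without a rating gets an implicit value of 1.0.
        float value = fields.length == 3 ? Float.parseFloat(fields[2].trim()) : 1.0f;
        return new PrefLine(user, item, value);
    }

    public static void main(String[] args) {
        PrefLine rated = parse("1,10,3.5");
        PrefLine unrated = parse("7,42");
        System.out.println(rated.userID + " " + rated.itemID + " " + rated.value);
        System.out.println(unrated.userID + " " + unrated.itemID + " " + unrated.value);
    }
}
```

Running a check like this over the first few lines of an input file is a cheap way to confirm the format before submitting a job.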
