Re: Connection Pooling

2011-07-13 Thread Sean Owen
That's too small to be that slow. There are a number of ways this could be slower than it should be. The DataSource may not matter. What's important is whether it is actually a pooling DataSource from the container. You may want to check whether it seems to be reusing connections. Table indexes, on us

Re: Connection Pooling

2011-07-13 Thread Vitali Mogilevsky
Hey, I have the same problem of slowness while using the MySQL data model. A little research and a look at MySQL's query log revealed that user-user recommendation floods the database with thousands and thousands of requests, and that's on a small database. For now I'm dumping the database

Re: Connection Pooling

2011-07-13 Thread Sean Owen
That's all correct, it reads a lot. But you can avoid a lot of it by using caching wrappers. You also don't need to dump to a file. Use ReloadFromJDBCDataModel. On Wed, Jul 13, 2011 at 9:57 AM, Vitali Mogilevsky wrote: > Hey, > I got the same problem, of slowness while using MYSQL data model, aft
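As a rough illustration of the caching-wrapper idea mentioned above, here is a minimal self-contained sketch. It does not use Mahout's own classes: `PairSimilarity`, `CachingSimilarity`, and the counting lambda are hypothetical names invented for this example, and the sketch assumes the similarity is symmetric, so (a, b) and (b, a) can share one cache entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface standing in for an expensive, database-backed similarity lookup.
interface PairSimilarity {
    double similarity(long userA, long userB);
}

// Hypothetical caching wrapper: remembers each computed pair so the delegate
// (and the database behind it) is asked only once per unordered pair.
final class CachingSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    private final Map<String, Double> cache = new HashMap<>();

    CachingSimilarity(PairSimilarity delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized double similarity(long userA, long userB) {
        // Order the IDs so (a, b) and (b, a) map to the same key (assumes symmetry).
        long lo = Math.min(userA, userB);
        long hi = Math.max(userA, userB);
        String key = lo + ":" + hi;
        Double cached = cache.get(key);
        if (cached == null) {
            cached = delegate.similarity(lo, hi);
            cache.put(key, cached);
        }
        return cached;
    }

    // Drop all cached values, e.g. after the underlying data changes.
    public synchronized void clear() {
        cache.clear();
    }
}

final class CachingSimilarityDemo {
    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();
        PairSimilarity slow = (a, b) -> {
            backendCalls.incrementAndGet(); // stands in for one database round trip
            return 1.0 / (1.0 + Math.abs(a - b));
        };
        PairSimilarity cached = new CachingSimilarity(slow);
        cached.similarity(1, 2);
        cached.similarity(2, 1);
        cached.similarity(1, 2);
        System.out.println("backend calls: " + backendCalls.get());
    }
}
```

Three lookups of the same pair reach the backend only once. The trade-off is staleness: cached values stay in place until `clear()` is called.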

What about maintaining a short descriptive table about each algorithm in Mahout on the Wiki for new users?

2011-07-13 Thread Xiaobo Gu
Hi, Because I am a new user, I would appreciate a table like this: Algorithm Name Current Status Local-Run Commands Map-Reduce Run Commands Dataset file format LogisticRegression Production trainAdaptiveLogistic... N/A CSV with head

Re: Connection Pooling

2011-07-13 Thread Vitali Mogilevsky
Thanks, will test that On Wed, Jul 13, 2011 at 12:11 PM, Sean Owen wrote: > That's all correct, it reads a lot. But you can avoid a lot of it by using > caching wrappers. > You also don't need to dump to a file. Use ReloadFromJDBCDataModel. > > On Wed, Jul 13, 2011 at 9:57 AM, Vitali Mogilevsky

Re: What about maintaining a short descriptive table about each algorithm in Mahout on the Wiki for new users?

2011-07-13 Thread Ted Dunning
I think your table got kind of hashed up. Can you put your table on a Mahout wiki page? On Wed, Jul 13, 2011 at 7:11 AM, Xiaobo Gu wrote: > Hi > > Because I am a new user, so I will appreciate for a table like this: > > Algorithm Name Current Status Local-Run Commands > Map-Reduce Run

Re: Connection Pooling

2011-07-13 Thread Salil Apte
Awesome, I will give ReloadFromJDBCDataModel a try. How does this particular data model update itself on database changes? Does it just happen periodically and, if so, can this rate be changed easily? Lastly, will calling clear(userId) on a recommender frequently be bad for performance? I'm assuming

Re: Connection Pooling

2011-07-13 Thread aaron
-Original Message- From: Salil Apte Date: Wed, 13 Jul 2011 10:19:47 To: Reply-To: user@mahout.apache.org Subject: Re: Connection Pooling Awesome, I will give ReloadFromJDBCDataModel a try. How does this particular data model update itself on database changes? Does it just happen period

Re: Connection Pooling

2011-07-13 Thread Sean Owen
Yes it reloads after a configurable interval, or on demand. Clearing the cache for a user ID only means that user's data is recomputed. It's not bad to call this frequently per se... I suppose you want to let it cache as much and for as long as is valid and acceptable to your app. Your bottleneck

Re: What about maintaining a short descriptive table about each algorithm in Mahout on the Wiki for new users?

2011-07-13 Thread Sean Owen
The only objection to this is that it is yet another piece of information to be maintained, and there is a strong chance it will not quite be kept up to date. We already have a bit of doc rot in the javadoc itself, and the wiki (which is just par for the course in a volunteer project). I would fin

Re: Connection Pooling

2011-07-13 Thread Salil Apte
Where can the interval be configured? BTW, ReloadFromJDBCDataModel works like a dream so far :) On Wed, Jul 13, 2011 at 10:58 AM, Sean Owen wrote: > Yes it reloads after a configurable interval, or on demand. > > Clearing the cache for a user ID only means that user's data is recomputed. > It's n

Re: Connection Pooling

2011-07-13 Thread Sean Owen
I was mixing this up with another class. It doesn't reload itself. You can call refresh() to do so. On Wed, Jul 13, 2011 at 7:34 PM, Salil Apte wrote: > Where can the interval be configured? BTW, ReloadFromJDBCDataModel > works like a dream so far :) > > On Wed, Jul 13, 2011 at 10:58 AM, Sean Ow
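The "no automatic reload, call refresh() yourself" behaviour described above can be pictured with a small self-contained sketch. It is not Mahout code: `ReloadingSnapshot`, its no-argument `refresh()`, and the in-memory map playing the role of the database are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical in-memory snapshot of a backing store. Reads are served from the
// copy taken at the last refresh(); nothing is reloaded automatically.
final class ReloadingSnapshot<K, V> {
    private final Supplier<Map<K, V>> loader;
    private volatile Map<K, V> snapshot;

    ReloadingSnapshot(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        refresh(); // initial load
    }

    // Re-read everything from the backing store and swap in the new copy.
    public void refresh() {
        snapshot = new HashMap<>(loader.get());
    }

    public V get(K key) {
        return snapshot.get(key);
    }
}

final class ReloadingSnapshotDemo {
    public static void main(String[] args) {
        // A plain map stands in for the database table of preferences.
        Map<Long, Float> database = new HashMap<>();
        database.put(1L, 3.0f);

        ReloadingSnapshot<Long, Float> model = new ReloadingSnapshot<>(() -> database);

        database.put(1L, 5.0f);            // the "database" changes
        System.out.println(model.get(1L)); // still the value from the first load

        model.refresh();                   // explicit reload picks up the change
        System.out.println(model.get(1L));
    }
}
```

If periodic reloading is wanted, the application could call `refresh()` from its own timer, for example a `ScheduledExecutorService`, at whatever interval suits it.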

Re: What about maintaining a short descriptive table about each algorithm in Mahout on the Wiki for new users?

2011-07-13 Thread Ted Dunning
Agreed. And I would encourage the original poster to put their table onto the wiki and file a JIRA pointing to the classes they would like better javadoc in. It is always easier to respond to specifics than to make open-ended, widespread improvements (Sean is excepted from this generalization). On

similarity metrics?

2011-07-13 Thread Ian Upright
Hello, I'm looking for more similarity metrics, such as Hellinger distance. Wouldn't they be implemented as a subclass of DistributedVectorSimilarity? Does anyone have more implementations? Thanks, Ian

Re: similarity metrics?

2011-07-13 Thread Sean Owen
What's in the project now is all I know about. Yes if you want to use it with the Hadoop-based similarity calculator, that's what you would extend. How do you apply this metric to vectors? On Wed, Jul 13, 2011 at 10:09 PM, Ian Upright wrote: > Hello, > > I'm looking for more similarity metrics,

Re: similarity metrics?

2011-07-13 Thread Ted Dunning
You would have to encode the distributions as vectors. For discrete distributions, I think that this is relatively trivial since you could interpret each vector entry as the probability for an element i of the domain of the distribution. I think that would result in the Hellinger distance [1] bei

Re: similarity metrics?

2011-07-13 Thread Ian Upright
I found this: http://www.utdallas.edu/~herve/Abdi-Distance2007-pretty.pdf Which seems to explain it pretty simply. Seems like these measures should be fairly easy to implement. I could take a stab at it and publish the results. Ian >What's in the project now is all I know about. Yes if you wa

Re: similarity metrics?

2011-07-13 Thread Ted Dunning
If you need this distance, please go for it! The procedure for publishing the results (or the first attempts) is to file a JIRA (see issues.apache.org/jira/browse/MAHOUT ) and attach patches to the JIRA for review or comment. On Wed, Jul 13, 2011 at 2:55 PM, Ian Upright wrote: > I found this: >

Re: similarity metrics?

2011-07-13 Thread Sean Owen
Yes that's it, according to the reference -- or rather, I suppose, construe the vector as encoding a discrete distribution. Element i has probability proportional to the value at i. (It can't have negative values of course.) It would seem to be what you write below, but the square root of the sum of
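To make the vector interpretation discussed in this thread concrete, here is a minimal standalone sketch of a Hellinger distance between two non-negative vectors. It is not a DistributedVectorSimilarity subclass and uses no Mahout types; the class name `Hellinger` and its methods are made up for illustration. Each vector is first scaled to sum to 1, and the result is divided by sqrt(2) so it falls in [0, 1]. That scaling factor is one common convention and an assumption here, since some references omit it.

```java
// Hypothetical helper, not part of Mahout: Hellinger distance between two
// non-negative vectors treated as unnormalized discrete distributions.
final class Hellinger {
    private Hellinger() {}

    // Scale a non-negative vector so its entries sum to 1.
    static double[] normalize(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            if (x < 0.0) {
                throw new IllegalArgumentException("negative entry: " + x);
            }
            sum += x;
        }
        if (sum == 0.0) {
            throw new IllegalArgumentException("vector sums to zero");
        }
        double[] p = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            p[i] = v[i] / sum;
        }
        return p;
    }

    // H(p, q) = sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 ) / sqrt(2)
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double[] p = normalize(a);
        double[] q = normalize(b);
        double sumSq = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = Math.sqrt(p[i]) - Math.sqrt(q[i]);
            sumSq += d * d;
        }
        return Math.sqrt(sumSq) / Math.sqrt(2.0);
    }

    public static void main(String[] args) {
        // Disjoint support gives the maximum distance; proportional vectors give zero.
        System.out.println(distance(new double[] {1, 0}, new double[] {0, 1}));
        System.out.println(distance(new double[] {1, 1}, new double[] {2, 2}));
    }
}
```

Because each vector is normalized first, only the relative sizes of its entries matter, which matches the "probability proportional to the value at i" reading above.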