Re: MongoDataModel

2011-05-17 Thread Sean Owen
There's no web app in here, it's just code that imports MongoDB classes. Yes it can be made 'provided' scope; it still means a lesser overhead of downloading the dependency. (kfs has been removed BTW.) I think we have the answer of farming this out to an 'integration' module that we have already u

[jira] [Updated] (MAHOUT-696) Command line program for AdaptiveLogiscticRegression

2011-05-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-696: - Fix Version/s: (was: 0.5) 0.6 > Command line program for AdaptiveLogiscticRegressi

Re: Possible contributions

2011-05-17 Thread Hector Yee
Re: boosting scalability, I've implemented it on thousands of machines, but not with mapreduce, rather with direct RPC calls. The gradient computation tends to be iterative, so one way to do it is to have each iteration run per mapreduce. Compute gradients in the mapper, gather them in the reducer,
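
The iteration pattern described above (compute gradients in the mapper, gather them in the reducer, one pass per iteration) can be sketched in a single process. This is only an illustration under stated assumptions: it uses plain least-squares gradient descent as a stand-in rather than boosting, it simulates the map and reduce steps with ordinary method calls instead of Hadoop jobs, and every class and method name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not Mahout or Hadoop APIs.
final class IterativeGradientSketch {

    // "Mapper": partial gradient of squared error for a linear model on one shard.
    static double[] mapGradient(double[][] xs, double[] ys, double[] w) {
        double[] g = new double[w.length];
        for (int i = 0; i < xs.length; i++) {
            double pred = 0.0;
            for (int j = 0; j < w.length; j++) {
                pred += w[j] * xs[i][j];
            }
            double err = pred - ys[i];
            for (int j = 0; j < w.length; j++) {
                g[j] += err * xs[i][j];
            }
        }
        return g;
    }

    // "Reducer": element-wise sum of the partial gradients from all shards.
    static double[] reduceGradients(List<double[]> partials, int dim) {
        double[] total = new double[dim];
        for (double[] p : partials) {
            for (int j = 0; j < dim; j++) {
                total[j] += p[j];
            }
        }
        return total;
    }

    // Driver: one simulated map/reduce pass per iteration, then a weight update.
    static double[] train(double[][][] shardXs, double[][] shardYs,
                          int dim, int iterations, double learningRate) {
        double[] w = new double[dim];
        for (int it = 0; it < iterations; it++) {
            List<double[]> partials = new ArrayList<>();
            for (int s = 0; s < shardXs.length; s++) {
                partials.add(mapGradient(shardXs[s], shardYs[s], w));
            }
            double[] g = reduceGradients(partials, dim);
            for (int j = 0; j < dim; j++) {
                w[j] -= learningRate * g[j];
            }
        }
        return w;
    }
}
```

In a real cluster each call to `mapGradient` would run on a different machine and the current weights would have to be redistributed before every pass, which is where the per-iteration job overhead comes from.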

[jira] [Commented] (MAHOUT-696) Command line program for AdaptiveLogiscticRegression

2011-05-17 Thread XiaoboGu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035195#comment-13035195 ] XiaoboGu commented on MAHOUT-696: - I did not touch any other java source files, only creat

[jira] [Commented] (MAHOUT-696) Command line program for AdaptiveLogiscticRegression

2011-05-17 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035189#comment-13035189 ] Ted Dunning commented on MAHOUT-696: Attach a file containing your patch to this JIRA.

[jira] [Updated] (MAHOUT-696) Command line program for AdaptiveLogiscticRegression

2011-05-17 Thread XiaoboGu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiaoboGu updated MAHOUT-696: Status: Open (was: Patch Available) > Command line program for AdaptiveLogiscticRegression > -

[jira] [Updated] (MAHOUT-696) Command line program for AdaptiveLogiscticRegression

2011-05-17 Thread XiaoboGu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiaoboGu updated MAHOUT-696: Fix Version/s: 0.5 Status: Patch Available (was: Open) > Command line program for AdaptiveLogis

Re: [jira] [Commented] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Ted Dunning
This is a cute idea. Discount old data and revert to the prior. Should be very straightforward. I don't know of a use off-hand, but I will keep an eye out for it. On Tue, May 17, 2011 at 11:33 AM, Dmitriy Lyubimov (JIRA) wrote: > Also I experimented with yet another biased estimator for binomi
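
One way to read "discount old data and revert to the prior" is an exponentially discounted rate estimate with a pseudo-count prior. The sketch below is an assumption about what such an estimator could look like, not the estimator proposed on MAHOUT-634 (the quoted description is cut off), and the class, field and parameter names are hypothetical.

```java
// Hypothetical sketch: a discounted success-rate estimate that falls back to a prior.
final class DiscountedRateSketch {
    private final double priorMean;   // value the estimate reverts to with no recent data
    private final double priorWeight; // pseudo-count given to the prior
    private final double decay;       // per-step discount in (0, 1)
    private double successes;         // discounted count of successes
    private double trials;            // discounted count of trials

    DiscountedRateSketch(double priorMean, double priorWeight, double decay) {
        this.priorMean = priorMean;
        this.priorWeight = priorWeight;
        this.decay = decay;
    }

    // Discount everything seen so far by one step, then add the new observation.
    void observe(boolean success) {
        successes = decay * successes + (success ? 1.0 : 0.0);
        trials = decay * trials + 1.0;
    }

    // Let the given number of steps pass without any new observation.
    void age(int steps) {
        double factor = Math.pow(decay, steps);
        successes *= factor;
        trials *= factor;
    }

    // Blend of discounted data and prior; equals priorMean when the data has faded.
    double estimate() {
        return (successes + priorWeight * priorMean) / (trials + priorWeight);
    }
}
```

With no observations the estimate is exactly the prior mean, and as the discounted counts shrink toward zero the estimate drifts back to it, which is the biased-toward-the-prior behaviour the message describes.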

Re: [jira] [Commented] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Ted Dunning
The current implementation should allow updates to the past, but it will only ever give you an average at the latest data point. On Tue, May 17, 2011 at 11:33 AM, Dmitriy Lyubimov (JIRA) wrote: > I am also using this with slight modifications to enable to use with > map-reduce. 2 suggestions i im

Re: Possible contributions

2011-05-17 Thread Ted Dunning
On Tue, May 17, 2011 at 5:26 PM, Hector Yee wrote: > I have some proposed contributions and I wonder if they will be useful in > Mahout (otherwise I will just commit it in a new open source project in > github). > These generally sound pretty good. > - Sparse autoencoder (think of it as somet

Re: MongoDataModel

2011-05-17 Thread Ted Dunning
I don't see a problem with an extra module if the webapp can somehow resolve the reference at run-time. And if the artifacts are in Maven, how bad is it to include them as dependencies? If these could be made optional dependencies with "provided" scope, then anybody who doesn't use them wouldn't

Build failed in Jenkins: Mahout-Quality #820

2011-05-17 Thread Apache Jenkins Server
See -- [...truncated 1582 lines...] A examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/canopy A examples/src/main/java/org/apache/mahout/clustering/syntheticcont

Possible contributions

2011-05-17 Thread Hector Yee
Hello, Some background on myself - I was at Google for the last 5 years working on the self-driving car, image search and YouTube in machine learning (http://www.linkedin.com/in/yeehector). I have some proposed contributions and I wonder if they will be useful in Mahout (otherwise I will just com

Build failed in Jenkins: Mahout-Quality #819

2011-05-17 Thread Apache Jenkins Server
See Changes: [srowen] [maven-release-plugin] prepare for next development iteration [srowen] [maven-release-plugin] prepare release mahout-0.5 -- [...truncated 1581 lines...] A exa

Re: MongoDataModel

2011-05-17 Thread Sean Owen
Weird as it sounds, I think the best place is mahout-taste-webapp. Once the module is renamed it'll make more sense. But if you make a patch against that module with the right pom.xml changes it ought to be 99.9% what is needed. On Tue, May 17, 2011 at 10:23 PM, Fernando Tapia Rico wrote: > yep,

Re: MongoDataModel

2011-05-17 Thread Fernando Tapia Rico
yep, I completely understand your concerns. So...What should I do? cos I guess that I need to know where to place this to open the JIRA ticket. I don't have any rush, I can wait until you guys decide what is the best option for this DataModel. On Tuesday, May 17, 2011, Sean Owen wrote: > I'm talk

Re: MongoDataModel

2011-05-17 Thread Sean Owen
I'm talking about the DataModel -- there isn't a Recommender, is there? (Haven't looked at the code.) No, I mean that we don't want to introduce a dependency on MongoDB across the whole project just for this, but it could fit into an optional sub-module. On Tue, May 17, 2011 at 10:10 PM, Fernando

Re: MongoDataModel

2011-05-17 Thread Fernando Tapia Rico
Thanks for the good responses. I will open a JIRA ticket, but...reading the thread...where should I put this DataModel? And, Sean, when you say "I would commit this to mahout-examples/ in a heartbeat except that I'm wondering about the issue of dependencies", are you talking about the Recommender o

Re: List your changes for Mahout 0.5

2011-05-17 Thread Sebastian Schelter
I'd also suggest marking the distributed ALS code as experimental, because we have not yet seen good results, probably because of the iterative nature of the algorithm. On 17.05.2011 at 21:37, "Dmitriy Lyubimov" wrote: > Our company just had a new hire whose tasks would include a somewhat > siz

Re: MongoDataModel

2011-05-17 Thread Sean Owen
It probably belongs in a renamed "mahout-taste-webapp" which is where we put the one similar thing in the project already. That can be "mahout-integration" or something. (Separately I'm conscious of an explosion of Maven modules... this isn't adding a new one, but, getting to be a lot of them.) O

Re: MongoDataModel

2011-05-17 Thread Grant Ingersoll
Hmm, the license is AGPL, but then the drivers are ASL for MongoDB. What does this patch require? I suspect we will have other DataModels. I have heard interest in adding a Cassandra DataModel as well. Not sure if they belong in examples, but also agree they aren't "core" per se. Thoughts

[jira] [Issue Comment Edited] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034947#comment-13034947 ] Dmitriy Lyubimov edited comment on MAHOUT-634 at 5/17/11 7:41 PM: --

Re: SF Informal meetup on May 23?

2011-05-17 Thread Dmitriy Lyubimov
I'd be interested to come. Being local, any day pretty much works for me. On Sun, May 15, 2011 at 2:42 AM, Dawid Weiss wrote: > Thursday would work for me too. > > Dawid > > On Sun, May 15, 2011 at 1:31 AM, Ted Dunning wrote: >> Works for me. >> >> On Sat, May 14, 2011 at 3:30 PM, Jake Mannix w

Re: List your changes for Mahout 0.5

2011-05-17 Thread Dmitriy Lyubimov
Our company just had a new hire whose tasks would include somewhat sizeable inputs on SSVDs, so hopefully we'll get more data and validation soon. On Tue, May 17, 2011 at 12:34 PM, Dmitriy Lyubimov wrote: > We still haven't got a good size test confirmation for SSVD (at least > Mahout's ver

Re: List your changes for Mahout 0.5

2011-05-17 Thread Dmitriy Lyubimov
We still haven't got a good size test confirmation for SSVD (at least Mahout's version of it) so I'd mark it experimental and expect hotfixes. On Tue, May 17, 2011 at 7:50 AM, Sean Owen wrote: > • Improved Lanczos solver > • Stochastic Singular Value Decomposition implementa

RE: List your changes for Mahout 0.5

2011-05-17 Thread Jeff Eastman
If you look at the ClusterClassifier and ClusterIterator classes, you can see there is not a lot of code in either and they read pretty well. The ClusterClassifier is an AbstractVectorClassifier that uses the pre-existing Cluster classes as its models. It should be usable as a classifier, just l

[jira] [Issue Comment Edited] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034945#comment-13034945 ] Dmitriy Lyubimov edited comment on MAHOUT-634 at 5/17/11 6:37 PM: --

Re: MongoDataModel

2011-05-17 Thread Sean Owen
Agree, that's the process, and agree this is cool. I would commit this to mahout-examples/ in a heartbeat except that I'm wondering about the issue of dependencies. It introduces a dependency on MongoDB in Maven just for this. Is that fine for examples, we think? It's not for core. On Tue, May 17

[jira] [Commented] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034945#comment-13034945 ] Dmitriy Lyubimov commented on MAHOUT-634: - Ted, I am also using this with slight

[jira] [Commented] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034946#comment-13034946 ] Ted Dunning commented on MAHOUT-634: Should be pretty much unconditionally stable for

[jira] [Commented] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034947#comment-13034947 ] Dmitriy Lyubimov commented on MAHOUT-634: - Unfortunately, latex server seems to be

Re: MongoDataModel

2011-05-17 Thread Grant Ingersoll
Cool! Please see https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute on how to contribute, as the best way to get this in is via a patch on a JIRA ticket (please make sure you check the box saying you grant license to the ASF). -Grant On May 17, 2011, at 12:49 PM, Fernando

[jira] [Issue Comment Edited] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034875#comment-13034875 ] Lance Norskog edited comment on MAHOUT-634 at 5/17/11 5:12 PM: -

Re: List your changes for Mahout 0.5

2011-05-17 Thread Lance Norskog
Could someone who understands it create a diagram of how the parts fit together? On Tue, May 17, 2011 at 8:57 AM, Jeff Eastman wrote: > Cool. I thought it was pretty slick how the pre-existing parts all fit > together. Perhaps even elegant. > > -Original Message- > From: Ted Dunning [mai

MongoDataModel

2011-05-17 Thread Fernando Tapia Rico
Hi all, I've been using Mahout for a while but I didn't find a DataModel for MongoDB, so I decided to create my own implementation. I didn't find any issue with my implementation, and I found it really helpful, so I would like to share it with the community (and be a Mahout contributor). I'm attach

[jira] [Commented] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034875#comment-13034875 ] Lance Norskog commented on MAHOUT-634: -- Is this numerically stable? Or rather, in whi

Re: Improving release site doc

2011-05-17 Thread Benson Margulies
done. On Tue, May 17, 2011 at 11:48 AM, Benson Margulies wrote: > Can I interest anyone in : > >   >        org.apache.maven.plugins >        maven-changes-plugin >        2.4 >         >          true >           > Type,Key,Summary,Status,Resolution,Assignee >          Type,Key >         >      

Re: Improving release site doc

2011-05-17 Thread Sean Owen
I defer to you. Stick it in and I'll use it shortly for a release. On Tue, May 17, 2011 at 4:48 PM, Benson Margulies wrote: > Can I interest anyone in : > >   >        org.apache.maven.plugins >        maven-changes-plugin >        2.4 >         >          true >           > Type,Key,Summary,Stat

RE: List your changes for Mahout 0.5

2011-05-17 Thread Jeff Eastman
Cool. I thought it was pretty slick how the pre-existing parts all fit together. Perhaps even elegant. -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Tuesday, May 17, 2011 8:50 AM To: dev@mahout.apache.org Subject: Re: List your changes for Mahout 0.5 I would

Re: List your changes for Mahout 0.5

2011-05-17 Thread Ted Dunning
I would say that we mention it. On Tue, May 17, 2011 at 8:44 AM, Jeff Eastman wrote: > I don't know. It does a credible (but still sequential) job of clustering > kmeans, fuzzyk and dirichlet. It is clearly still experimental. You guys be > the judge... > > -Original Message- > From: Ted

Improving release site doc

2011-05-17 Thread Benson Margulies
Can I interest anyone in : org.apache.maven.plugins maven-changes-plugin 2.4 true Type,Key,Summary,Status,Resolution,Assignee Type,Key jira-report

RE: List your changes for Mahout 0.5

2011-05-17 Thread Jeff Eastman
I don't know. It does a credible (but still sequential) job of clustering kmeans, fuzzyk and dirichlet. It is clearly still experimental. You guys be the judge... -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Tuesday, May 17, 2011 8:39 AM To: dev@mahout.apach

Re: List your changes for Mahout 0.5

2011-05-17 Thread Ted Dunning
Is it done enough to claim? On Tue, May 17, 2011 at 8:29 AM, Jeff Eastman wrote: > Perhaps premature, but would you want to include the > clustering-classification convergence e.g. ClusterClassifier? > > -Original Message- > From: Sean Owen [mailto:sro...@gmail.com] > Sent: Tuesday, May

RE: List your changes for Mahout 0.5

2011-05-17 Thread Jeff Eastman
Perhaps premature, but would you want to include the clustering-classification convergence e.g. ClusterClassifier? -Original Message- From: Sean Owen [mailto:sro...@gmail.com] Sent: Tuesday, May 17, 2011 7:51 AM To: Mahout Dev List Subject: List your changes for Mahout 0.5 •

Re: List your changes for Mahout 0.5

2011-05-17 Thread Jake Mannix
LDA document-topic distribution output, and graceful restarts. Lanczos solver improvements are: graceful restarts (don't lose work), and significantly better scaling properties: O(numWords * 3) memory needed, instead of O(numWords * desiredRank * 2). On Tue, May 17, 2011 at 7:50 AM, Sean Owen wro
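
To get a feel for the difference between the two memory bounds quoted above, the sketch below treats the big-O expressions as literal counts of stored values. That is an assumption made purely for illustration (real constants and element sizes are not given in the message), and the class and method names are hypothetical.

```java
// Hypothetical sketch: treats the quoted big-O bounds as literal value counts.
final class LanczosMemorySketch {

    // Improved solver: numWords * 3 values held in memory.
    static long improvedValueCount(long numWords) {
        return numWords * 3L;
    }

    // Previous behaviour: numWords * desiredRank * 2 values held in memory.
    static long previousValueCount(long numWords, long desiredRank) {
        return numWords * desiredRank * 2L;
    }

    public static void main(String[] args) {
        long numWords = 1_000_000L; // assumed vocabulary size
        long desiredRank = 100L;    // assumed number of singular vectors
        System.out.println("improved: " + improvedValueCount(numWords));
        System.out.println("previous: " + previousValueCount(numWords, desiredRank));
    }
}
```

Under these assumed sizes the improved bound no longer grows with the desired rank, so the gap between the two widens as more singular vectors are requested.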

[jira] [Updated] (MAHOUT-634) Need more online averagers

2011-05-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-634: - Affects Version/s: (was: 0.5) 0.4 Fix Version/s: 0.5 > Need more onlin

[jira] [Updated] (MAHOUT-539) Need example code for fast encoding

2011-05-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-539: - Priority: Minor (was: Major) Affects Version/s: 0.4 Fix Version/s: 0.5

List your changes for Mahout 0.5

2011-05-17 Thread Sean Owen
• Improved Lanczos solver • Stochastic Singular Value Decomposition implementation • Incremental SVD implementation • Alternating Least Squares with Weighted Regularization collaborative filtering implementation, both distributed and non-distr

Re: [VOTE] Code freeze for 0.5 release May 20

2011-05-17 Thread Benson Margulies
I dropped the previous 'rc' from the Apache nexus. On Tue, May 17, 2011 at 10:17 AM, Sean Owen wrote: > I had meant to start releasing as of the 20th for a release next week, > but, why don't I go ahead and do a dry run and call a vote? If anybody > objects, we'll not pass the vote, but otherwise

Re: [VOTE] Code freeze for 0.5 release May 20

2011-05-17 Thread Sean Owen
I had meant to start releasing as of the 20th for a release next week, but, why don't I go ahead and do a dry run and call a vote? If anybody objects, we'll not pass the vote, but otherwise we just go ahead. On Tue, May 17, 2011 at 3:16 PM, Grant Ingersoll wrote: > > On May 15, 2011, at 3:51 PM,

Re: [VOTE] Code freeze for 0.5 release May 20

2011-05-17 Thread Grant Ingersoll
On May 15, 2011, at 3:51 PM, Sean Owen wrote: > With any luck the same release process works; Benson's tried most of > it recently. I suppose you can do a dry run if you're bored to test > the wiki documentation. But I imagine even better would be to try a > few examples and double-check the bit

[jira] [Commented] (MAHOUT-667) Persistent storage of factorizations in SVDRecommender

2011-05-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034766#comment-13034766 ] Hudson commented on MAHOUT-667: --- Integrated in Mahout-Quality #816 (See [https://builds.apa

[jira] [Commented] (MAHOUT-667) Persistent storage of factorizations in SVDRecommender

2011-05-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034707#comment-13034707 ] Sean Owen commented on MAHOUT-667: -- Since it's a clear bug fix to both of us, and relativ

[jira] [Commented] (MAHOUT-667) Persistent storage of factorizations in SVDRecommender

2011-05-17 Thread Chris Newell (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034705#comment-13034705 ] Chris Newell commented on MAHOUT-667: - Found a bug in AbstractFactorizer, which I intr