Re: O.6 and Re: Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #15

2012-01-16 Thread Ted Dunning
The Mahout-792 change should have had no effect on the examples stuff. On Tue, Jan 17, 2012 at 12:54 AM, Jeff Eastman wrote: > Does anybody know what changed to break this example? Jenkins really > needs to be stable in order to code freeze for 0.6 and, IMHO, we should be > focusing our efforts

[jira] [Updated] (MAHOUT-890) Performance issue in FPGrowth

2012-01-16 Thread Robin Anil (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-890: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Performance issue in FPGrowth

Re: Minhash review

2012-01-16 Thread Suneel Marthi
Lance, I don't think this problem is confined to DisplayMinHash alone; even the regular MinHash clustering doesn't seem right when run on the Reuters dataset (using cluster-reuters.sh) and a few other data sets I had tried. I am playing with the keyGroups values to determine if that impro

[jira] [Issue Comment Edited] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-16 Thread Lance Norskog (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187369#comment-13187369 ] Lance Norskog edited comment on MAHOUT-946 at 1/17/12 1:47 AM: -

Re: Minhash review

2012-01-16 Thread Lance Norskog
Minhash works better and better the more dimensions you throw at it, right? All of the Display classes use two dimensions. Would MinHash make more sense if it used a few hundred dimensions and then collapsed down to two? Maybe with SVD? Are there other clustering algorithms that have this problem?
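
Lance's intuition can be made concrete with a toy example. The Java sketch below is a generic MinHash, not Mahout's MinHash clustering code: it assumes each vector has already been reduced to the set of its non-zero feature indices (a hypothetical mapping chosen only for illustration) and draws random linear hash functions h(x) = (a*x + b) mod (2^31 - 1). The fraction of signature slots on which two sets agree estimates their Jaccard similarity. Class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical MinHash sketch over sets of non-negative int feature indices;
// it does not reproduce how Mahout's MinHash clustering turns vectors into sets.
final class MinHashSketch {
  private static final long PRIME = 2147483647L; // 2^31 - 1, a Mersenne prime
  private final long[] a;
  private final long[] b;

  /** Draws numHashes random hash functions h(x) = (a*x + b) mod PRIME. */
  MinHashSketch(int numHashes, long seed) {
    Random rnd = new Random(seed);
    a = new long[numHashes];
    b = new long[numHashes];
    for (int i = 0; i < numHashes; i++) {
      a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a in [1, PRIME - 1]
      b[i] = rnd.nextInt(Integer.MAX_VALUE);         // b in [0, PRIME - 1]
    }
  }

  /** For each hash function, the minimum hash value over the set's elements. */
  long[] signature(Set<Integer> elements) {
    long[] sig = new long[a.length];
    Arrays.fill(sig, Long.MAX_VALUE);
    for (int x : elements) {
      for (int i = 0; i < a.length; i++) {
        long h = (a[i] * x + b[i]) % PRIME; // no overflow for 0 <= x < 2^31
        sig[i] = Math.min(sig[i], h);
      }
    }
    return sig;
  }

  /** Fraction of agreeing slots: an estimate of the Jaccard similarity. */
  static double estimateJaccard(long[] s1, long[] s2) {
    int same = 0;
    for (int i = 0; i < s1.length; i++) {
      if (s1[i] == s2[i]) {
        same++;
      }
    }
    return (double) same / s1.length;
  }

  /** Builds the set {fromInclusive, ..., toExclusive - 1}. */
  static Set<Integer> range(int fromInclusive, int toExclusive) {
    Set<Integer> s = new HashSet<>();
    for (int i = fromInclusive; i < toExclusive; i++) {
      s.add(i);
    }
    return s;
  }

  public static void main(String[] args) {
    MinHashSketch mh = new MinHashSketch(400, 42L);
    // The true Jaccard similarity of {1..100} and {51..150} is 50 / 150 = 1/3.
    double est = estimateJaccard(mh.signature(range(1, 101)), mh.signature(range(51, 151)));
    System.out.println("estimated Jaccard = " + est);
  }
}
```

Under that assumed mapping, every dense two-dimensional point reduces to the index set {0, 1}, so all such points get identical signatures and every pair is estimated as fully similar; with a few hundred sparse dimensions the index sets, and therefore the signatures, can actually differ.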

[jira] [Commented] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-16 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187369#comment-13187369 ] Lance Norskog commented on MAHOUT-946: -- Yup, you're right. Shell script should be a s

Jenkins build is still unstable: Mahout-Quality #1311

2012-01-16 Thread Apache Jenkins Server
See

O.6 and Re: Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #15

2012-01-16 Thread Jeff Eastman
Does anybody know what changed to break this example? Jenkins really needs to be stable in order to code freeze for 0.6 and, IMHO, we should be focusing our efforts on achieving this goal, along with completing the following 3 open issues: ASF JIRA

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

2012-01-16 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187291#comment-13187291 ] Lance Norskog commented on MAHOUT-947: -- mahout/src/conf/driver.classes.props lists al

[jira] [Commented] (MAHOUT-946) Map-reduce job status often left unchecked

2012-01-16 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187264#comment-13187264 ] tom pierce commented on MAHOUT-946: --- Making sure the examples halt at appropriate places

Re: [jira] [Commented] (MAHOUT-737) Implicit Alternating Least Squares SVD

2012-01-16 Thread Dmitriy Lyubimov
btw I think Mahout's sparse matrix should be fairly efficient when handling diagonal, triangular, or Hessenberg-like stuff. On Mon, Jan 16, 2012 at 11:18 AM, Dmitriy Lyubimov wrote: > yes, the UpperTriangular in SSVD doesn't support some of the Matrix > operations. On top of it, it is built with

[jira] [Commented] (MAHOUT-780) job jars fail on OS X due to case-insensitive name conflict on 'license'

2012-01-16 Thread William McNeill (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187189#comment-13187189 ] William McNeill commented on MAHOUT-780: Take a look at the Maven shade plugin. I

[jira] [Updated] (MAHOUT-947) Improvements to seqdumper

2012-01-16 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-947: -- Attachment: MAHOUT-947-2.patch Adjusted to put vector options in VectorDumper. Also add ability to dum

FYI: mahout-git config in apache review board has been created.

2012-01-16 Thread Dmitriy Lyubimov
I will try it out shortly. -d

[jira] [Updated] (MAHOUT-737) Implicit Alternating Least Squares SVD

2012-01-16 Thread Tamas Jambor (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamas Jambor updated MAHOUT-737: Attachment: MAHOUT-737-3.patch > Implicit Alternating Least Squares SVD > -

[jira] [Commented] (MAHOUT-737) Implicit Alternating Least Squares SVD

2012-01-16 Thread Tamas Jambor (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187102#comment-13187102 ] Tamas Jambor commented on MAHOUT-737: - I managed to make it work properly now. Changed

Build failed in Jenkins: Mahout-Examples-Cluster-Reuters #15

2012-01-16 Thread Apache Jenkins Server
See Changes: [tdunning] MAHOUT-792 - Forced correct block ordering in out-of-core SVD. Hopefully addresses ubuntu test failures. Also forced file closing. -- [...truncated 6057 l

Re: [jira] [Commented] (MAHOUT-737) Implicit Alternating Least Squares SVD

2012-01-16 Thread Dmitriy Lyubimov
yes, the UpperTriangular in SSVD doesn't support some of the Matrix operations. On top of it, it is built with some assumptions, made for the sake of SSVD efficiency, that general Mahout support should either not have or should provide an alternative for: 1) it assumes the matrix is dense. 2) most importantly, it assume

[jira] [Commented] (MAHOUT-945) The variance calculation of Random forest regression tree

2012-01-16 Thread Wang Yue (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187050#comment-13187050 ] Wang Yue commented on MAHOUT-945: - Hi, I would still doubt the correctness of not dividing

Re: [jira] [Commented] (MAHOUT-945) The variance calculation of Random forest regression tree

2012-01-16 Thread Ikumasa Mukai
Hi Ted-san. I made a patch using Welford's method, which you advised, rather than the weighted incremental algorithm. I am now checking the duplicated code so that it can be merged with FullRunningAverageAndStdDev. Thanks, 2012/1/16 Ted Dunning : > Why not just use an OnlineAccumulator? Why duplicate code? > > On Su
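
For readers unfamiliar with the method named above, here is a minimal Java sketch of the textbook Welford update for a running mean and variance. The class and method names are hypothetical; it is not the code from the patch, nor from FullRunningAverageAndStdDev. It keeps only the count, the mean, and the running sum of squared deviations, so a single pass suffices and it avoids the cancellation error of the naive sum-of-squares formula.

```java
// Hypothetical sketch of Welford's online mean/variance update; not Mahout's
// FullRunningAverageAndStdDev and not the MAHOUT-945 patch.
final class WelfordVariance {
  private long count;
  private double mean;
  private double m2; // running sum of squared deviations from the current mean

  /** Folds one observation into the running statistics in O(1) time and space. */
  void add(double x) {
    count++;
    double delta = x - mean;  // deviation from the old mean
    mean += delta / count;
    m2 += delta * (x - mean); // old deviation times new deviation
  }

  long count() {
    return count;
  }

  double mean() {
    return mean;
  }

  /** Population variance: divides by n. */
  double populationVariance() {
    return count > 0 ? m2 / count : Double.NaN;
  }

  /** Sample variance: divides by n - 1. */
  double sampleVariance() {
    return count > 1 ? m2 / (count - 1) : Double.NaN;
  }

  public static void main(String[] args) {
    WelfordVariance stats = new WelfordVariance();
    for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
      stats.add(x);
    }
    System.out.println("mean=" + stats.mean()
        + " populationVariance=" + stats.populationVariance()
        + " sampleVariance=" + stats.sampleVariance());
  }
}
```

Both divisors are exposed so the caller chooses explicitly between the population (n) and sample (n - 1) variance.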

Re: streaming kmeans

2012-01-16 Thread Ted Dunning
The use of the term facilities is very confusing. I would prefer we talk about centroids or the Cluster data structure. Essentially what fastKM does is to pre-cluster the data into a large set of centroids. This is plausibly done in parallel. Then fastKM does conventional k-means on this large
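
The two-stage idea Ted describes (pre-cluster into many centroids, then run conventional k-means on those) can be illustrated with a small Java sketch. It is a hypothetical, sequential simplification: stage 1 opens a new centroid whenever a point is farther than a fixed threshold from every existing one, instead of the probabilistic facility-opening rule that the "facilities" terminology in this thread refers to, and stage 2 is plain weighted Lloyd iteration seeded naively from the first K sketch centroids. Each sketch centroid carries the number of points it absorbed, which is why stage 2 must be weighted. Names such as TwoStageKMeans and buildSketch are illustrative, not Mahout APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical two-stage sketch: one pass folds points into many weighted centroids,
// then weighted k-means reduces those to K centers. Not a Mahout API.
final class TwoStageKMeans {
  static final class WeightedCentroid {
    final double[] center;
    double weight = 1; // number of input points folded into this centroid
    WeightedCentroid(double[] point) { center = point.clone(); }

    /** Folds one more unit-weight point into the running mean. */
    void absorb(double[] p) {
      weight += 1;
      for (int i = 0; i < center.length; i++) {
        center[i] += (p[i] - center[i]) / weight;
      }
    }
  }

  static double distSq(double[] a, double[] b) {
    double s = 0;
    for (int i = 0; i < a.length; i++) {
      s += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return s;
  }

  /** Stage 1 (simplified): join the nearest centroid if within threshold, else open a new one. */
  static List<WeightedCentroid> buildSketch(List<double[]> points, double threshold) {
    List<WeightedCentroid> centroids = new ArrayList<>();
    for (double[] p : points) {
      WeightedCentroid closest = null;
      double bestSq = Double.POSITIVE_INFINITY;
      for (WeightedCentroid c : centroids) {
        double d = distSq(p, c.center);
        if (d < bestSq) {
          bestSq = d;
          closest = c;
        }
      }
      if (closest != null && bestSq <= threshold * threshold) {
        closest.absorb(p);
      } else {
        centroids.add(new WeightedCentroid(p));
      }
    }
    return centroids;
  }

  static int nearest(double[] p, double[][] centers) {
    int best = 0;
    for (int j = 1; j < centers.length; j++) {
      if (distSq(p, centers[j]) < distSq(p, centers[best])) {
        best = j;
      }
    }
    return best;
  }

  /** Stage 2: Lloyd iterations in which each sketch centroid counts with its weight. */
  static double[][] weightedKMeans(List<WeightedCentroid> sketch, int k, int iterations) {
    int dim = sketch.get(0).center.length;
    double[][] centers = new double[k][];
    for (int j = 0; j < k; j++) { // naive seeding; assumes sketch.size() >= k
      centers[j] = sketch.get(j).center.clone();
    }
    for (int it = 0; it < iterations; it++) {
      double[][] sums = new double[k][dim];
      double[] weights = new double[k];
      for (WeightedCentroid c : sketch) {
        int j = nearest(c.center, centers);
        weights[j] += c.weight;
        for (int i = 0; i < dim; i++) {
          sums[j][i] += c.weight * c.center[i];
        }
      }
      for (int j = 0; j < k; j++) {
        if (weights[j] > 0) { // an empty cluster keeps its previous center
          for (int i = 0; i < dim; i++) {
            centers[j][i] = sums[j][i] / weights[j];
          }
        }
      }
    }
    return centers;
  }

  public static void main(String[] args) {
    List<double[]> points = Arrays.asList(new double[] {0, 0}, new double[] {0.1, 0},
        new double[] {0, 0.1}, new double[] {0.05, 0.05}, new double[] {10, 10},
        new double[] {10.1, 10}, new double[] {10, 10.1}, new double[] {10.05, 10.05});
    for (double[] c : weightedKMeans(buildSketch(points, 0.05), 2, 10)) {
      System.out.println(Arrays.toString(c));
    }
  }
}
```

In the usage example, a threshold of 0.05 leaves all eight points as separate sketch centroids and the weighted second stage still recovers the two blob means; raising the threshold to 0.5 folds each blob into a single centroid of weight 4 before stage 2 runs. Stage 1 is written sequentially here, although Ted notes it could plausibly be done in parallel.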

Re: streaming kmeans

2012-01-16 Thread Federico Castanedo
Ted, thanks for your comments. One difference that I see between this technique (fastkm) and the current k-means clustering implementation in Mahout: at the end, fastkm provides the set of K points based on the cost minimization of the k employed facilities (with k>>K), but the current o.a.m.c.kmeans provides

[jira] [Commented] (MAHOUT-947) Improvements to seqdumper

2012-01-16 Thread tom pierce (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186935#comment-13186935 ] tom pierce commented on MAHOUT-947: --- Oh nice- I hadn't seen VectorDumper before. Looks

[jira] [Commented] (MAHOUT-945) The variance calculation of Random forest regression tree

2012-01-16 Thread Ikumasa Mukai (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186881#comment-13186881 ] Ikumasa Mukai commented on MAHOUT-945: -- Hi wang-san. Thank you for your comment and s

Jenkins build is still unstable: Mahout-Quality #1310

2012-01-16 Thread Apache Jenkins Server
See