Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Dmitriy Lyubimov
But it is not a problem reading U or V files, that's indeed what U and V contain. On Wed, Dec 28, 2011 at 11:49 PM, Dmitriy Lyubimov wrote: > U and V look suspect, degenerate (only 4 first columns are nonzero, > the rest of matrices are zeros. > > On Wed, Dec 28, 2011 at 11:44 PM, Dmitriy Lyubimo

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Dmitriy Lyubimov
U and V look suspect, degenerate (only 4 first columns are nonzero, the rest of matrices are zeros. On Wed, Dec 28, 2011 at 11:44 PM, Dmitriy Lyubimov wrote: > Yeah, fails for me on ubuntu without any special environment issues. > Which makes it easier, i can step thru. > > On Wed, Dec 28, 2011 a

Build failed in Jenkins: Mahout-Quality #1278

2011-12-28 Thread Apache Jenkins Server
See Changes: [srowen] MAHOUT-937 make partitioner send to different reducers (as intended it seems) by just using the hash of primary bytes [srowen] MAHOUT-938 Generate javadoc for integration. Fix some javadoc warnings along the way,

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Dmitriy Lyubimov
Yeah, fails for me on ubuntu without any special environment issues. Which makes it easier, i can step thru. On Wed, Dec 28, 2011 at 9:01 PM, Ted Dunning wrote: > What do checksums look like? > > On Wed, Dec 28, 2011 at 6:33 PM, Grant Ingersoll wrote: > >> I commented out the deletion of the dir

[jira] [Updated] (MAHOUT-817) Add PCA options to SSVD code

2011-12-28 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-817: Attachment: MAHOUT-817.patch rebasing on current trunk > Add PCA options t

Re: Welcome Dmitriy Lyubimov to Mahout PMC

2011-12-28 Thread Raphael Cendrillon
Congrats! On 28 Dec, 2011, at 7:24 PM, Dmitriy Lyubimov wrote: > Thank you, Grant. I am happy to continue working on Mahout. > > On Wed, Dec 28, 2011 at 1:47 PM, Grant Ingersoll wrote: >> I'm pleased to announce the Mahout PMC has elected to add Dmitriy to the >> PMC. Dmitriy has been a comm

Re: Welcome Dmitriy Lyubimov to Mahout PMC

2011-12-28 Thread Dmitriy Lyubimov
Thanks, Hector. On Wed, Dec 28, 2011 at 7:41 PM, Hector Yee wrote: > Congrats! > > On Thu, Dec 29, 2011 at 11:24 AM, Dmitriy Lyubimov wrote: > >> Thank you, Grant. I am happy to continue working on Mahout. >> >> On Wed, Dec 28, 2011 at 1:47 PM, Grant Ingersoll >> wrote: >> > I'm pleased to annou

[jira] [Commented] (MAHOUT-937) Collocations Job Partitioner not being configured properly

2011-12-28 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177031#comment-13177031 ] Hudson commented on MAHOUT-937: --- Integrated in Mahout-Quality #1278 (See [https://builds.ap

[jira] [Commented] (MAHOUT-938) add javadoc for code under integration subfold

2011-12-28 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177030#comment-13177030 ] Hudson commented on MAHOUT-938: --- Integrated in Mahout-Quality #1278 (See [https://builds.ap

Re: [jira] [Updated] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2011-12-28 Thread Ted Dunning
Thanks Tom On Wed, Dec 28, 2011 at 10:32 PM, Tom White (Updated) (JIRA) < j...@apache.org> wrote: > > [ > https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] > > Tom White updated MAHOUT-822: > - > >

[jira] [Updated] (MAHOUT-822) Mahout needs to be made compatible with Hadoop .23 releases

2011-12-28 Thread Tom White (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAHOUT-822: - Attachment: MAHOUT-822.patch Here's a new version of the patch that applies against trunk. As it stands i

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Ted Dunning
What do checksums look like? On Wed, Dec 28, 2011 at 6:33 PM, Grant Ingersoll wrote: > I commented out the deletion of the dir in the tearDown. Not sure if that > looks reasonable or not, but on the surface they look equivalent. > > Here's the contents of the dir on Ubuntu: > -rw-rw-r-- 1 XX

Re: [jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-28 Thread Shannon Quinn
Sorry for replying via the dev list, but I am without Internet access beyond my phone. Yes, unless anyone testing can find issues with this patch (or with the one Grant posted earlier, as mine contains his), it is meant to be committed. Due to the aforementioned lack of Internet, if someone cou

Re: Welcome Dmitriy Lyubimov to Mahout PMC

2011-12-28 Thread Hector Yee
Congrats! On Thu, Dec 29, 2011 at 11:24 AM, Dmitriy Lyubimov wrote: > Thank you, Grant. I am happy to continue working on Mahout. > > On Wed, Dec 28, 2011 at 1:47 PM, Grant Ingersoll > wrote: > > I'm pleased to announce the Mahout PMC has elected to add Dmitriy to the > PMC. Dmitriy has been a

Re: Welcome Dmitriy Lyubimov to Mahout PMC

2011-12-28 Thread Dmitriy Lyubimov
Thank you, Grant. I am happy to continue working on Mahout. On Wed, Dec 28, 2011 at 1:47 PM, Grant Ingersoll wrote: > I'm pleased to announce the Mahout PMC has elected to add Dmitriy to the PMC. > Dmitriy has been a committer for quite some time now and we are happy to > have him help out on

[jira] [Updated] (MAHOUT-937) Collocations Job Partitioner not being configured properly

2011-12-28 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-937: - Resolution: Fixed Status: Resolved (was: Patch Available) > Collocations Job Partitioner not

[jira] [Resolved] (MAHOUT-938) add javadoc for code under integration subfold

2011-12-28 Thread Sean Owen (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-938. -- Resolution: Fixed Fix Version/s: 0.6 Assignee: Sean Owen Done, plus cleaning up some re

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Grant Ingersoll
I commented out the deletion of the dir in the tearDown. Not sure if that looks reasonable or not, but on the surface they look equivalent. Here's the contents of the dir on Ubuntu: -rw-rw-r-- 1 XX XX 1632612 2011-12-28 21:17 A-0 -rw-rw-r-- 1 XX XX 1632612 2011-12-28 21:1

Re: Maturity level annotations

2011-12-28 Thread Grant Ingersoll
On Dec 28, 2011, at 7:28 PM, Jeff Eastman wrote: > This is something that I'm enthusiastic about investigating right now. I'm > heartened that K-Means seems to scale well in your tests and I think I've > just improved Dirichlet a lot. I suspect we found out why before, at least for Dirichlet,

[jira] [Created] (MAHOUT-938) add javadoc for code under integration subfold

2011-12-28 Thread Yue Guan (Created) (JIRA)
add javadoc for code under integration subfold -- Key: MAHOUT-938 URL: https://issues.apache.org/jira/browse/MAHOUT-938 Project: Mahout Issue Type: Improvement Components: build Affec

Re: Maturity level annotations

2011-12-28 Thread Jeff Eastman
This is something that I'm enthusiastic about investigating right now. I'm heartened that K-Means seems to scale well in your tests and I think I've just improved Dirichlet a lot. I'd like to test it again with your data. FuzzyK is problematic as its clusters always end up with dense vectors fo

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Ted Dunning
Yeah.. but this is a difference from the correct answer. I am moderately sure that this is a problem writing to the temp directory. On Wed, Dec 28, 2011 at 3:45 PM, Grant Ingersoll wrote: > It's expecting the answer to be 0, but it's some really large value. > testSingularValues(org.apache.mahou

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Grant Ingersoll
It's expecting the answer to be 0, but it's some really large value. testSingularValues(org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest): expected:<0.0> but was:<4131200.37> On Dec 28, 2011, at 6:30 PM, Ted Dunning wrote: > I think that the answer is 0 because the model is not be

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Grant Ingersoll
Fails on Ubuntu, but passes on my Mac. On Dec 28, 2011, at 6:21 PM, Grant Ingersoll wrote: > I can reproduce outside of Jenkins. It really seems odd that the answer is > off by so much. > > On Dec 28, 2011, at 2:15 AM, Dmitriy Lyubimov wrote: > >> I vaguely remember Jenkins had problems with

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Ted Dunning
I think that the answer is 0 because the model is not being read and we are swallowing an exception somewhere. This is what an uninitialized matrix would give as a result. On Wed, Dec 28, 2011 at 3:21 PM, Grant Ingersoll wrote: > I can reproduce outside of Jenkins. It really seems odd that the

[jira] [Updated] (MAHOUT-904) SplitInput should support randomizing the input

2011-12-28 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-904: --- Attachment: MAHOUT-904.patch upping the cardinality seems to fix things. Now need to try w/

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Grant Ingersoll
I can reproduce outside of Jenkins. It really seems odd that the answer is off by so much. On Dec 28, 2011, at 2:15 AM, Dmitriy Lyubimov wrote: > I vaguely remember Jenkins had problems with creating stuff in Java tmp > dir. E.g. I remember that was creating problems for Mr tasks in local mr >

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-12-28 Thread Jeff Eastman (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176876#comment-13176876 ] Jeff Eastman commented on MAHOUT-524: - Shannon, is this patch ready to commit? I've in

Re: Mahout sequence file format

2011-12-28 Thread Grant Ingersoll
On Dec 28, 2011, at 7:17 AM, rahul raghavendhra wrote: > I am new to Mahout.. i just want to know how text file is converted into > seqfile and then to sparse vectors.. There are quite a few steps. I would recommend checking out the code and walking through it. See the SparseVectorsFromSequen

Welcome Dmitriy Lyubimov to Mahout PMC

2011-12-28 Thread Grant Ingersoll
I'm pleased to announce the Mahout PMC has elected to add Dmitriy to the PMC. Dmitriy has been a committer for quite some time now and we are happy to have him help out on the PMC. Congrats, Dmitriy! -Grant

Re: Maturity level annotations

2011-12-28 Thread Grant Ingersoll
On Dec 28, 2011, at 1:47 PM, Ted Dunning wrote: > I have nearly given up on getting publicly available large data sets and > have started to specify synthetic datasets for development projects. The > key is to build reasonably realistic generation algorithms and for that > there are always some

Re: Maturity level annotations

2011-12-28 Thread Lance Norskog
Or you can take a small set of good data and generate variations to get a big set with the same distribution curves. On Wed, Dec 28, 2011 at 10:47 AM, Ted Dunning wrote: > I have nearly given up on getting publicly available large data sets and > have started to specify synthetic datasets for deve
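
The suggestion above (grow a small, trusted sample into a large one with roughly the same distribution) could be sketched along these lines. This is only an illustration, not anything from the Mahout code base: the function name generate_variations, the noise_scale parameter, and the multiplicative Gaussian jitter are all assumptions chosen for the example.

```python
import random


def generate_variations(seed_rows, n_samples, noise_scale=0.05, rng=None):
    """Resample rows from a small seed set and jitter each value slightly.

    Hypothetical helper: picks a seed row uniformly at random, then scales
    every value by (1 + Gaussian noise) so the enlarged set roughly follows
    the distribution of the seed data.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    out = []
    for _ in range(n_samples):
        base = rng.choice(seed_rows)
        out.append([x * (1.0 + rng.gauss(0.0, noise_scale)) for x in base])
    return out


# Four seed points forming two loose groups, expanded to 10,000 rows.
seed = [[1.0, 2.0], [1.5, 1.8], [8.0, 9.0], [8.5, 9.5]]
big = generate_variations(seed, 10000)
```

Whether multiplicative jitter is appropriate depends on the data; for sparse count data one would more likely perturb which entries are present rather than their magnitudes.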

Re: Maturity level annotations

2011-12-28 Thread Ted Dunning
I have nearly given up on getting publicly available large data sets and have started to specify synthetic datasets for development projects. The key is to build reasonably realistic generation algorithms and for that there are always some serious difficulties. For simple scaling tests, however,

Re: Maturity level annotations

2011-12-28 Thread Grant Ingersoll
To me, the big thing we continue to be missing is the ability for those of us working on the project to reliably test the algorithms at scale. For instance, I've seen hints of several places where our clustering algorithms don't appear to scale very well (which are all M/R -- K-Means does scale

[jira] [Commented] (MAHOUT-931) Implement a pluggable outlier removal capability for cluster classifiers

2011-12-28 Thread Jeff Eastman (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176707#comment-13176707 ] Jeff Eastman commented on MAHOUT-931: - - 929: Yes, use the existing ClusterClassifier

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-28 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176697#comment-13176697 ] Sean Owen commented on MAHOUT-906: -- OK. I'm ready to commit the hook, with minor changes.

[jira] [Commented] (MAHOUT-937) Collocations Job Partitioner not being configured properly

2011-12-28 Thread Mat Kelcey (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176691#comment-13176691 ] Mat Kelcey commented on MAHOUT-937: --- I should have checked WritableComparator.hashBytes

[jira] [Updated] (MAHOUT-937) Collocations Job Partitioner not being configured properly

2011-12-28 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-937: - Fix Version/s: 0.6 Assignee: Sean Owen Affects Version/s: 0.5 Status:

[jira] [Updated] (MAHOUT-937) Collocations Job Partitioner not being configured properly

2011-12-28 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-937: - Attachment: MAHOUT-937.patch > Collocations Job Partitioner not being configured properly > -

RE: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Gavin McDonald
> -Original Message- > From: Jeff Eastman [mailto:j...@windwardsolutions.com] > Sent: Wednesday, 28 December 2011 3:31 PM > To: dev@mahout.apache.org; infrastruct...@apache.org > Subject: Re: Build failed in Jenkins: mahout-nightly #737 > > +infra@ All build related questions should go

Re: Build failed in Jenkins: mahout-nightly #737

2011-12-28 Thread Grant Ingersoll
I'm seeing this pretty consistently on my Jenkins locally, but then when I run on my Mac, it passes. On Dec 27, 2011, at 10:34 PM, Jeff Eastman wrote: > I'm getting a lot of these emails yet all the tests run locally for me. Does > anybody have an idea what the problem is? This close to a relea

Mahout sequence file format

2011-12-28 Thread rahul raghavendhra
I am new to Mahout.. i just want to know how text file is converted into seqfile and then to sparse vectors.. any kind of text file can be converted into seq file using ./mahout seqdirectory ? thanks in advance.. ./rahul
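
For readers following this thread, the two stages asked about can be pictured with a small stand-alone sketch. It is not Mahout's code and uses no Mahout API: it assumes the first stage yields (document key, text) pairs, much as a sequence file of documents would, and that the second stage builds a term dictionary and turns each document into a sparse term-count vector. The names tokenize, build_dictionary and to_sparse_vectors are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(docs):
    """Assign every distinct term an integer index (sorted for stability)."""
    terms = sorted({t for text in docs.values() for t in tokenize(text)})
    return {term: i for i, term in enumerate(terms)}


def to_sparse_vectors(docs, dictionary):
    """Map each document key to {term index: term count}, omitting zeros."""
    vectors = {}
    for key, text in docs.items():
        counts = Counter(tokenize(text))
        vectors[key] = {dictionary[t]: c for t, c in counts.items()}
    return vectors


# Stage 1 stand-in: document key -> raw text.
docs = {"/a.txt": "the cat sat", "/b.txt": "the cat and the dog"}

# Stage 2 stand-in: dictionary plus one sparse vector per document.
dictionary = build_dictionary(docs)
vectors = to_sparse_vectors(docs, dictionary)
```

Only nonzero entries are stored, which is what makes the vectors sparse: a document is represented by the handful of dictionary indices it actually uses rather than by one slot per term in the whole corpus.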

[jira] [Updated] (MAHOUT-884) Matrix Concatenate utility

2011-12-28 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-884: - Fix Version/s: 0.7 > Matrix Concatenate utility > -- > >

[jira] [Issue Comment Edited] (MAHOUT-884) Matrix Concatenate utility

2011-12-28 Thread Lance Norskog (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176527#comment-13176527 ] Lance Norskog edited comment on MAHOUT-884 at 12/28/11 9:01 AM:

[jira] [Updated] (MAHOUT-884) Matrix Concatenate utility

2011-12-28 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-884: - Attachment: MAHOUT-884.patch Completely redone. Now a Hadoop job which uses Jake's trick of cachi