[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2012-02-12 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206679#comment-13206679 ] Lance Norskog commented on MAHOUT-884: -- You're right, the Metadata call should be rem

Re: CIMapper Question

2012-02-12 Thread Jeff Eastman
PolymorphicWritable actually works great in the two applications of it I committed today. They are low-volume of course so the overhead of writing the class name is not onerous. On 2/12/12 9:57 PM, Lance Norskog wrote: Another option is TupleWritable. But pull the source and make sure it works

Re: [jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2012-02-12 Thread Lance Norskog
You're right, the Metadata call should be removed. Mahout does not use the Metadata feature anywhere. On Sun, Feb 12, 2012 at 11:01 AM, Suneel Marthi (Commented) (JIRA) wrote: > >    [ > https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment

Re: Build failed in Jenkins: Mahout-Quality #1348

2012-02-12 Thread Jeff Eastman
Honestly, I wonder sometimes why we even bother with Jenkins? In a past life, Hudson was pretty reliable but this kind of unreliability cuts to the core of its usefulness. If I immediately discount every report I see, I say let's just turn it off. On 2/12/12 8:09 PM, Apache Jenkins Server wrot

[jira] [Issue Comment Edited] (MAHOUT-944) LuceneIndexToSequenceFiles (lucene2seq) utility

2012-02-12 Thread Lance Norskog (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206670#comment-13206670 ] Lance Norskog edited comment on MAHOUT-944 at 2/13/12 5:04 AM: -

Re: CIMapper Question

2012-02-12 Thread Lance Norskog
Another option is TupleWritable. But pull the source and make sure it works; I had problems. On Sun, Feb 12, 2012 at 9:22 AM, Jeff Eastman wrote: > This approach worked out, not exactly as below, but I was able to create a > ClusterWritable which used PolymorphicWritable to read and write its Clu

Re: Mandatory svnpubsub migration by Jan 2013

2012-02-12 Thread Lance Norskog
I thought we were saying that only committers could edit the wiki? If it is only the main page(s) that are committers-only, that is fine. But if the entire doc site for Mahout is committers-only, then we need a low-effort workflow for submitting page changes. On Sun, Feb 12, 2012 at 6:58 AM, Ted D

[jira] [Commented] (MAHOUT-944) LuceneIndexToSequenceFiles (lucene2seq) utility

2012-02-12 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206670#comment-13206670 ] Lance Norskog commented on MAHOUT-944: -- This is a Lucene query. It's already sorted!

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2012-02-12 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206660#comment-13206660 ] Lance Norskog commented on MAHOUT-975: -- The newest patch does not compile against the

Re: [jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Ted Dunning
Christian, All of what you say makes reasonable sense, but I think that you put too much weight on the current uses of the API which are warped by the initial logistic regression implementation. The heart is classifyFull. It returns scores which by convention are large for the 1-of-n category fo

Build failed in Jenkins: Mahout-Quality #1348

2012-02-12 Thread Apache Jenkins Server
See -- [...truncated 7 lines...] at hudson.model.AbstractProject.checkout(AbstractProject.java:1195) at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:576) a

[jira] [Commented] (MAHOUT-884) Matrix Concatenate utility

2012-02-12 Thread Suneel Marthi (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206486#comment-13206486 ] Suneel Marthi commented on MAHOUT-884: -- Has this code been committed to trunk yet? Lo

Re: CIMapper Question

2012-02-12 Thread Jeff Eastman
This approach worked out, not exactly as below, but I was able to create a ClusterWritable which used PolymorphicWritable to read and write its Cluster value field. This makes it through the mapper and reducer but I'm still working on getting it all to fly in the ClusterIterator. On 2/12/12 9:

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Herta updated MAHOUT-976: --- Description: Implement a multi layer perceptron * via Matrix Multiplication * Learning by

Re: CIMapper Question

2012-02-12 Thread Raphael Cendrillon
Hi Jeff, It's great to see some discussion on this. I ran into a similar problem when trying to make the SplitInput job work for any arbitrary key and value classes. In the end I was able to sidestep the issue by just reading the key and value classes from the SequenceFileInput, but I never fo

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206446#comment-13206446 ] Christian Herta edited comment on MAHOUT-976 at 2/12/12 4:37 PM: ---

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Herta updated MAHOUT-976: --- Attachment: MAHOUT-976.patch incomplete and completely untested, should only compile

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Herta updated MAHOUT-976: --- Comment: was deleted (was: incomplete and completely untested, should only compile ) >

Re: CIMapper Question

2012-02-12 Thread Jeff Eastman
Thanks Sean & Ted. That is what I've observed experimentally. I was going to pursue a ClusterWriteable along the lines of VectorWritable but will try PolymorphicWritable first. Looking at it, I see it does send the class name which might be onerous as Sean observed except for the fact that I am

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Herta updated MAHOUT-976: --- Status: Patch Available (was: Open) incomplete and completely untested, should only compile

Re: CIMapper Question

2012-02-12 Thread Sean Owen
Exactly right, and that's exactly the answer in some form. PolymorphicWritable isn't suitable if you're writing a lot of records as the overhead of writing a 40-byte string is too much at scale. On Sun, Feb 12, 2012 at 4:01 PM, Ted Dunning wrote: > But this sounds like a runtime problem, not a ty
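The overhead mentioned above (a class-name string written with every record) can be illustrated with a small, self-contained sketch. It uses only java.io and does not reproduce Mahout's PolymorphicWritable; the class `ClassNameOverhead`, its two helper methods, and the class name `example.clustering.DemoCluster` are all hypothetical and exist only to make the arithmetic visible. The payload is assumed to be a single 8-byte double per record.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration only; this is not Mahout's PolymorphicWritable.
public class ClassNameOverhead {

    /** Bytes used when every record is preceded by its class name. */
    static int taggedSize(String className, int records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int i = 0; i < records; i++) {
                out.writeUTF(className); // 2-byte length prefix plus the encoded name
                out.writeDouble(i);      // 8-byte payload
            }
        }
        return buffer.size();
    }

    /** Bytes used when the reader already knows the concrete type. */
    static int untaggedSize(int records) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (int i = 0; i < records; i++) {
                out.writeDouble(i); // payload only
            }
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        String name = "example.clustering.DemoCluster"; // hypothetical class name
        int records = 1000;
        System.out.println("tagged bytes:   " + taggedSize(name, records));
        System.out.println("untagged bytes: " + untaggedSize(records));
    }
}
```

With a small payload the name dominates the record size, which is why the cost matters at high record counts and is negligible for low-volume uses.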

Re: [jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Herta, Christian
Hello Ted, thanks for the fast reply. Maybe I did not express myself clearly. In the first case (n mutually exclusive classes), classify and the current implementation of classifyFull in AbstractVectorClassifier make sense. The implementation uses the assumption sum_i p_i = 1. Here the assumption is val
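The sum-to-one assumption discussed here can be sketched with plain arrays. This is not Mahout's code: `SumToOneSketch` and `fullScores` are hypothetical names, and the sketch assumes that a classifier supplies n-1 scores and that the remaining one is derived as 1 minus their sum, which is only meaningful when the n classes are mutually exclusive.

```java
import java.util.Arrays;

// Hypothetical sketch of the sum_i p_i = 1 assumption, using arrays instead of Mahout's Vector.
public class SumToOneSketch {

    /** Expands n-1 scores to n scores by deriving the first entry as 1 - sum. */
    static double[] fullScores(double[] partial) {
        double[] full = new double[partial.length + 1];
        double sum = 0.0;
        for (int i = 0; i < partial.length; i++) {
            full[i + 1] = partial[i]; // copy the supplied scores into slots 1..n-1
            sum += partial[i];
        }
        full[0] = 1.0 - sum; // valid only if all n scores sum to 1
        return full;
    }

    public static void main(String[] args) {
        // Mutually exclusive classes: the derived score is a sensible probability.
        System.out.println(Arrays.toString(fullScores(new double[] {0.25, 0.5})));
        // Scores that are not constrained to sum to at most 1 yield a negative entry.
        System.out.println(Arrays.toString(fullScores(new double[] {0.9, 0.8})));
    }
}
```

The second call shows why the assumption matters: when the supplied scores sum to more than 1, the derived entry falls below zero and can no longer be read as a probability.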

Re: CIMapper Question

2012-02-12 Thread Ted Dunning
But this sounds like a runtime problem, not a type checking problem. Polymorphism is generally a problem in the Hadoop API. That is why we have VectorWritable and why I added PolymorphicWritable. Jeff, Two questions: 1) would PolymorphicWritable help? 2) can you say more about what the IOExc

Re: CIMapper Question

2012-02-12 Thread Sean Owen
The problem really arises when you have to tell the Job what the class of the Mapper key/value is. It needs something concrete. The issue is not here in the Mapper declaration. The general answer is, no, it has to somehow know what it's reading before it reads it. You can accomplish this by, say,

Re: CIMapper Question

2012-02-12 Thread Paritosh Ranjan
Can something like this help? public class CIMapper extends Mapper,VectorWritable,IntWritable,T> { ... } On 12-02-2012 06:48, Jeff Eastman wrote: I'm wondering how to tease the elephant into accepting any concrete instance of the interface o.a.m.clustering.Cluster when writing trained cluste

Re: Mandatory svnpubsub migration by Jan 2013

2012-02-12 Thread Ted Dunning
Comments on JIRAs are public discussions that should stay in place. It is fine to add a summary, but the discussion should remain. On Sun, Feb 12, 2012 at 1:22 AM, Sean Owen wrote: > If you mean, can you post JIRAs with diffs to the docs, surely. It is > all in SVN now. I'm not sure what you

Re: [jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Ted Dunning
On Sun, Feb 12, 2012 at 5:14 AM, Christian Herta (Commented) (JIRA) < j...@apache.org> wrote: > > The implementation of public Vector classifyFull(Vector r, Vector > instance) in AbstractVectorClassifier assumes that the probabilities of > the n elements of the output vector sum to 1. This i

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2012-02-12 Thread Dan Brickley (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206416#comment-13206416 ] Dan Brickley commented on MAHOUT-524: - Shannon informs me I'm getting this error becau

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] Christian Herta edited comment on MAHOUT-976 at 2/12/12 1:15 PM: ---

[jira] [Issue Comment Edited] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] Christian Herta edited comment on MAHOUT-976 at 2/12/12 1:14 PM: ---

[jira] [Commented] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206407#comment-13206407 ] Christian Herta commented on MAHOUT-976: The implementation of public Vector class

[jira] [Updated] (MAHOUT-976) Implement Multilayer Perceptron

2012-02-12 Thread Christian Herta (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Herta updated MAHOUT-976: --- Description: Implement a multi layer perceptron * via Matrix Multiplication * Learning by

[jira] [Commented] (MAHOUT-944) LuceneIndexToSequenceFiles (lucene2seq) utility

2012-02-12 Thread Frank Scholten (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206385#comment-13206385 ] Frank Scholten commented on MAHOUT-944: --- 1. Ok. This involves using the FileSystemDi

Re: Mandatory svnpubsub migration by Jan 2013

2012-02-12 Thread Sean Owen
If you mean, can you post JIRAs with diffs to the docs, surely. It is all in SVN now. I'm not sure what you mean about public comments. On Sun, Feb 12, 2012 at 1:59 AM, Lance Norskog wrote: > Can there be reviewed patch submissions? Or enable public comments > which you then digest and remove

[jira] [Commented] (MAHOUT-784) Exception at 20 Newsgroups examples

2012-02-12 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206377#comment-13206377 ] Sean Owen commented on MAHOUT-784: -- PS guys I had already committed the patch. I think th

Re: Goals for Mahout 0.7

2012-02-12 Thread Jeff Eastman
We have a couple JIRAs that relate here: We want to factor all the (-cl) classification steps out of all of the driver classes (MAHOUT-930) and into a separate job to remove duplicated code; MAHOUT-931 is to add a pluggable outlier removal capability to this job; and MAHOUT-933 is aimed at fact

Re: Goals for Mahout 0.7

2012-02-12 Thread Jeff Eastman
+ users@ These are great ideas, and are just the kinds of high level conversations I was hoping to engender. From my agile background, I'd hope to define 0.7 by a small number of "epic stories", in a subset of our overall capabilities, which could focus our attention to a set of derivative JI