[jira] [Commented] (MAHOUT-676) Random samplers in a modular library

2011-04-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021427#comment-13021427 ] Sean Owen commented on MAHOUT-676: I agree that this looks like it duplicates the existi

[jira] [Commented] (MAHOUT-675) LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

2011-04-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021425#comment-13021425 ] Sean Owen commented on MAHOUT-675: Patches should always be against head but I can figur

[jira] [Issue Comment Edited] (MAHOUT-676) Random samplers in a modular library

2011-04-18 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021420#comment-13021420 ] Lance Norskog edited comment on MAHOUT-676 at 4/19/11 6:31 AM:

[jira] [Commented] (MAHOUT-676) Random samplers in a modular library

2011-04-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021421#comment-13021421 ] Ted Dunning commented on MAHOUT-676: What is the application of these? How do they in

[jira] [Updated] (MAHOUT-676) Random samplers in a modular library

2011-04-18 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-676: Attachment: Sampler.patch This is an expository cut at a Sampler library. Intended for review, no

[jira] [Created] (MAHOUT-676) Random samplers in a modular library

2011-04-18 Thread Lance Norskog (JIRA)
Random samplers in a modular library Key: MAHOUT-676 URL: https://issues.apache.org/jira/browse/MAHOUT-676 Project: Mahout Issue Type: New Feature Components: Math Reporter: Lance No

Re: LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Ted Dunning
I disagree. You should document that you are discarding documents. It is reasonable to not document every lost document and good to throw an exception when too many failures occur. It is almost inevitable with large data that some inputs are malformed. These can't stop the show, but you have to

Re: LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Ted Dunning
Yeah... that sounds right. On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan wrote: > It appears that the previous patch has already been applied. Should I > repull the repo, make a new ticket, and create a new patch? >

Re: LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Lance Norskog
Please don't log it. Nobody reads logs. Right is right and wrong is wrong. Either throw an exception or ignore it. You can include a ratio of accepted vectors as an output. On Mon, Apr 18, 2011 at 5:52 PM, Christopher Jordan wrote: > I have incorporated this requested change in a new patch that I

Re: LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Christopher Jordan
I have incorporated this requested change in a new patch that I attached to ticket https://issues.apache.org/jira/browse/MAHOUT-675. It appears that the previous patch has already been applied. Should I repull the repo, make a new ticket, and create a new patch? Thanks, Chris On Apr 18, 2011,

[jira] [Updated] (MAHOUT-675) LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

2011-04-18 Thread Chris Jordan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jordan updated MAHOUT-675: Attachment: MAHOUT-675-1 I have attached a minor addition to my patch. It was brought up by Ted Dun

Build failed in Jenkins: Mahout-Quality #759

2011-04-18 Thread Apache Hudson Server
See -- [...truncated 5468 lines...] Running org.apache.mahout.vectorizer.encoders.ContinuousValueEncoderTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.01 sec Running org.apach

Build failed in Jenkins: Mahout-Quality #758

2011-04-18 Thread Apache Hudson Server
See Changes: [srowen] MAHOUT-675 just warn about docs with no term freq vector [srowen] Maybe fix issue with accidentally reading _SUCCESS files from Cloudera distro [srowen] MAHOUT-674 fix NPE by using right map key --

[jira] [Commented] (MAHOUT-671) Refactor org.apache.mahout.utils.vectors.lucene.Driver into a POJO

2011-04-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021304#comment-13021304 ] Ted Dunning commented on MAHOUT-671: Having the drivers as POJOs is a good thing in g

[jira] [Commented] (MAHOUT-671) Refactor org.apache.mahout.utils.vectors.lucene.Driver into a POJO

2011-04-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021300#comment-13021300 ] Sean Owen commented on MAHOUT-671: This looks broadly reasonable to me. Any objections?

[jira] [Updated] (MAHOUT-675) LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

2011-04-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-675: Resolution: Fixed Fix Version/s: 0.5 Status: Resolved (was: Patch Available) I looked

[jira] [Resolved] (MAHOUT-674) Let the TrainLogistic class could print the category feature weight probably

2011-04-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-674. Resolution: Fixed Assignee: Sean Owen Looks like a pretty clear oversight -- the value is definit

Re: LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Christopher Jordan
Sounds like a good idea with the default set to 100%. I'll make that update. On Apr 18, 2011, at 1:54 PM, Ted Dunning wrote: That sounds right to me. It might be plausible to blow an exception if a (configurable) large percentage of all documents have to be rejected. That is a minor improveme
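
The behaviour agreed on in this thread (skip documents with a null term frequency vector, and throw only when a configurable share of documents has been rejected, defaulting to 100%) might look roughly like the sketch below. This is not the MAHOUT-675 patch and not Mahout's LuceneIterator; the class SkippingIterator, the parameter maxSkippedFraction, and the accessor acceptedRatio() are hypothetical names chosen for illustration, and a null element stands in for a document with no term frequency vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not Mahout code): wraps any iterator, skips null
 * elements, and fails once the skipped fraction exceeds a configurable maximum.
 */
final class SkippingIterator<T> implements Iterator<T> {
  private final Iterator<T> source;
  private final double maxSkippedFraction; // 1.0 (100%) means never throw
  private int seen;
  private int skipped;
  private T next; // buffered non-null element, or null if none is buffered

  SkippingIterator(Iterator<T> source, double maxSkippedFraction) {
    if (maxSkippedFraction < 0.0 || maxSkippedFraction > 1.0) {
      throw new IllegalArgumentException("maxSkippedFraction must be in [0, 1]");
    }
    this.source = source;
    this.maxSkippedFraction = maxSkippedFraction;
  }

  @Override
  public boolean hasNext() {
    // Advance past null elements until a usable one is buffered.
    while (next == null && source.hasNext()) {
      T candidate = source.next();
      seen++;
      if (candidate == null) {
        skipped++;
        if ((double) skipped / seen > maxSkippedFraction) {
          throw new IllegalStateException(
              skipped + " of " + seen + " elements were null");
        }
      } else {
        next = candidate;
      }
    }
    return next != null;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = next;
    next = null;
    return result;
  }

  /** Number of null elements skipped so far. */
  int skippedCount() {
    return skipped;
  }

  /** Fraction of elements accepted so far; 1.0 before anything is read. */
  double acceptedRatio() {
    return seen == 0 ? 1.0 : (double) (seen - skipped) / seen;
  }
}
```

With maxSkippedFraction set to 1.0 the iterator never throws and simply drops null entries, while acceptedRatio() exposes the ratio of accepted vectors suggested elsewhere in the thread. Because the fraction is checked on every skip, a low threshold can trip on a null near the start of the input; a real implementation would probably wait for a minimum number of documents before enforcing it.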

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-04-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021168#comment-13021168 ] Ted Dunning commented on MAHOUT-672: See https://github.com/tdunning/LatentFactorLogLi

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-04-18 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021159#comment-13021159 ] Jonathan Traupman commented on MAHOUT-672: Also, can you point me to the specifi

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-04-18 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021158#comment-13021158 ] Jonathan Traupman commented on MAHOUT-672: OK, yeah, I think I misunderstood whi

Re: LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Ted Dunning
That sounds right to me. It might be plausible to blow an exception if a (configurable) large percentage of all documents have to be rejected. That is a minor improvement, though. On Mon, Apr 18, 2011 at 10:52 AM, Christopher Jordan wrote: > I believe, at least in my situation, a better approac

LuceneIterator throws an IllegalStateException when a null term frequency vector is encountered

2011-04-18 Thread Christopher Jordan
Hi, I opened a rather detailed JIRA ticket and submitted a patch regarding this issue already: https://issues.apache.org/jira/browse/MAHOUT-675 The short of it is that the LuceneIterator throws an IllegalStateException when a null term vector is encountered in the computeNext method. That is pro

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-04-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021145#comment-13021145 ] Ted Dunning commented on MAHOUT-672: {quote} As for a linear regression implementation

Re: springified org.apache.mahout.utils.vectors.lucene.Driver (MAHOUT-671)

2011-04-18 Thread Ted Dunning
Sounds like you are on your way. And we are interested in all patches. We don't commit all of them, but we are very interested! On Mon, Apr 18, 2011 at 10:38 AM, Christopher Jordan wrote: > As this is my first time actually contributing to this project, I am not > sure what the code submission

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-04-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021144#comment-13021144 ] Ted Dunning commented on MAHOUT-672: Jonathan, This all sounds good. There is a poi

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-04-18 Thread Jonathan Traupman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021138#comment-13021138 ] Jonathan Traupman commented on MAHOUT-672: Ted- I'd have to dig a little deeper

springified org.apache.mahout.utils.vectors.lucene.Driver (MAHOUT-671)

2011-04-18 Thread Christopher Jordan
Hi, For one of my projects at work, I recently had need of the lucene vector dumper. I used the current Driver class as a template; however, I thought it would be good to have a more POJO-like (springified) version. That makes the lucene vector dumper more accessible to frameworks like Spring; i

[jira] [Updated] (MAHOUT-675) LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

2011-04-18 Thread Chris Jordan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jordan updated MAHOUT-675: Status: Patch Available (was: Open) > LuceneIterator throws an IllegalStateException when a null T

[jira] [Updated] (MAHOUT-675) LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

2011-04-18 Thread Chris Jordan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jordan updated MAHOUT-675: Attachment: MAHOUT-675 I have patched the LuceneIterator. It is a mild change that also adds a Logger

[jira] [Created] (MAHOUT-675) LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

2011-04-18 Thread Chris Jordan (JIRA)
LuceneIterator throws an IllegalStateException when a null TermFreqVector is encountered for a document instead of skipping to the next one

[jira] [Updated] (MAHOUT-671) Refactor org.apache.mahout.utils.vectors.lucene.Driver into a POJO

2011-04-18 Thread Chris Jordan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jordan updated MAHOUT-671: Status: Patch Available (was: Open) > Refactor org.apache.mahout.utils.vectors.lucene.Driver into

[jira] [Commented] (MAHOUT-399) LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

2011-04-18 Thread Michael Lazarus (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021128#comment-13021128 ] Michael Lazarus commented on MAHOUT-399: Hi Grant, Yes, there should be 9 words

[jira] [Updated] (MAHOUT-671) Refactor org.apache.mahout.utils.vectors.lucene.Driver into a POJO

2011-04-18 Thread Chris Jordan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Jordan updated MAHOUT-671: Attachment: MAHOUT-671 I have refactored org.apache.mahout.utils.vectors.lucene.Driver to be a POJO

[jira] [Commented] (MAHOUT-399) LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

2011-04-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021001#comment-13021001 ] Grant Ingersoll commented on MAHOUT-399: Finally got output, but not the same as M

[jira] [Updated] (MAHOUT-674) Let the TrainLogistic class could print the category feature weight probably

2011-04-18 Thread Stanley Xu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stanley Xu updated MAHOUT-674: Attachment: trainlogsiticexample.diff A one line patch should fix this issue and help the users could t

[jira] [Created] (MAHOUT-674) Let the TrainLogistic class could print the category feature weight probably

2011-04-18 Thread Stanley Xu (JIRA)
Let the TrainLogistic class could print the category feature weight probably Key: MAHOUT-674 URL: https://issues.apache.org/jira/browse/MAHOUT-674 Project: Mahout I