[jira] Updated: (MAHOUT-184) Code tweaks for .df.* code

2009-10-01 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-184: - Attachment: Tweaks_to__df__.patch > Code tweaks for .df.* code > -- > >

[jira] Created: (MAHOUT-184) Code tweaks for .df.* code

2009-10-01 Thread Sean Owen (JIRA)
Code tweaks for .df.* code -- Key: MAHOUT-184 URL: https://issues.apache.org/jira/browse/MAHOUT-184 Project: Mahout Issue Type: Improvement Reporter: Sean Owen Assignee: Sean Owen P

Re: Style discussion, Oct 09 edition

2009-10-01 Thread Ted Dunning
Yes. It is just expectations. x += 1 or x++ or x-- are all common. Other operations are uncommon and % in particular has the deviant property of other meanings in other languages (it is often a comment character). On Thu, Oct 1, 2009 at 12:21 PM, Sean Owen wrote: > Is it > merely a question

Re: Style discussion, Oct 09 edition

2009-10-01 Thread Sean Owen
It's so small that I can't feel strongly about it either way. Is it merely a question of what one expects to read? I am more accustomed to the latter form. The language allows it and I'm used to seeing it in various forms like "i++". It can only be faster, at runtime, since it might avoid an additi

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Ted Dunning
Colt did a nice job of this. Basically, their idea was to take various general functional patterns and allow the functions to be plugged in. Common patterns that are reasonable to include in such a framework include: a) dot product as the aggregation of a pairwise function application (normal d
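The "aggregation of a pairwise function application" pattern mentioned above can be sketched roughly as follows. This is only an illustration, not Colt's or Mahout's actual API: the class name `AggregateSketch` and the methods `aggregate`, `dot` and `squaredDistance` are hypothetical, and plain `double[]` arrays stand in for vectors.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch: names and signatures are assumptions, not a real library API.
public class AggregateSketch {

    // Applies `pairwise` to each pair (x[i], y[i]) and folds the results with `aggregator`.
    static double aggregate(double[] x, double[] y,
                            DoubleBinaryOperator aggregator,
                            DoubleBinaryOperator pairwise) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        if (x.length == 0) {
            return 0.0; // assumption: an empty aggregate is defined as 0
        }
        double acc = pairwise.applyAsDouble(x[0], y[0]);
        for (int i = 1; i < x.length; i++) {
            acc = aggregator.applyAsDouble(acc, pairwise.applyAsDouble(x[i], y[i]));
        }
        return acc;
    }

    // Dot product: sum of pairwise products.
    static double dot(double[] x, double[] y) {
        return aggregate(x, y, Double::sum, (a, b) -> a * b);
    }

    // Squared Euclidean distance: sum of squared pairwise differences.
    static double squaredDistance(double[] x, double[] y) {
        return aggregate(x, y, Double::sum, (a, b) -> (a - b) * (a - b));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {4, 5, 6};
        System.out.println(dot(x, y));
        System.out.println(squaredDistance(x, y));
    }
}
```

Swapping the two plugged-in functions is enough to move from one operation to another, which is the appeal of the pattern.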

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Jake Mannix
On Thu, Oct 1, 2009 at 10:10 AM, Ted Dunning wrote: > Btw... the other thing that the HashVector does better is inserts. The > sorted vector could do much better on average if it deferred sorting until > an access or iteration was done. Even iteration doesn't necessarily need > sorting, but it

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Ted Dunning
Btw... the other thing that the HashVector does better is inserts. The sorted vector could do much better on average if it deferred sorting until an access or iteration was done. Even iteration doesn't necessarily need sorting, but it could be seen as part of the contract. On Thu, Oct 1, 2009 at
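The deferred-sorting idea described above might look something like the sketch below. It is a guess at the technique, not the Mahout implementation: the class `LazySortedVector` and its methods are hypothetical, missing entries are assumed to read as 0, and a later write to the same index is assumed to win.

```java
import java.util.Arrays;

// Hypothetical sketch of a sparse vector that appends on insert and sorts lazily.
public class LazySortedVector {
    private int[] indices = new int[4];
    private double[] values = new double[4];
    private int size = 0;
    private boolean sorted = true;

    // Insert is an append: amortized O(1), no sorting here.
    public void set(int index, double value) {
        if (size == indices.length) {
            indices = Arrays.copyOf(indices, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        indices[size] = index;
        values[size] = value;
        size++;
        sorted = false;
    }

    // Access triggers the deferred sort, then a binary search.
    public double get(int index) {
        ensureSorted();
        int pos = Arrays.binarySearch(indices, 0, size, index);
        return pos >= 0 ? values[pos] : 0.0;
    }

    private void ensureSorted() {
        if (sorted) {
            return;
        }
        // Sort entry positions by index. The object sort is stable, so entries
        // with equal indices keep insertion order and the last write ends up last.
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Integer.compare(indices[a], indices[b]));

        int[] newIndices = new int[indices.length];
        double[] newValues = new double[values.length];
        int n = 0;
        for (int o : order) {
            if (n > 0 && newIndices[n - 1] == indices[o]) {
                newValues[n - 1] = values[o]; // duplicate index: later write overwrites
            } else {
                newIndices[n] = indices[o];
                newValues[n] = values[o];
                n++;
            }
        }
        indices = newIndices;
        values = newValues;
        size = n;
        sorted = true;
    }
}
```

A run of inserts followed by a single read pays for one sort instead of keeping the arrays ordered on every insert.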

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Jake Mannix
Yeah, I added the "trying to find..." part of the debug output because I couldn't figure out what IntDoubleHash was "impossibly confused" about. Unfortunately, seeing what it was confused about only confused *me* about why it was impossible. On Thu, Oct 1, 2009 at 9:12 AM, Ted Dunning wrote: >

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Ted Dunning
It indicates a bug in the code or in the writer's head. You are correct about the intent. The default value (which should probably just be 0) should be returned if the value is missing. On Thu, Oct 1, 2009 at 5:41 AM, Grant Ingersoll (JIRA) wrote: > I don't think this is "impossible confusion",
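As a rough illustration of the intent stated above (a missing key yields the default value rather than an error), here is a minimal open-addressing sketch. It is not the `IntDoubleHash` code under discussion: the class `IntDoubleMapSketch` is hypothetical, uses a fixed capacity with linear probing, and assumes non-negative keys.

```java
import java.util.Arrays;

// Hypothetical sketch; fixed capacity, linear probing, non-negative keys only.
public class IntDoubleMapSketch {
    private static final int EMPTY = -1; // sentinel for an unused slot

    private final int[] keys;
    private final double[] values;
    private final double defaultValue;

    public IntDoubleMapSketch(int capacity, double defaultValue) {
        this.keys = new int[capacity];
        Arrays.fill(this.keys, EMPTY);
        this.values = new double[capacity];
        this.defaultValue = defaultValue;
    }

    public void put(int key, double value) {
        int slot = key % keys.length;
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == EMPTY || keys[slot] == key) {
                keys[slot] = key;
                values[slot] = value;
                return;
            }
            slot = (slot + 1) % keys.length;
        }
        throw new IllegalStateException("table is full");
    }

    public double get(int key) {
        int slot = key % keys.length;
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == EMPTY) {
                return defaultValue; // probe chain ended: key is absent
            }
            if (keys[slot] == key) {
                return values[slot];
            }
            slot = (slot + 1) % keys.length;
        }
        return defaultValue; // every slot checked without a match
    }
}
```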

Re: Style discussion, Oct 09 edition

2009-10-01 Thread Ted Dunning
On Thu, Oct 1, 2009 at 2:29 AM, Sean Owen wrote: > This doesn't make sense when the args are ints: > Math.floor(numTrees / numMaps) > The result is already an int, rounded 'down' > Since other languages have incoherent rounding rules for this division, this idiom can actually help readability.
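A small sketch of the point about `Math.floor(numTrees / numMaps)` with int arguments. The class and method names are made up for illustration; `numTrees` and `numMaps` come from the message. In Java the division is done in int arithmetic before `Math.floor` sees the value, so the call changes nothing. One caveat worth noting: int division truncates toward zero, which matches rounding down only for non-negative results; `Math.floorDiv` (Java 8 and later) rounds toward negative infinity.

```java
// Hypothetical demo class; only the arithmetic is the point.
public class IntDivisionSketch {

    // Plain int division: the fractional part is discarded.
    static int perMap(int numTrees, int numMaps) {
        return numTrees / numMaps;
    }

    // The idiom under discussion: the int division happens first,
    // so Math.floor receives an already-whole value.
    static int perMapWithFloor(int numTrees, int numMaps) {
        return (int) Math.floor(numTrees / numMaps);
    }

    public static void main(String[] args) {
        System.out.println(perMap(10, 3));           // 3
        System.out.println(perMapWithFloor(10, 3));  // 3
        System.out.println(-7 / 2);                  // -3 (truncates toward zero)
        System.out.println(Math.floorDiv(-7, 2));    // -4 (rounds toward negative infinity)
    }
}
```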

Re: [ANNOUNCEMENT] Apache Commons Math 2.0 Released

2009-10-01 Thread Ted Dunning
The MTJ committers were willing to relicense under Apache and donate the entire package. On Wed, Sep 30, 2009 at 9:17 PM, Jake Mannix wrote: > On Wed, Sep 30, 2009 at 8:26 PM, Ted Dunning > wrote: > > > No motion. I was pushing that integration because it looked like MTJ was > > integrating wi

[jira] Assigned: (MAHOUT-182) New helper methods for Matrix: times(Vector), timesSquared(Vector), numRows() and numCols()

2009-10-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-182: -- Assignee: Grant Ingersoll > New helper methods for Matrix: times(Vector), timesSquared(

[jira] Updated: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-165: --- Attachment: mahout-165.patch This gets the VectorTest testEquals to pass. Also fixes an inst

[jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761201#action_12761201 ] Grant Ingersoll commented on MAHOUT-165: The exception in the test is: {quote} java

[jira] Updated: (MAHOUT-183) WikipediaXmlSplitter spits one chunk per line

2009-10-01 Thread Olivier Grisel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Grisel updated MAHOUT-183: -- Description: The Wikipedia XML splitter inner loop erroneously detects end of the line-iterator

[jira] Updated: (MAHOUT-183) WikipediaXmlSplitter spits one chunk per line

2009-10-01 Thread Olivier Grisel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Grisel updated MAHOUT-183: -- Status: Patch Available (was: Open) > WikipediaXmlSplitter spits one chunk per line >

[jira] Updated: (MAHOUT-183) WikipediaXmlSplitter spits one chunk per line

2009-10-01 Thread Olivier Grisel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Grisel updated MAHOUT-183: -- Attachment: MAHOUT-183-wikipedia-xml-splitter.patch > WikipediaXmlSplitter spits one chunk per

[jira] Created: (MAHOUT-183) WikipediaXmlSplitter spits one chunk per line

2009-10-01 Thread Olivier Grisel (JIRA)
WikipediaXmlSplitter spits one chunk per line - Key: MAHOUT-183 URL: https://issues.apache.org/jira/browse/MAHOUT-183 Project: Mahout Issue Type: Bug Components: Classification Affect

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Sean Owen
(PS yeah that was my fault for misreading the original message.) On Thu, Oct 1, 2009 at 11:39 AM, Grant Ingersoll wrote: > > On Sep 30, 2009, at 4:34 PM, Jake Mannix wrote: > >> I didn't say that equals() should ignore name, I said the opposite - >> equals >> and >> hashCode() should *only* take

Re: [ANNOUNCEMENT] Apache Commons Math 2.0 Released

2009-10-01 Thread Grant Ingersoll
On Oct 1, 2009, at 12:17 AM, Jake Mannix wrote: On Wed, Sep 30, 2009 at 8:26 PM, Ted Dunning wrote: So why do we really need vectors to be Writable? I see the appeal, it's nice and makes the code nicely integrated, but the way I ended up going, so that you could use decomposer either

Re: [jira] Commented: (MAHOUT-165) Using better primitives hash for sparse vector for performance gains

2009-10-01 Thread Grant Ingersoll
On Sep 30, 2009, at 4:34 PM, Jake Mannix wrote: I didn't say that equals() should ignore name, I said the opposite - equals and hashCode() should *only* take into account the contents and the name, and not implementation (which means that hashCode() needs to stay in one place and not ge

Style discussion, Oct 09 edition

2009-10-01 Thread Sean Owen
May I suggest we don't static import classes unless it makes a real difference in readability, but certainly, may I suggest we don't static-import a single static method or field from a class? Finding a reference to it in the code becomes confusing because one expects the method or field exists in

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-10-01 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157: -- Attachment: MAHOUT-157-Oct-1.patch Finished Sequential version of FPGrowth. May need some more document