Re: eclipse codestyle.xml?

2009-11-24 Thread Drew Farris
Great -- this works now. Thanks! On Tue, Nov 24, 2009 at 10:20 AM, Grant Ingersoll wrote: > Actually, the Mahout wiki links are out of date. I'll update. >

[jira] Updated: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-206: --- Attachment: MAHOUT-206.patch This adds back SparseVector now as the parent of both sparse impls, and

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Sean Owen
True, and that advantage rarely comes up. However declaring abstract methods in an abstract class has exactly the same problem that adding interface methods does -- and I take it this is the heart of the problem -- all implementors get broken immediately. If you intend to add methods this way, nei

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Yonik Seeley
On Tue, Nov 24, 2009 at 3:30 PM, Sean Owen wrote: > I'm willing to be convinced but what is the theoretical argument for this? Rather the opposite - it's a practical argument gained through experience. > I am all for interfaces *and* abstract classes. You write the API in > terms of interfaces f

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Grant Ingersoll
On Nov 24, 2009, at 3:30 PM, Sean Owen wrote: > I'm willing to be convinced but what is the theoretical argument for this? See the Lucene archives: http://search.lucidimagination.com. There has been a lot of discussion on it. And I mean a lot. And then some. :-) Search for anything on in

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Grant Ingersoll
Yes, I have lived this pain for a long time with Lucene. Personally, though, a lot of the pain comes from a fairly strict back compatibility policy that to me isn't always well founded given the release cycle Lucene usually operates under. I've always wished there was a @introducing annotation

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Sean Owen
I'm willing to be convinced but what is the theoretical argument for this? I am all for interfaces *and* abstract classes. You write the API in terms of interfaces for maximum flexibility. You provide abstract partial implementations for convenience. Everyone is happy. The best argument I've seen
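The arrangement described in the message above (an API written against an interface, plus an abstract partial implementation for convenience) can be sketched roughly as follows. This is only an illustration: SimpleVector, AbstractSimpleVector and ArrayVector are hypothetical names invented for the example and are not Mahout's actual Vector or AbstractVector classes.

```java
// Hypothetical names throughout; this is not Mahout's real API.

/** The API is expressed in terms of this interface. */
interface SimpleVector {
    int size();
    double get(int index);
    double dot(SimpleVector other);
}

/** Partial implementation: supplies dot() in terms of size() and get(). */
abstract class AbstractSimpleVector implements SimpleVector {
    @Override
    public double dot(SimpleVector other) {
        if (other.size() != size()) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < size(); i++) {
            sum += get(i) * other.get(i);
        }
        return sum;
    }
}

/** A concrete implementor only has to provide storage access. */
final class ArrayVector extends AbstractSimpleVector {
    private final double[] values;

    ArrayVector(double... values) {
        this.values = values.clone();
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

final class InterfacePlusAbstractDemo {
    public static void main(String[] args) {
        SimpleVector a = new ArrayVector(1.0, 2.0, 3.0);
        SimpleVector b = new ArrayVector(4.0, 5.0, 6.0);
        // 1*4 + 2*5 + 3*6
        System.out.println(a.dot(b)); // prints 32.0
    }
}
```

Callers depend only on SimpleVector, so an implementor is free to skip the abstract class and implement the interface directly.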

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Jake Mannix
Oof. So you're arguing this as a temporary thing, until our interfaces stabilize? It makes unit testing much harder this way, but I guess I see the rationale. If we do this, we need to leave a lot out of that base class - there may be some really big differences in implementation of these classes

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Ted Dunning
Yes. Interfaces are the problem that commons math have boxed themselves in with. The Hadoop crew (especially Doug C) are adamant about using as few interfaces as possible except as mixin signals and only in cases where the interface really is going to be very, very stable. Our vector interfaces

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Jake Mannix
Well we do use AbstractVector. Are you suggesting that we *not* have a Vector interface at all, and *only* have an abstract base class? Similarly for Matrix? -jake On Tue, Nov 24, 2009 at 11:57 AM, Ted Dunning wrote: > We should use abstract classes almost everywhere instead of interfaces t

Re: [jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Ted Dunning
We should use abstract classes almost everywhere instead of interfaces to ease backward compatibility issues with user written extensions to Vectors and Matrices. On Tue, Nov 24, 2009 at 9:38 AM, Grant Ingersoll (JIRA) wrote: > It seems like there is still some commonality between the two > imple
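A minimal sketch of the backward-compatibility argument, assuming a hypothetical BaseVector class and a user-written UserVector subclass (neither is a real Mahout type). A method added to the base class later with a concrete body does not break existing subclasses. As noted elsewhere in the thread, adding a new abstract method would break them in the same way that adding an interface method does.

```java
// Hypothetical names throughout; this is not Mahout's real API.

/** Base class shipped by the library. */
abstract class BaseVector {
    public abstract int size();
    public abstract double get(int index);

    // Imagine this method was added in a later release. Because it has a
    // concrete body, subclasses written against the earlier release still
    // compile and run unchanged.
    public double sum() {
        double total = 0.0;
        for (int i = 0; i < size(); i++) {
            total += get(i);
        }
        return total;
    }
}

/** User extension written before sum() existed; it never mentions sum(). */
final class UserVector extends BaseVector {
    private final double[] values;

    UserVector(double... values) {
        this.values = values.clone();
    }

    @Override
    public int size() {
        return values.length;
    }

    @Override
    public double get(int index) {
        return values[index];
    }
}

final class AbstractBaseDemo {
    public static void main(String[] args) {
        BaseVector v = new UserVector(1.0, 2.0, 3.5);
        // The inherited method is available without any change to UserVector.
        System.out.println(v.sum()); // prints 6.5
    }
}
```

Had BaseVector been an interface in the Java of the time, adding sum() would have required every implementor to be edited before it compiled again.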

[jira] Created: (MAHOUT-209) Add aggregate() methods for Vector

2009-11-24 Thread Jake Mannix (JIRA)
Add aggregate() methods for Vector -- Key: MAHOUT-209 URL: https://issues.apache.org/jira/browse/MAHOUT-209 Project: Mahout Issue Type: Improvement Components: Matrix Environment: all

[jira] Created: (MAHOUT-208) Vector.getLengthSquared() is dangerously optimized

2009-11-24 Thread Jake Mannix (JIRA)
Vector.getLengthSquared() is dangerously optimized -- Key: MAHOUT-208 URL: https://issues.apache.org/jira/browse/MAHOUT-208 Project: Mahout Issue Type: Bug Components: Matrix Affe

[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782080#action_12782080 ] Grant Ingersoll commented on MAHOUT-207: All makes sense. Per the refactoring in M
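The snippets here are truncated, so the approach actually taken in MAHOUT-207 is not shown. One common way to make a hash code independent of iteration order is to combine per-element hashes with a commutative operation such as integer addition. The sketch below assumes that technique; OrderIndependentHash and its Map-based input are hypothetical stand-ins for a sparse vector, and skipping zero entries is an additional assumption made so that explicitly stored zeros do not change the result.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not the code from the MAHOUT-207 patch.
final class OrderIndependentHash {

    private OrderIndependentHash() {
    }

    /** Hashes index/value pairs so that iteration order does not matter. */
    static int hash(Map<Integer, Double> entries) {
        int result = 0;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            double value = e.getValue();
            if (value == 0.0) {
                continue; // assumption: stored zeros are ignored
            }
            long bits = Double.doubleToLongBits(value);
            int valueHash = (int) (bits ^ (bits >>> 32));
            // Addition is commutative, so the order of visits is irrelevant.
            result += (31 * e.getKey()) ^ valueHash;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Double> forward = new LinkedHashMap<>();
        forward.put(1, 2.5);
        forward.put(7, -1.0);
        forward.put(42, 3.0);

        Map<Integer, Double> backward = new LinkedHashMap<>();
        backward.put(42, 3.0);
        backward.put(7, -1.0);
        backward.put(1, 2.5);

        System.out.println(hash(forward) == hash(backward)); // prints true
    }
}
```

A plain sum mixes bits less thoroughly than an order-dependent polynomial hash, so this trades some hash quality for the order-independence property.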

[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782077#action_12782077 ] Jake Mannix commented on MAHOUT-207: We definitely should include the optimization that

[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782064#action_12782064 ] Grant Ingersoll commented on MAHOUT-207: Aren't we losing some of the benefits of

[jira] Commented: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782054#action_12782054 ] Grant Ingersoll commented on MAHOUT-206: Jake, there's something weird in this patc

[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782049#action_12782049 ] Jake Mannix commented on MAHOUT-207: It looks like the work done on MAHOUT-159 did not

[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782051#action_12782051 ] Ted Dunning commented on MAHOUT-207: I think that 159 is superseded by this work. > A

[jira] Assigned: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-206: -- Assignee: Grant Ingersoll > Separate and clearly label different SparseVector implement

[jira] Assigned: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-207: -- Assignee: Grant Ingersoll > AbstractVector.hashCode() should not care about the order o

[jira] Commented: (MAHOUT-207) AbstractVector.hashCode() should not care about the order of iteration over elements

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782041#action_12782041 ] Grant Ingersoll commented on MAHOUT-207: How does this all relate to https://issues

[jira] Resolved: (MAHOUT-201) OrderedIntDoubleMapping / SparseVector is unnecessarily slow

2009-11-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-201. Resolution: Duplicate The patch for this is currently included in the patch for MAHOUT-206 (which

Re: Moving ahead to Hadoop 0.22

2009-11-24 Thread Ted Dunning
Well, it is wrong at some level and it will become more and more wrong. I have heard from Chris Wenzel that the cost of moving post 19 was pretty high. It would be good to do that when we can do it whole-heartedly. (what is the situation with 21?) On Tue, Nov 24, 2009 at 1:23 AM, Sean Owen wro

[jira] Updated: (MAHOUT-206) Separate and clearly label different SparseVector implementations

2009-11-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-206: --- Attachment: MAHOUT-206.patch Patch renames the hash-based SparseVector with RandomAccessSparseVector,

Re: eclipse codestyle.xml?

2009-11-24 Thread Grant Ingersoll
Actually, the Mahout wiki links are out of date. I'll update. On Nov 24, 2009, at 10:19 AM, Grant Ingersoll wrote: > Hmm, weird. They must have gotten lost when the ASF upgraded MoinMoin. They > are the same as Lucene's: http://wiki.apache.org/lucene-java/HowToContribute > > On Nov 24, 2009,

Re: eclipse codestyle.xml?

2009-11-24 Thread Simon Willnauer
We updated the Lucene ones during ApacheCon - this should work though! On Tue, Nov 24, 2009 at 4:19 PM, Grant Ingersoll wrote: > Hmm, weird. They must have gotten lost when the ASF upgraded MoinMoin. They > are the same as Lucene's: http://wiki.apache.org/lucene-java/HowToContribute > > On N

Re: eclipse codestyle.xml?

2009-11-24 Thread Grant Ingersoll
Hmm, weird. They must have gotten lost when the ASF upgraded MoinMoin. They are the same as Lucene's: http://wiki.apache.org/lucene-java/HowToContribute On Nov 24, 2009, at 10:11 AM, Drew Farris wrote: > Hi All, > > On the wiki, http://cwiki.apache.org/MAHOUT/howtocontribute.html, The > link

eclipse codestyle.xml?

2009-11-24 Thread Drew Farris
Hi All, On the wiki, http://cwiki.apache.org/MAHOUT/howtocontribute.html, the link at the bottom of the page to the Eclipse codestyle.xml for Mahout's coding conventions seems to be broken. Does anyone have a codestyle.xml for Eclipse available? Thanks, Drew

[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions

2009-11-24 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781896#action_12781896 ] Grant Ingersoll commented on MAHOUT-204: Yeah, go ahead and submit the patch, then

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-11-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781858#action_12781858 ] Sean Owen commented on MAHOUT-103: -- Yes, this is basically item-based recommendation. With

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-11-24 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781838#action_12781838 ] Ankur commented on MAHOUT-103: -- For this co-occurrence based recommender I am planning to writ

[jira] Commented: (MAHOUT-204) Better integration of Mahout matrix capabilities with Colt Matrix additions

2009-11-24 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781834#action_12781834 ] Sean Owen commented on MAHOUT-204: -- I am happy to do this, but in order to avoid a massive

Re: Moving ahead to Hadoop 0.22

2009-11-24 Thread Sean Owen
Early report from my testing is it's going to break a lot of our code, so, perhaps a bridge too far now. There's one reason I'm keen to move forward and it's not merely wanting to be on the bleeding edge, far from it. It's that 0.20.x does not work at all for my jobs. It runs into bugs that 0.20.1

Re: Moving ahead to Hadoop 0.22

2009-11-24 Thread Robin Anil
0.22 is supposed to stabilize the new mapreduce package and remove support for the old mapred package. So I am guessing the reason for moving to 0.22 would go side by side with conversion of all our existing mapred programs to mapreduce ones. And I believe I read somewhere that this is the api tha