[jira] Updated: (MAHOUT-160) ClusterDumper utility to output all the clusters in all sequence files and points

2009-08-05 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Kore updated MAHOUT-160: --- Attachment: mahout-160.patch ClusterDumper utility has been modified to take the clusters an

[jira] Created: (MAHOUT-160) ClusterDumper utility to output all the clusters in all sequence files and points

2009-08-05 Thread Shashikant Kore (JIRA)
ClusterDumper utility to output all the clusters in all sequence files and points - Key: MAHOUT-160 URL: https://issues.apache.org/jira/browse/MAHOUT-160 Project: Mahout

[jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-08-05 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739910#action_12739910 ] Deneche A. Hakim commented on MAHOUT-145: - bq. What really bugs me is that it is wo

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-08-05 Thread Nicolás Fantone (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739835#action_12739835 ] Nicolás Fantone commented on MAHOUT-121: I'm terribly sorry. It was really busy the

[jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-08-05 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739775#action_12739775 ] Ted Dunning commented on MAHOUT-145: Ouch! || Num Map Tasks || Num trees || In-Mem bui

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-08-05 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157: -- Attachment: (was: MAHOUT-157-August-6.patch) > Frequent Pattern Mining using Parallel FP-Growth > -

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-08-05 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157: -- Attachment: MAHOUT-157-August-6.patch > Frequent Pattern Mining using Parallel FP-Growth >

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-08-05 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157: -- Attachment: MAHOUT-157-Combinations-BSD-License.patch MAHOUT-157-August-6.patch * Fixed

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-08-05 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739712#action_12739712 ] Grant Ingersoll commented on MAHOUT-121: Nicolas, any word on your update? Otherwi

Re: Min/MaxHeap Implementation

2009-08-05 Thread Grant Ingersoll
+1, as we already have Lucene library included. On Aug 5, 2009, at 1:22 PM, Paul Elschot wrote: On Wednesday 05 August 2009 18:40:41, Robin Anil wrote: I have say a million objects which needs to be inserted very fast and only top K needs to be kept based on a comparator Have a look at th

Re: Min/MaxHeap Implementation

2009-08-05 Thread Paul Elschot
On Wednesday 05 August 2009 18:40:41, Robin Anil wrote: > I have say a million objects which needs to be inserted very fast and only > top K needs to be kept based on a comparator Have a look at the PriorityQueue here: http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/util

Re: Min/MaxHeap Implementation

2009-08-05 Thread Robin Anil
There is org.apache.hadoop.util.PriorityQueue, which I think will do the job. On Wed, Aug 5, 2009 at 10:10 PM, Robin Anil wrote: > I have say a million objects which needs to be inserted very fast and only > top K needs to be kept based on a comparator > > > > > On Wed, Aug 5, 2009 at 10:07 PM,

Re: Min/MaxHeap Implementation

2009-08-05 Thread Robin Anil
I have say a million objects which needs to be inserted very fast and only top K needs to be kept based on a comparator On Wed, Aug 5, 2009 at 10:07 PM, Sean Owen wrote: > Does java.util.PriorityQueue meet your needs? > > On Wed, Aug 5, 2009 at 5:33 PM, Robin Anil wrote: > > Is there any Min/

Re: Min/MaxHeap Implementation

2009-08-05 Thread Sean Owen
Does java.util.PriorityQueue meet your needs? On Wed, Aug 5, 2009 at 5:33 PM, Robin Anil wrote: > Is there any Min/MaxHeap implementation (non GPL) to include as part of > Mahout >

Min/MaxHeap Implementation

2009-08-05 Thread Robin Anil
Is there any Min/MaxHeap implementation (non GPL) to include as part of Mahout
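
For readers following this thread, the requirement described in it (insert many objects quickly and keep only the top K according to a comparator) can be sketched with a bounded min-heap built on java.util.PriorityQueue, the class suggested in one of the replies. The TopK class and its method names below are hypothetical illustrations; they are not part of Mahout, Lucene, or Hadoop, and this is not necessarily what the thread settled on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Mahout, Lucene, or Hadoop class): keeps the K largest items seen. */
final class TopK<T> {
    private final int k;
    private final Comparator<T> comparator;
    // Min-heap under the comparator: the head is the smallest element currently retained.
    private final PriorityQueue<T> heap;

    TopK(int k, Comparator<T> comparator) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.comparator = comparator;
        this.heap = new PriorityQueue<>(k, comparator);
    }

    /** Offers one item; O(log K) when the heap changes, O(1) when the item is rejected. */
    void offer(T item) {
        if (heap.size() < k) {
            heap.add(item);
        } else if (comparator.compare(item, heap.peek()) > 0) {
            // The new item beats the smallest retained one, so replace it.
            heap.poll();
            heap.add(item);
        }
    }

    /** Returns the retained items, largest first. */
    List<T> toSortedList() {
        List<T> result = new ArrayList<>(heap);
        result.sort(comparator.reversed());
        return result;
    }
}
```

Because the heap never holds more than K elements, a million insertions cost at most on the order of a million times log K comparisons, and most items are rejected after a single comparison against the head.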

[jira] Updated: (MAHOUT-158) Replace all ID values with long

2009-08-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-158: - Attachment: MAHOUT-158.patch Preliminary patch for review for anyone that is curious. Also epic -- core

[jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-08-05 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739584#action_12739584 ] Deneche A. Hakim commented on MAHOUT-145: - more tests on my laptop: KDD 10% || Num

Re: Inconsistency in the equals and hashCode implementations for Vectors

2009-08-05 Thread Mark Desnoyer
Yep, that makes sense. Unfortunately, that's not what the code was doing: equivalent() was doing that, but equals() in DenseVector also considered the name property, so that sparse.equals(dense) != dense.equals(sparse). I also changed the hashCode to hash off of the entries in the container instead

Re: Inconsistency in the equals and hashCode implementations for Vectors

2009-08-05 Thread Grant Ingersoll
Thanks, Mark. I will look into it. I know at the time I wrote them, I made an explicit departure from it, but I need to revisit why I did. I seem to recall it having to do with the arrays and with the fact that we want equals() to work like List.equals(), namely that a DenseVector and a
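
As an illustration of the issue discussed in this thread, the sketch below shows one way to keep equals() symmetric between a dense and a sparse vector and to keep hashCode() consistent with it: both methods live in a shared base class and depend only on the size and the entries, never on a name. SimpleVector, DenseVec, and SparseVec are hypothetical stand-ins, not Mahout's Vector classes, and ignoring the name is an assumption about one possible resolution, not the change that was actually committed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base class: equality and hashing depend only on size and entries. */
abstract class SimpleVector {
    abstract int size();
    abstract double get(int index);

    @Override
    public final boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SimpleVector)) {
            return false;
        }
        SimpleVector that = (SimpleVector) other;
        if (size() != that.size()) {
            return false;
        }
        for (int i = 0; i < size(); i++) {
            if (Double.compare(get(i), that.get(i)) != 0) {
                return false;
            }
        }
        return true;
    }

    @Override
    public final int hashCode() {
        // Same formula for every subclass, so equal vectors always hash alike.
        int result = size();
        for (int i = 0; i < size(); i++) {
            result = 31 * result + Double.hashCode(get(i));
        }
        return result;
    }
}

/** Hypothetical dense vector; the name is deliberately ignored by equals/hashCode. */
final class DenseVec extends SimpleVector {
    private final String name;
    private final double[] values;

    DenseVec(String name, double... values) {
        this.name = name;
        this.values = values.clone();
    }

    String name() { return name; }
    @Override int size() { return values.length; }
    @Override double get(int index) { return values[index]; }
}

/** Hypothetical sparse vector storing only non-zero entries. */
final class SparseVec extends SimpleVector {
    private final int cardinality;
    private final Map<Integer, Double> nonZero = new HashMap<>();

    SparseVec(int cardinality) { this.cardinality = cardinality; }

    void set(int index, double value) {
        if (value == 0.0) {
            nonZero.remove(index);
        } else {
            nonZero.put(index, value);
        }
    }

    @Override int size() { return cardinality; }
    @Override double get(int index) { return nonZero.getOrDefault(index, 0.0); }
}
```

Declaring equals() and hashCode() final in the base class prevents a subclass from reintroducing an asymmetric comparison. The per-element loop is linear in the vector size, which a real sparse implementation would likely want to avoid; the sketch favors clarity over speed.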

[jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-08-05 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739386#action_12739386 ] Deneche A. Hakim commented on MAHOUT-145: - I'm running some tests to compare betwee