[jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-06 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751842#action_12751842 ] Deneche A. Hakim commented on MAHOUT-145: - bq.* TODO: test the code on a Hadoo

[jira] Created: (MAHOUT-173) Implement clustering of massive-domain attributes

2009-09-06 Thread JIRA
Implement clustering of massive-domain attributes - Key: MAHOUT-173 URL: https://issues.apache.org/jira/browse/MAHOUT-173 Project: Mahout Issue Type: New Feature Components: Clusterin

[jira] Updated: (MAHOUT-173) Implement clustering of massive-domain attributes

2009-09-06 Thread JIRA
[ https://issues.apache.org/jira/browse/MAHOUT-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matias Bjørling updated MAHOUT-173: --- Remaining Estimate: 30h (was: 2016h) Original Estimate: 30h (was: 2016h) Changing esti

[jira] Issue Comment Edited: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-06 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751842#action_12751842 ] Deneche A. Hakim edited comment on MAHOUT-145 at 9/6/09 2:52 AM:

More code style; common packages?

2009-09-06 Thread Sean Owen
I'd like to ask we take a moment to agree on and then implement some small code hygiene... should be things we always do to adhere to project and industry norms: - Make sure a copyright statement appears in each file - Let's not do * imports - No serialVersionUID - No printStackTrace() or System.{

Re: What's the plan for Mahout?

2009-09-06 Thread Isabel Drost
On Saturday 05 September 2009 17:30:14 Grant Ingersoll wrote: > we are a machine learning project with a commercial > friendly license and a solid community aiming to build fast, production > ready libraries. +1 I think that summarizes pretty well what I see in Mahout as well. > Java, Hadoop an

Re: What's the plan for Mahout?

2009-09-06 Thread Sean Owen
Practically speaking, to guide short-term goals, we do need to start with a narrower, coherent remit and expand later. Starting as a Java-based, Hadoop-based library for developers, focusing on collaborative filtering, clustering, categorization, and a few other things sounds just right. It would

[jira] Commented: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-09-06 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751923#action_12751923 ] Ted Dunning commented on MAHOUT-157: Hmm... You can get a significant improvement on

Re: [jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-06 Thread Ted Dunning
That fix has been created. Can you just use trunk? What about using the Yahoo 0.20 distribution? ( http://developer.yahoo.com/hadoop/distribution/ ) On Sun, Sep 6, 2009 at 1:01 AM, Deneche A. Hakim (JIRA) wrote: > > bq.* TODO: test the code on a Hadoop 0.20.0 cluster (EC2) > > Looks like

Re: [jira] Issue Comment Edited: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-06 Thread Ted Dunning
Don't bother to create an AMI. Use one of the alestic.com AMIs and just write a boot script. Saves many hours of time. Creating an AMI is a pain for anybody. On Sun, Sep 6, 2009 at 2:53 AM, Deneche A. Hakim (JIRA) wrote: > Looks like I'll have to wait till Hadoop 0.20.1 to be able to test on

Re: What's the plan for Mahout?

2009-09-06 Thread Ted Dunning
I see this as a critical issue. On Sun, Sep 6, 2009 at 8:31 AM, Isabel Drost wrote: > > > but those systems always involve quite a bit of engineering to connect > the > > data fire-hoses into the right spigots. > > I wonder whether there is any way we can make that easier for users? We > certain

Re: [jira] Commented: (MAHOUT-145) PartialData mapreduce Random Forests

2009-09-06 Thread deneche abdelhakim
I'll try... it may take some time, but I'll surely learn a lot (will also need a refill on my painkillers) --- On Sun 6.9.09, Ted Dunning wrote: > From: Ted Dunning > Subject: Re: [jira] Commented: (MAHOUT-145) PartialData mapreduce Random > Forests > To: mahout-dev@lucene.apache.org >

Re: More code style; common packages?

2009-09-06 Thread Grant Ingersoll
On Sep 6, 2009, at 7:16 AM, Sean Owen wrote: I'd like to ask we take a moment to agree on and then implement some small code hygiene... should be things we always do to adhere to project and industry norms: - Make sure a copyright statement appears in each file RAT should help with this. -

Re: More code style; common packages?

2009-09-06 Thread Benson Margulies
I'm sorry that I haven't followed through on the automation here; I have so far failed to find a way to reconcile the existing preferred style to checkstyle as we discussed. On Sun, Sep 6, 2009 at 6:35 PM, Grant Ingersoll wrote: > > On Sep 6, 2009, at 7:16 AM, Sean Owen wrote: > > I'd like to a

[jira] Commented: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-09-06 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751990#action_12751990 ] Robin Anil commented on MAHOUT-157: --- What I am merging are not integers; it's a pair of a l

Re: More code style; common packages?

2009-09-06 Thread Robin Anil
+1 to all that. Adding to that: the FastMap, BitVector and other classes in taste.common are being used (or should be used) by other packages. We can start our own collections package, say ...mahout.collections? About Cache: the reason the two implementations differ is that at one place the datastore/

Google Collections

2009-09-06 Thread Robin Anil
http://code.google.com/p/google-collections/ http://google-collections.googlecode.com/files/collections-connection.pdf http://google-collections.googlecode.com/files/google-collections-svgtug-2008-08-06.pdf It's under the Apache 2 License, and the immutable Lists/Maps claim to take 2-3x less space

Re: Google Collections

2009-09-06 Thread Sean Owen
It's good stuff and I prefer the Google classes to Apache Commons, usually. For my bits of the code I never needed exactly what either were providing -- it's more like good syntactic sugar for common but verbose constructs, and some extensions like Multimaps, rather than replacements or optimized v

Re: Google Collections

2009-09-06 Thread Ted Dunning
Nice. On Sun, Sep 6, 2009 at 9:21 PM, Robin Anil wrote: > http://code.google.com/p/google-collections/ > http://google-collections.googlecode.com/files/collections-connection.pdf > > http://google-collections.googlecode.com/files/google-collections-svgtug-2008-08-06.pdf > > > Its Apache 2 Licens

Re: Google Collections

2009-09-06 Thread Lukáš Vlček
Hi, I think google collections is already used in solr contrib (see contrib/clustering/lib in solr trunk). Might be useful to get some experience from solr developer about this. Regards, Lukas On Mon, Sep 7, 2009 at 8:30 AM, Ted Dunning wrote: > Nice. > > On Sun, Sep 6, 2009 at 9:21 PM, Robin

Re: More code style; common packages?

2009-09-06 Thread Sean Owen
On Mon, Sep 7, 2009 at 2:15 AM, Robin Anil wrote: > The FastMap, BitVector and other classes in taste.common are being used (or > should be used) by other packages. We can start our own collections package > say ...mahout.collections ? They could be used, but could need some work. Those 'Map' and