In-progress patch for migration to Hadoop 0.20

2009-07-07 Thread Sean Owen
Hi all, I started converting my stuff to Hadoop 0.20, since a lot was deprecated in the mapreduce portion. For what it's worth, attached is the fruit of a couple hours of research on how to translate into new APIs. Maybe it helps. I have not verified it works. Once I get it working, others can co

[jira] Commented: (MAHOUT-122) Random Forests Reference Implementation

2009-07-07 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728034#action_12728034 ] Deneche A. Hakim commented on MAHOUT-122: - I forgot to mention that I used Kdd50% i

[jira] Updated: (MAHOUT-122) Random Forests Reference Implementation

2009-07-07 Thread Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated MAHOUT-122: Attachment: refimp_Jul7.diff I did some tests on the "poker hand" dataset from UCI, it cont

Re: In-progress patch for migration to Hadoop 0.20

2009-07-07 Thread Grant Ingersoll
Are you referring to MAHOUT-143? On Jul 7, 2009, at 7:07 AM, Sean Owen wrote: Hi all, I started converting my stuff to Hadoop 0.20, since a lot was deprecated in the mapreduce portion. For what it's worth, attached is the fruit of a couple hours of research on how to translate into new APIs. M

Re: In-progress patch for migration to Hadoop 0.20

2009-07-07 Thread Sean Owen
Oh I guess I am, now that I looked at the issue list (MAHOUT-142 right?) So far I am just migrating "my" stuff (o.a.m.cf.taste) since I know it and wanted to share some intermediate result. On Tue, Jul 7, 2009 at 12:43 PM, Grant Ingersoll wrote: > Are you referring to MAHOUT-143?

Re: In-progress patch for migration to Hadoop 0.20

2009-07-07 Thread Robin Anil
I had tried to port my code to the .mapreduce.* library. There are a lot of helper classes which were developed for the mapred.* library which are still not there for the new API. I went as far as completely modifying the driver, mapper, and reducer, only to find out the MultipleFileOutputFormat (splitting Reduce outp

About TestClusters of Hadoop

2009-07-07 Thread Robin Anil
Hi, I have gone as far as I can in testing the Bayes code using 20Newsgroups. It would be great if we could test the code over the Wikipedia dump. But my laptop is no match for it :). If any test cluster is available for Mahout developers, I would certainly like to get my hands on it for some

Re: [jira] Updated: (MAHOUT-123) Implement Latent Dirichlet Allocation

2009-07-07 Thread Jeff Eastman
David Hall (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Hall updated MAHOUT-123: -- Attachment: MAHOUT-123.patch Tests included, wiki page is created I downlo

Re: In-progress patch for migration to Hadoop 0.20

2009-07-07 Thread Ted Dunning
There is a new name for something very similar (which I now forget). On Tue, Jul 7, 2009 at 5:43 AM, Robin Anil wrote: > I had tried to port my code to the .mapreduce.* library. There are a lot of > helper classes which were developed for the mapred.* library which are still not > there for the new API. I

Re : [GSOC] July 6 is mid-term evaluations

2009-07-07 Thread deneche abdelhakim
The students' mid-term survey is available online. I'm posting this because I almost forgot it =P --- On Wed 17.6.09, Grant Ingersoll wrote: > From: Grant Ingersoll > Subject: [GSOC] July 6 is mid-term evaluations > To: mahout-dev@lucene.apache.org > Date: Wednesday 17 June 2009, 15:54

Re: [jira] Updated: (MAHOUT-123) Implement Latent Dirichlet Allocation

2009-07-07 Thread Jeff Eastman
My bad. Once I ran mvn install the 1.2 version was downloaded into my repository. I should have noticed the pom was modified by the patch. David Hall wrote: On Tue, Jul 7, 2009 at 8:04 AM, Jeff Eastman wrote: David Hall (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT

Re: [jira] Updated: (MAHOUT-123) Implement Latent Dirichlet Allocation

2009-07-07 Thread David Hall
On Tue, Jul 7, 2009 at 8:04 AM, Jeff Eastman wrote: > David Hall (JIRA) wrote: >> >>     [ >> https://issues.apache.org/jira/browse/MAHOUT-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel >> ] >> >> David Hall updated MAHOUT-123: >> -- >> >>    Attac

Re: Re : [GSOC] July 6 is mid-term evaluations

2009-07-07 Thread Ted Dunning
I filled out one for Deneche. On Tue, Jul 7, 2009 at 9:32 AM, deneche abdelhakim wrote: > > The students' mid-term survey is available online. I'm posting this because > I almost forgot it =P > > --- On Wed 17.6.09, Grant Ingersoll > wrote: > > > From: Grant Ingersoll > > Subject: [GSOC

[jira] Commented: (MAHOUT-124) Online Classification using HBase

2009-07-07 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728311#action_12728311 ] Isabel Drost commented on MAHOUT-124: - Some initial comments on the patch: org/apache/

Re: [GSOC] July 6 is mid-term evaluations

2009-07-07 Thread Isabel Drost
On Tuesday 07 July 2009 20:34:09 Ted Dunning wrote: > I filled out one for Deneche. I submitted the one for Robin yesterday evening. Isabel -- QOTD: Products designed for every kind of idiot * Printed on the bottom of a Tesco tiramisu dessert: ``Do not turn upside down.''

Re: About TestClusters of Hadoop

2009-07-07 Thread Isabel Drost
On Tuesday 07 July 2009 15:50:05 Robin Anil wrote: > If any test cluster is available for Mahout developers, I would certainly > like to get my hands on it for some time. So would others on the list. Committers do get credits for Amazon EC2 - however I wonder whether there is anything we can do f

[jira] Commented: (MAHOUT-124) Online Classification using HBase

2009-07-07 Thread stack (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728325#action_12728325 ] stack commented on MAHOUT-124: -- Here's a few small comments Robin: In HybridCache, you import

Code quality enforcement?

2009-07-07 Thread Benson Margulies
In CXF and XMLSchema, we use both checkstyle and eclipse to enforce coding style and standards. There is technology to run this stuff from maven and to cause the maven-eclipse-plugin to configure it into eclipse. There's no telling if the exact set of style rules in use there will seem attractive

Re: Code quality enforcement?

2009-07-07 Thread Sean Owen
I am for it. I think most people are here too. The problem is, as ever, which standards to choose. Maybe we can figure out where there is agreement, enforce that, and not sweat smaller stuff (or add it over time). For example I might suggest we bother drawing up guidelines on: - Line length (max

Re: Code quality enforcement?

2009-07-07 Thread Ted Dunning
I could help knock down some of these. With IntelliJ it goes pretty quickly. On Tue, Jul 7, 2009 at 1:14 PM, Benson Margulies wrote: > The first time this is tried, it will call forth a big raft of picky > complaints. >

Re: Code quality enforcement?

2009-07-07 Thread Benson Margulies
I could make a patch that has the checkstyle and PMD stuff in an optional profile. You could apply it and see what you think of the results. On Tue, Jul 7, 2009 at 4:58 PM, Ted Dunning wrote: > I could help knock down some of these.  With IntelliJ it goes pretty > quickly. > > On Tue, Jul 7, 2009

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-07-07 Thread M. Arshad Khan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728466#action_12728466 ] M. Arshad Khan commented on MAHOUT-65: -- The patch seems to work for me. However, I get

Re: Code quality enforcement?

2009-07-07 Thread Shalin Shekhar Mangar
On Wed, Jul 8, 2009 at 2:04 AM, Sean Owen wrote: > I am for it. I think most people are here too. The problem is, as > ever, which standards to choose. Maybe we can figure out where there > is agreement, enforce that, and not sweat smaller stuff (or add it > over time). > > For example I might su