[gsoc] random forests

2009-03-15 Thread deneche abdelhakim
I added a page to the wiki that describes how to build a random forest and how to use it to classify new cases. http://cwiki.apache.org/confluence/display/MAHOUT/Random+Forests

Re: Dirichlet Process in 0.1?

2009-03-15 Thread Grant Ingersoll
GSON is in the Maven library: http://repo2.maven.org/maven2/com/google/code/gson/ -Grant On Mar 14, 2009, at 12:40 PM, Jeff Eastman wrote: Please take a look at the screen shot in MAHOUT-30 and tell me if you think this is good enough for 0.1. If we are still working through the release pr

Re: Concerns about Maven

2009-03-15 Thread Sean Owen
It's not that I don't understand Maven, though I am no expert. The maven build has never worked for me, even mvn compile, and it seems to be because the project somehow depends on itself in the repo? Reason: Cannot find parent: org.apache.mahout:mahout-parent for project: org.apache.mahout:mahout-

Re: Concerns about Maven

2009-03-15 Thread Grant Ingersoll
On Mar 15, 2009, at 10:33 AM, Sean Owen wrote: It's not that I don't understand Maven, though I am no expert. The maven build has never worked for me, even mvn compile, and it seems to be because the project somehow depends on itself in the repo? Reason: Cannot find parent: org.apache.mahout:m

Re: Concerns about Maven

2009-03-15 Thread Sean Owen
Ah OK. This finally succeeded after 15 minutes. I see a ton of messages like this: ... [WARNING] Entry: mahout-0.2-SNAPSHOT/examples/examples/src/test/java/org/apache/mahout/ga/watchmaker/cd/utils/RandomRuleResults.java longer than 100 characters. ... [INFO] mahout-0.2-SNAPSHOT/examples/mahout-exa

Re: Concerns about Maven

2009-03-15 Thread Grant Ingersoll
On Mar 15, 2009, at 2:57 PM, Sean Owen wrote: Ah OK. This finally succeeded after 15 minutes. I see a ton of messages like this: ... [WARNING] Entry: mahout-0.2-SNAPSHOT/examples/examples/src/test/java/org/apache/mahout/ga/watchmaker/cd/utils/RandomRuleResults.java longer than 100 character

Re: Concerns about Maven

2009-03-15 Thread Grant Ingersoll
OK, I think I have a tarball that is in pretty good shape now. I think I was overthinking the assembly a bit too much. -Grant On Mar 15, 2009, at 3:30 PM, Grant Ingersoll wrote: On Mar 15, 2009, at 2:57 PM, Sean Owen wrote: Ah OK. This finally succeeded after 15 minutes. I see a ton of

Re: Dirichlet Process in 0.1?

2009-03-15 Thread Jeff Eastman
I'm aware that Gson is in the Maven repo but don't know what might need to be done to our build to reference it. Does this mean you are ok with me committing MAHOUT-30? Jeff Grant Ingersoll wrote: GSON is in the Maven library: http://repo2.maven.org/maven2/com/google/code/gson/ -Grant On

[jira] Updated: (MAHOUT-30) dirichlet process implementation

2009-03-15 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-30: --- Attachment: MAHOUT-30f.patch Final patch file is ready to commit. Need to add entry to pom for gson ja

Re: Concerns about Maven

2009-03-15 Thread Sean Owen
On Sun, Mar 15, 2009 at 7:30 PM, Grant Ingersoll wrote: > The first time is often slower than the rest. ~5 minutes the second time which is a lot better indeed -- not something I want to run except when releasing but indeed that's mostly what it's for. > I point IntelliJ at my POM and say "Impo

[jira] Updated: (MAHOUT-30) dirichlet process implementation

2009-03-15 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-30: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 754797. Committe

Re: Concerns about Maven

2009-03-15 Thread Erik Hatcher
My $0.02 from the sidelines, Maven sucks as a *build* tool. I dislike it with a passion, or rather, I'm all about Ant myself. Regarding Mahout, I think it makes sense to let Sean set up and maintain his area with Ant and simply have Maven call to it for packaging. I'm not active with M

Re: [gsoc] random forests

2009-03-15 Thread Ted Dunning
Here is an interesting related paper that gives some good pointers for testing (and an alternative related approach) http://www-stat.wharton.upenn.edu/~edgeorge/Research_papers/BART%206--06.pdf or http://www-stat.

Re: [gsoc] random forests

2009-03-15 Thread Ted Dunning
Nice writeup. One thing that I was confused about for a long time is whether the choice of variables to use for splits is chosen once per tree or again at each split. I think that the latter interpretation is actually the correct one. You should check my thought. On Sun, Mar 15, 2009 at 1:53 AM
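
The two readings Ted contrasts above can be sketched side by side. This is only an illustrative sketch, not code from Mahout or from the wiki page: the function names, the parameter `m` (number of candidate variables), and the fixed seed are all assumptions made for the example. For reference, Breiman's original random forest formulation re-draws the candidate variables at every node, which matches the "again at each split" reading.

```python
import random


def candidates_per_split(num_features, m, num_splits, seed=0):
    """Hypothetical helper: draw a fresh random subset of m feature
    indices for every split (the 'again at each split' reading)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_features), m))
            for _ in range(num_splits)]


def candidates_per_tree(num_features, m, num_splits, seed=0):
    """Hypothetical helper: draw one subset of m feature indices and
    reuse it for every split in the tree (the 'once per tree' reading)."""
    rng = random.Random(seed)
    subset = sorted(rng.sample(range(num_features), m))
    return [list(subset) for _ in range(num_splits)]


if __name__ == "__main__":
    # Show the candidate sets offered to the first few splits of one tree.
    for name, fn in (("per split", candidates_per_split),
                     ("per tree", candidates_per_tree)):
        print(name, fn(num_features=10, m=3, num_splits=4))
```

Under the per-split scheme every variable can eventually be considered somewhere in a tree, whereas the per-tree scheme restricts each tree to the same m variables throughout.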

Re: Concerns about Maven

2009-03-15 Thread Ted Dunning
It (the simple build as per the wiki) has worked just fine for me. It also set IDEA up so I could work with the sources easily. On Sun, Mar 15, 2009 at 10:47 AM, Grant Ingersoll wrote: > To be honest the Maven build looks all over the place to me and >> several times bigger than the Ant scripts,

Re: Concerns about Maven

2009-03-15 Thread Ted Dunning
I am typically a cave man in this sort of thing as well. But maven does work well for me. I think that the difference is that part of my caveman-ism is that I don't care to do the 10% of having a fancy build that Grant refers to as causing the pain. I would rather simplify my build process and g