Re: Undergrad stud interested in GSoC

2008-03-30 Thread Isabel Drost
On Monday 31 March 2008, Jeff Eastman wrote: > I think we can refer to external datasets in our documentation and load > them on demand when we run against them. That way we do not have to store > them either. So I guess we should just come up with a list of datasets that are interesting to us.

Re: SoC proposal

2008-03-30 Thread Isabel Drost
On Sunday 30 March 2008, Rodrigo Tripodi wrote: > I've chosen to implement one clustering and one classification algorithm, a > priori the EM and SVM algorithms. There is a patch still in JIRA (Mahout-4) that contains a simple EM prototype. It is still non-parallel and could be polished. But maybe

Re: Fast Feather Track

2008-03-30 Thread Isabel Drost
I have added a PDF version for those who do not have OpenOffice: http://www.isabel-drost.de/mahout_fast_feather.pdf This evening, I will add the missing content of the "Problem setting" slide and refactor the "Who we are" slide with your pictures and the missing names. Isabel -- Most people want eit

[jira] Updated: (MAHOUT-22) Several matrix exceptions are checked exceptions, but should be unchecked

2008-03-30 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-22: -- Attachment: MAHOUT-22.patch Here are the trivial changes. > Several matrix exceptions are checked excep

[jira] Updated: (MAHOUT-21) Need reference implementation of Evolutionary Programming

2008-03-30 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-21: -- Attachment: MAHOUT-21.patch This is a draft of a continuous variable EP optimizer based on "Recorded Ste

[jira] Created: (MAHOUT-22) Several matrix exceptions are checked exceptions, but should be unchecked

2008-03-30 Thread Ted Dunning (JIRA)
Several matrix exceptions are checked exceptions, but should be unchecked - Key: MAHOUT-22 URL: https://issues.apache.org/jira/browse/MAHOUT-22 Project: Mahout Issue Typ

[jira] Created: (MAHOUT-21) Need reference implementation of Evolutionary Programming

2008-03-30 Thread Ted Dunning (JIRA)
Need reference implementation of Evolutionary Programming - Key: MAHOUT-21 URL: https://issues.apache.org/jira/browse/MAHOUT-21 Project: Mahout Issue Type: New Feature Repor

RE: Undergrad stud interested in GSoC

2008-03-30 Thread Jeff Eastman
I think we can refer to external datasets in our documentation and load them on demand when we run against them. That way we do not have to store them either. Jeff Jeff Eastman, Ph.D. Windward Solutions Inc. +1.415.298.0023 http://windwardsolutions.com http://jeffeastman.blogspot.com > -O

Re: SoC Naive Bayes Implementation

2008-03-30 Thread Ted Dunning
Both good points. On 3/30/08 3:38 PM, "Paul Elschot" <[EMAIL PROTECTED]> wrote: > Op Sunday 30 March 2008 20:51:40 schreef Ted Dunning: >> I am sure that the entire Mahout community will be happy to help. >> >> You may find, however, that naïve Bayes is trivially parallel (and >> not very diff

Re: GSoC proposal

2008-03-30 Thread Yun Jiang
Thanks. If I can't finish the whole project over the summer, which I'll definitely try to do, I'll finish it after GSoC. On Mon, Mar 31, 2008 at 4:20 AM, Isabel Drost <[EMAIL PROTECTED]> wrote: > On Sunday 30 March 2008, Ted Dunning wrote: > > This is an excellent proposal. It might be a little b

Re: SoC proposal

2008-03-30 Thread Rodrigo Tripodi
ok. thank you. 2008/3/30, Grant Ingersoll <[EMAIL PROTECTED]>: > > Sounds reasonable. Make sure you include information on timelines, > bio, etc. There are many emails in the archive discussing various > aspects of GSOC. > > Good luck, > > Grant > > > On Mar 30, 2008, at 5:20 PM, Rodrigo Tripodi

Re: SoC Naive Bayes Implementation

2008-03-30 Thread Paul Elschot
Op Sunday 30 March 2008 20:51:40 schreef Ted Dunning: > I am sure that the entire Mahout community will be happy to help. > > You may find, however, that naïve Bayes is trivially parallel (and > not very difficult even without parallelism). That means you may > want to have something additional to

Re: SoC proposal

2008-03-30 Thread Grant Ingersoll
Sounds reasonable. Make sure you include information on timelines, bio, etc. There are many emails in the archive discussing various aspects of GSOC. Good luck, Grant On Mar 30, 2008, at 5:20 PM, Rodrigo Tripodi wrote: Hello everybody, I know it's a little bit late, but I'm really excit

Re: Fast Feather Track

2008-03-30 Thread Ted Dunning
See here for a picture of me: http://www.veoh.com/users/ted On 3/30/08 1:29 PM, "Isabel Drost" <[EMAIL PROTECTED]> wrote: > > Hello, > > my proposal for presenting our project at the Fast Feather session at Apache > Con EU was accepted. > > I am currently about to prepare the slides for my t

SoC proposal

2008-03-30 Thread Rodrigo Tripodi
Hello everybody, I know it's a little bit late, but I'm really excited to submit a Google Summer of Code proposal for the Mahout project. I've read there is already a k-means implementation, so I've decided to implement another algorithm. I've chosen to implement one clustering and one classificati

Fast Feather Track

2008-03-30 Thread Isabel Drost
Hello, my proposal for presenting our project at the Fast Feather session at Apache Con EU was accepted. I am currently preparing the slides for my talk. I would like to include one slide on the project members who were crazy enough to start all this half a year ago. It would be nice if I

Re: Undergrad stud interested in GSoC

2008-03-30 Thread Isabel Drost
On Sunday 30 March 2008, Jeff Eastman wrote: > I'm working with my colleagues at CollabNet who have expressed interest in > providing us some EC2 time for this sort of testing. Sounds great to me. > They are working on EC2 deployment of Hadoop using their CUBiT machine > allocation environment a

Re: GSoC proposal

2008-03-30 Thread Isabel Drost
On Sunday 30 March 2008, Ted Dunning wrote: > This is an excellent proposal. It might be a little bit ambitious for a > summer, but it is nicely separated so that partial success will stand > alone. +1 -- They are called computers simply because computation is the only significant job that has

[jira] Commented: (MAHOUT-20) Migrate Canopy and KMeans Implementations to Vectors

2008-03-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583481#action_12583481 ] Isabel Drost commented on MAHOUT-20: I have already done some migration for the distance

[jira] Assigned: (MAHOUT-20) Migrate Canopy and KMeans Implementations to Vectors

2008-03-30 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost reassigned MAHOUT-20: -- Assignee: Isabel Drost (was: Jeff Eastman) > Migrate Canopy and KMeans Implementations to Vecto

[jira] Created: (MAHOUT-20) Migrate Canopy and KMeans Implementations to Vectors

2008-03-30 Thread Jeff Eastman (JIRA)
Migrate Canopy and KMeans Implementations to Vectors Key: MAHOUT-20 URL: https://issues.apache.org/jira/browse/MAHOUT-20 Project: Mahout Issue Type: Task Components: Clustering

Re: SoC Naive Bayes Implementation

2008-03-30 Thread Ted Dunning
I am sure that the entire Mahout community will be happy to help. You may find, however, that naïve Bayes is trivially parallel (and not very difficult even without parallelism). That means you may want to have something additional to work on in the back of your mind. On 3/30/08 6:50 AM, "Vit
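To illustrate the "trivially parallel" remark above: naive Bayes training reduces to counting labels and label/word pairs, and counts computed on separate shards of the data can simply be added together. The following is only a minimal sketch under that assumption, not the approach of any Mahout patch; the function names (`count_shard`, `merge_counts`, `train`) and the toy data are hypothetical and chosen for illustration.

```python
from collections import Counter
from functools import reduce


def count_shard(shard):
    """Map step: count labels and (label, word) pairs in one shard of documents."""
    label_counts = Counter()
    word_counts = Counter()
    for label, words in shard:
        label_counts[label] += 1
        for word in words:
            word_counts[(label, word)] += 1
    return label_counts, word_counts


def merge_counts(left, right):
    """Reduce step: counts from two shards are combined by plain addition."""
    return left[0] + right[0], left[1] + right[1]


def train(shards):
    """Count every shard independently, then sum the per-shard counts."""
    return reduce(merge_counts, map(count_shard, shards), (Counter(), Counter()))


# Hypothetical toy data: two shards of (label, words) pairs.
shards = [
    [("spam", ["buy", "now"]), ("ham", ["hello", "there"])],
    [("spam", ["buy", "cheap"])],
]
label_counts, word_counts = train(shards)
```

Because the per-shard counts only ever get added, splitting the data differently (or not at all) gives the same totals, which is what makes the training step straightforward to distribute.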

Re: GSoC proposal

2008-03-30 Thread Ted Dunning
This is an excellent proposal. It might be a little bit ambitious for a summer, but it is nicely separated so that partial success will stand alone. I would be happy to help mentor on this, as I expect would most of the Mahout community. On 3/30/08 4:41 AM, "Yun Jiang" <[EMAIL PROTECTED]> wro

[jira] Updated: (MAHOUT-15) Investigate Mean Shift Clustering

2008-03-30 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-15: --- Attachment: MAHOUT-15e.patch This patch has improved javadoc comments and removes some debugging code.

RE: Undergrad stud interested in GSoC

2008-03-30 Thread Jeff Eastman
I'm working with my colleagues at CollabNet who have expressed interest in providing us some EC2 time for this sort of testing. They are working on EC2 deployment of Hadoop using their CUBiT machine allocation environment and the quid pro quo would be that we help them exercise this tool. We have n

Re: SoC Naive Bayes Implementation

2008-03-30 Thread Grant Ingersoll
Hi Natallia, Have a look at https://issues.apache.org/jira/browse/MAHOUT-9. I am hoping to have something to put up after ApacheCon Europe, at which point testing and help would be appreciated, so I am not sure whether it will make sense as a GSOC project. Perhaps you would be interested i

SoC Naive Bayes Implementation

2008-03-30 Thread Vitalisova, Natallia
Hi, My name is Natallia Vitalisova and I've applied for Google SoC 2008 to implement the Naïve Bayes algorithm on Hadoop. Whether or not I am accepted for SoC, I want to spend my time investigating this topic, which I consider to be very interesting. But I will certainly need a mentor

Re: Thoughts on GSOC

2008-03-30 Thread Grant Ingersoll
Doesn't sound like you need a mentor :-) I'd just start by picking something you are interested in and is useful for you and work on it and submit a patch. Consider the community to be the mentor. Just feel free to ask questions and put up patches. Patches don't have to be perfect, they

GSoC proposal

2008-03-30 Thread Yun Jiang
Hi, Here is my proposal. Hope you can give me some advice. Thanks a lot! *Overview* Among those ten machine learning algorithms mentioned by Cheng-Tao Chu et al. [1], I'm really interested in Logistic Regression (LR). I would like to implement an LR program on Hadoop which can classify both binary and m

Re: GSOC

2008-03-30 Thread Isabel Drost
On Saturday 29 March 2008, Ted Dunning wrote: > SVM is not the only solution to these problems. For many search engine > applications, it isn't even likely to be the best. Regularized logistic > regression is a strong candidate as are random forests and boosted trees. There have been several int

Re: Undergrad stud interested in GSoC

2008-03-30 Thread Isabel Drost
On Saturday 29 March 2008, Samee Zahur wrote: > Being an undergrad student interested in the field of data-intensive machine > learning techniques and applications, I am interested in implementing these > algorithms as a way of getting an exposure into this field. Great. Nice to have you here. >

Re: GSOC

2008-03-30 Thread Isabel Drost
On Sunday 30 March 2008, you wrote: > This is my application, give me feedback, please. Sorry, I have a slow network connection right now and made the mistake of starting to answer mails before everything had arrived. I saw your extended application only after replying to your initial mail :( Isabel

Re: Thoughts on GSOC

2008-03-30 Thread Isabel Drost
On Saturday 29 March 2008, Grant Ingersoll wrote: > Finally, I would certainly like to encourage those who don't get > selected to stick around and contribute. +1 from me. In addition to what Grant already said, it is a great experience to see your code end up in an Apache project. Isabel --

Re: GSOC

2008-03-30 Thread Marko Novakovic
> I think it would be great if you could add a little > more information to your > application - if you have not already done so in the > GSoC web form. Some > ideas of useful information, that can help us judge > your application: > > - your background > - your reason for applying to do this

Re: LR in GSoC.Mahout

2008-03-30 Thread Isabel Drost
On Saturday 29 March 2008, Ted Dunning wrote: > The basic outline is to set up a Jira request for enhancement that > describes what you want to do, write or find a sequential version for > reference and then start on the actual coding. If you want to get funding for your project from Google, you m

Re: GSOC

2008-03-30 Thread Isabel Drost
On Saturday 29 March 2008, Marko Novakovic wrote: > I apply for SVM algorithm at Hadoop platform. > I hope that I will be accepted by Google and Apache, > I am serious in intention to do this job as great. I think it would be great if you could add a little more information to your application -