My ideas for GSoC 2010

2010-03-19 Thread cristi prodan
Dear Mahout community, My name is Cristi Prodan, I'm 23 years old and currently a 2nd year student pursuing an MSc degree in Computer Science. I started studying machine learning in the past year and during my research I found out about the MapReduce model. Then, I discovered Hadoop and Maho

Re: My ideas for GSoC 2010

2010-03-23 Thread cristi prodan
duplicate detection could be done. > > Sean's intuitions about project size are generally better than mine, but if > you are strict about not increasing scope, I think you could > succeed with this project. > > On Fri, Mar 19, 2010 at 6:34 AM, cristi prodan

[jira] Commented: (MAHOUT-344) Minhash based clustering

2010-03-24 Thread Cristi Prodan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849402#action_12849402 ] Cristi Prodan commented on MAHOUT-344: -- I've studied the min-hash algori

[jira] Commented: (MAHOUT-344) Minhash based clustering

2010-03-30 Thread Cristi Prodan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851416#action_12851416 ] Cristi Prodan commented on MAHOUT-344: -- I ran the code on the last.fm data se

[jira] Updated: (MAHOUT-344) Minhash based clustering

2010-04-03 Thread Cristi Prodan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristi Prodan updated MAHOUT-344: - Status: Patch Available (was: Open) Thank you guys for all the encouragement and advice. I

[jira] Updated: (MAHOUT-344) Minhash based clustering

2010-04-03 Thread Cristi Prodan (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristi Prodan updated MAHOUT-344: - Attachment: MAHOUT-344-v2.patch See comment above for this patch. > Minhash based cluster

[jira] Created: (MAHOUT-365) [GSoC] Proposal to implement SimHash clustering on MapReduce

2010-04-07 Thread Cristi Prodan (JIRA)
Components: Clustering Reporter: Cristi Prodan Application for Google Summer of Code 2010 - Mahout Project Student: Cristian Prodan 1. Synopsis I will add a map-reduce implementation of the SimHash clustering algorithm to the Mahout project. This algorithm provides an efficient