[jira] Updated: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-05 Thread Shannon Quinn (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shannon Quinn updated MAHOUT-363: Description: Proposal Title: EigenCuts spectral clustering implementation on map/reduce for Apac

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-05 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853409#action_12853409 ] Robin Anil commented on MAHOUT-363: Hi Shannon, did you take time to explore the Mahout

[jira] Updated: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-362: Resolution: Fixed; Fix Version/s: 0.4; Assignee: Sean Owen; Status: Resolved (was
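MAHOUT-362 concerns pairwise cosine similarities between items for item-based collaborative filtering. The sketch below shows only the underlying measure on a single machine. It is not the Map/Reduce implementation resolved for 0.4. The class and method names are hypothetical, and each item is assumed to be a sparse vector mapping user ID to preference value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-machine illustration; not Mahout's API.
public final class ItemCosineSimilarity {

  // Euclidean norm of a sparse vector.
  static double norm(Map<Long, Double> v) {
    double sumOfSquares = 0.0;
    for (double x : v.values()) {
      sumOfSquares += x * x;
    }
    return Math.sqrt(sumOfSquares);
  }

  // Cosine similarity of two sparse item vectors (user ID -> preference value).
  // Assumption: returns 0.0 when either vector has zero norm, where the measure is undefined.
  static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
    // Iterate over the smaller vector; only users who rated both items add to the dot product.
    Map<Long, Double> smaller = a.size() <= b.size() ? a : b;
    Map<Long, Double> larger = (smaller == a) ? b : a;
    double dot = 0.0;
    for (Map.Entry<Long, Double> e : smaller.entrySet()) {
      Double other = larger.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * other;
      }
    }
    double denominator = norm(a) * norm(b);
    return denominator == 0.0 ? 0.0 : dot / denominator;
  }

  // All pairwise similarities. The matrix is symmetric, so each pair is computed once and mirrored.
  static double[][] pairwise(List<Map<Long, Double>> items) {
    int n = items.size();
    double[][] sims = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = i; j < n; j++) {
        double s = cosine(items.get(i), items.get(j));
        sims[i][j] = s;
        sims[j][i] = s;
      }
    }
    return sims;
  }

  public static void main(String[] args) {
    Map<Long, Double> item1 = new HashMap<>();
    item1.put(1L, 5.0);
    item1.put(2L, 3.0);
    Map<Long, Double> item2 = new HashMap<>();
    item2.put(1L, 4.0);
    item2.put(3L, 2.0);
    double[][] sims = pairwise(List.of(item1, item2));
    System.out.println(sims[0][1]);
  }
}
```

The all-pairs loop is quadratic in the number of items. That cost is why a distributed job is worthwhile at scale.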

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-05 Thread Shannon Quinn (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853415#action_12853415 ] Shannon Quinn commented on MAHOUT-363: Hi Robin, thanks for the suggestions! I have s

Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

2010-04-05 Thread Richard Simon Just
Thanks for the super speedy response! Following on from what you said, I've been reading up on the different SVD-based variants used throughout the Netflix competition and working on my proposal. I'm focusing on what you suggested, aiming purely at the SVD-based recommender with the possibility of

Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

2010-04-05 Thread Jake Mannix
Hi Richard, a few notes about what would be required to get a nice distributed SVD recommender in Mahout: if you look at the current distributed recommenders (in the org.apache.mahout.cf.taste.hadoop package and its subpackages), you can see how it works: using HDFS-backed data, a batch of recommendations

Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

2010-04-05 Thread Sean Owen
Your audience is the project committers. I wouldn't spend much time rehashing the SVD theory. You should name your approach and, I suppose, write enough to make it clear you understand the algorithm well enough to implement it. In this case you can assume we all understand the SVD well enough already. I

Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

2010-04-05 Thread Necati Batur
IDEA: Create adapters for MySQL and NoSQL (HBase, Cassandra) to access data for all the algorithms to use. Summary: First of all, I am very excited to join an organization like GSoC and, most importantly, to work for a big open source project, Apache. I am looking for a good collaboration

Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

2010-04-05 Thread Richard Simon Just
Awesome, guys! Thanks for the quick responses! The details and clarifications are both helpful and incredibly reassuring. I've never written a proposal before, but no matter what happens I'm really looking forward to the end of my exams so I can get into Mahout properly. Many thanks, Richard Sean Ow

Re: Problems installing Mahout

2010-04-05 Thread Sean Owen
(BCC mahout-user, moving to mahout-dev) Alrighty, I think the subtlety here is that: String.format("%.2f", foo) ... will format numbers in a locale-specific way. I think we need: String.format(Locale.ENGLISH, "%.2f", foo) (Sorry to be English-centric, but we have to pick one locale in order to
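A minimal sketch of the difference described above. Locale.GERMANY stands in as an example of a locale whose decimal separator is a comma. The class and method names are hypothetical, and the message is cut off, so the exact failure it was responding to is not shown here.

```java
import java.util.Locale;

// Hypothetical helper illustrating locale-pinned formatting.
public final class LocaleSafeFormat {

  // Formats with an explicit locale so the output does not depend on the JVM default locale.
  static String formatTwoDecimals(double value) {
    return String.format(Locale.ENGLISH, "%.2f", value);
  }

  public static void main(String[] args) {
    double foo = 1234.5;
    // A German locale uses a comma as the decimal separator: prints 1234,50
    System.out.println(String.format(Locale.GERMANY, "%.2f", foo));
    // Pinned to English: prints 1234.50
    String fixed = formatTwoDecimals(foo);
    System.out.println(fixed);
    // Double.parseDouble is locale-independent and expects '.', so the pinned output parses back.
    System.out.println(Double.parseDouble(fixed));
  }
}
```

If the default locale were German, String.format("%.2f", foo) would produce the comma form. Code or tests that expect a '.' separator would then misbehave.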

[GSOC] Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use

2010-04-05 Thread Robin Anil
+changing subject line. Hi Necati, as I mentioned on the JIRA ticket, you need to take a look at the current data representation format (Vectors) and how structured data (ARFF format) is converted to vectors. You will find a basic converter in the utils folder under trunk. With regard to NOSQL, the

Re: Mahout GSoC 2010 proposal: Association Mining

2010-04-05 Thread Robin Anil
Hi Lukas, sorry for being late getting back to you on this. Association rule mining is a great addition to FPGrowth. I am not sure I understand the GUHA method well, but then again I understood Ted's LLR after some deep reading. Could you put up an interesting example to help us unders
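Assuming "Ted's LLR" refers to Ted Dunning's log-likelihood ratio test, the sketch below scores how strongly two items co-occur using a 2x2 contingency table. The class and method names are hypothetical. It uses the standard entropy formulation of the G² statistic and is not copied from Mahout's source. The counts are: k11 = transactions containing both A and B, k12 = A without B, k21 = B without A, k22 = neither.

```java
// Hypothetical illustration of Dunning's log-likelihood ratio (G^2) for a 2x2 table.
public final class LogLikelihoodExample {

  // x * ln(x), with the convention 0 * ln(0) = 0.
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy: N ln N - sum(k_i ln k_i), i.e. N times the Shannon entropy in nats.
  private static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0.0;
    for (long k : counts) {
      sumXLogX += xLogX(k);
      sum += k;
    }
    return xLogX(sum) - sumXLogX;
  }

  // G^2 = 2 * (H(rows) + H(columns) - H(matrix)), which is 2N times the mutual information.
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // Clamp tiny negative values caused by floating-point round-off.
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  public static void main(String[] args) {
    // Items that always appear together score high.
    System.out.println(logLikelihoodRatio(100, 0, 0, 100));
    // Independent items score (near) zero.
    System.out.println(logLikelihoodRatio(10, 10, 10, 10));
  }
}
```

A high score means the co-occurrence pattern is unlikely under independence. Such a score could serve as one candidate measure for ranking rules mined from FPGrowth output.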

Re: Mahout GSoC 2010 proposal: Association Mining

2010-04-05 Thread Robin Anil
PS: The current TopK FPGrowth is pretty tightly coupled, but it can easily be refactored out, and even a vanilla implementation of FPGrowth is not so difficult to re-create by reusing the existing methods. Robin On Tue, Apr 6, 2010 at 3:29 AM, Robin Anil wrote: > Hi Lukas, >Sorry fo