Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Jake Mannix
thanks. On Sun, Apr 4, 2010 at 10:40 PM, Sean Owen wrote: > Oh OK I'll revert the change then, didn't know you wanted that. Some > of the other statements could probably go but not worth digging > through it. > > On Mon, Apr 5, 2010 at 6:33 AM, Jake Mannix wrote: > > Umm, I actually depend pret

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Jake Mannix
The distributed SVD is in core, but the base implementations (both SVD variants - Lanczos and GHA) are a) totally not-Hadoop dependent, and b) not dependent on anything else. So I think keeping them in math is the right place for them - then anyone who wants them can have them with only a simple m

[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853312#action_12853312 ] Sean Owen commented on MAHOUT-362: -- Agree, I'm taking a first pass to unify this with the

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Ted Dunning
This sounds like good middle ground to me. Besides, real low-level math stuff seems like a leaf dependency to me and thus shouldn't depend on map-reduce. That moves operations like k-means and SVD out of math, but that isn't all that controversial given that is what Mahout is doing .. layering sc

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Sean Owen
Oh OK I'll revert the change then, didn't know you wanted that. Some of the other statements could probably go but not worth digging through it. On Mon, Apr 5, 2010 at 6:33 AM, Jake Mannix wrote: > Umm, I actually depend pretty heavily on the logging in the SVD solvers. >  They are very long-runn

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Robin Anil
SVD shouldn't really be in Math. I agree it's "Math" but in principle it's a core Mahout algorithm like clustering or recommendations. I know it's a very debatable thought but for me collections and Math are just tools to aid complex algorithms in Mahout core. Maybe we can move it under core and addin

[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853311#action_12853311 ] Jake Mannix commented on MAHOUT-362: It would be really nice if this code could be exte

Re: svn commit: r930796 - in /lucene/mahout/trunk/math: ./ src/main/java/org/apache/mahout/math/ src/main/java/org/apache/mahout/math/decomposer/hebbian/ src/main/java/org/apache/mahout/math/decompo

2010-04-04 Thread Jake Mannix
Umm, I actually depend pretty heavily on the logging in the SVD solvers. They are very long-running processes, and give off a ton of useful information about what the heck is going on. Reducing dependencies is great, but logging? I think the math stuff could really use logging. I haven't been a

[jira] Commented: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853310#action_12853310 ] Sean Owen commented on MAHOUT-362: -- Really nice job on the code, very clean. I'm going to

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-04 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853308#action_12853308 ] Jake Mannix commented on MAHOUT-363: ... and actually, there is no need for Hama, as di

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-04 Thread Shannon Quinn (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853306#action_12853306 ] Shannon Quinn commented on MAHOUT-363: -- Thank you for the feedback! Cutting out Hama

[jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-04 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853303#action_12853303 ] Ted Dunning commented on MAHOUT-363: Nice proposal. Well written and well conceived.

[jira] Commented: (MAHOUT-328) Implement a cool clustering algorithm on map/reduce

2010-04-04 Thread Shannon Quinn (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853302#action_12853302 ] Shannon Quinn commented on MAHOUT-328: -- Please see https://issues.apache.org/jira/brow

[jira] Created: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

2010-04-04 Thread Shannon Quinn (JIRA)
Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout) -- Key: MAHOUT-363 URL: https://issues.apache.org/jira/browse/MAHOUT-363 Project: Mahout Issue Type: Task

Re: Proposal: make collections releases independent of the rest of Mahout

2010-04-04 Thread Benson Margulies
Last question: What's the first version going to be? I propose '1.0'. 0.4 would get mighty confusing. I really don't see the harm in calling it 1.0. On Sat, Apr 3, 2010 at 6:00 PM, Grant Ingersoll wrote: > > On Apr 3, 2010, at 2:22 PM, Benson Margulies wrote: > >> On Sat, Apr 3, 2010 at 2:07 PM,

Re: A request for prospective GSOC students

2010-04-04 Thread Shannon Quinn
Not yet, I'll do that ASAP. Thanks for the heads-up! Shannon On 4/4/2010 3:39 PM, Ted Dunning wrote: Shannon, Have you posted your proposal as a mahout JIRA ticket as well? On Sun, Apr 4, 2010 at 9:11 AM, Shannon Quinn wrote: Hello, I submitted an application last night titled "EigenC

Re: A request for prospective GSOC students

2010-04-04 Thread Ted Dunning
Shannon, Have you posted your proposal as a mahout JIRA ticket as well? On Sun, Apr 4, 2010 at 9:11 AM, Shannon Quinn wrote: > Hello, > > I submitted an application last night titled "EigenCuts spectral clustering > implementation on map/reduce for Apache Mahout". It's not finished yet (the > e

[jira] Updated: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-362: -- Attachment: MAHOUT-362.patch > Computation of pairwise cosine similarities for Item-Bas

[jira] Created: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Sebastian Schelter (JIRA)
Computation of pairwise cosine similarities for Item-Based Collaborative Filtering -- Key: MAHOUT-362 URL: https://issues.apache.org/jira/browse/MAHOUT-362 Project: Mahou

[jira] Updated: (MAHOUT-362) Computation of pairwise cosine similarities for Item-Based Collaborative Filtering

2010-04-04 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-362: -- Status: Patch Available (was: Open) > Computation of pairwise cosine similarities for

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sisir Koppaka
I have put up the processed Netflix dataset here. This file does not contain dates, and is 1.5GB in size.

Re: A request for prospective GSOC students

2010-04-04 Thread Shannon Quinn
Hello, I submitted an application last night titled "EigenCuts spectral clustering implementation on map/reduce for Apache Mahout". It's not finished yet (the entire description is blank) but I wanted to go ahead and get the draft in so you and the other contributors are aware of the idea. I'

Re: Pairwise cosine similarity for item-based cf

2010-04-04 Thread Sean Owen
Sure, that sounds like something I can find a good home for. Create a new issue with the patch here: https://issues.apache.org/jira/secure/CreateIssue!default.jspa?pid=12310751 I'll do the rest. best, Sean On Sun, Apr 4, 2010 at 11:03 AM, Sebastian Schelter wrote: > Hi there, > > My name is Se

Pairwise cosine similarity for item-based cf

2010-04-04 Thread Sebastian Schelter
Hi there, My name is Sebastian, I'm a student and currently writing my diploma thesis about the comparison of several recommendation algorithms for a large German ecommerce site. The algorithms I evaluate include item-based collaborative filtering, which makes me a Taste and Mahout user. One major
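
For readers unfamiliar with the term in the subject line, the following is a minimal, non-distributed sketch of cosine similarity between two items. It is not the code from the MAHOUT-362 patch. It assumes each item is represented as a map from user ID to rating, with missing ratings treated as zero; the class and method names (`ItemCosine`, `cosine`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ItemCosine {

    /**
     * Cosine similarity between two items, each given as a sparse
     * vector (userID -> rating). Users absent from a map count as 0.
     * Returns 0.0 if either vector has zero length.
     */
    public static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0;
        // Only users who rated both items contribute to the dot product.
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<Long, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<Long, Double> itemA = new HashMap<>();
        itemA.put(1L, 3.0);
        itemA.put(2L, 4.0);
        Map<Long, Double> itemB = new HashMap<>();
        itemB.put(1L, 4.0);
        itemB.put(2L, 3.0);
        System.out.println(cosine(itemA, itemB));
    }
}
```

Computing this for every pair of items is the expensive part on a large catalogue; the sketch covers only the per-pair formula.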

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sisir Koppaka
On Sun, Apr 4, 2010 at 4:10 PM, Sean Owen wrote: > I think you want to write this to accept "generic" data, and not > necessarily assume the Netflix input format. I suggest you accept CSV > data, in the form "userID,itemID,value", since that is what all the > recommenders do. > > Sure, I'll write

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sean Owen
I think you want to write this to accept "generic" data, and not necessarily assume the Netflix input format. I suggest you accept CSV data, in the form "userID,itemID,value", since that is what all the recommenders do. You may need a quick utility program to convert Netflix data format to this. t
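
The "quick utility program" mentioned above might look roughly like the sketch below. It assumes the Netflix Prize training-file layout of a header line `movieID:` followed by `customerID,rating,date` lines, and emits `userID,itemID,value` lines with the date dropped. The class and method names (`NetflixToCsv`, `convert`) are hypothetical, and the code is plain Java rather than a Hadoop job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NetflixToCsv {

    /**
     * Converts the lines of one Netflix-style file into
     * "userID,itemID,value" lines. Assumed input: a header line
     * "movieID:" followed by "customerID,rating,date" lines.
     */
    public static List<String> convert(List<String> lines) {
        List<String> out = new ArrayList<>();
        String movieId = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.endsWith(":")) {
                // Header line: remember the movie ID for the ratings that follow.
                movieId = line.substring(0, line.length() - 1);
                continue;
            }
            if (movieId == null) {
                throw new IllegalArgumentException("Rating line before movie header: " + line);
            }
            String[] parts = line.split(",");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed rating line: " + line);
            }
            // parts[0] = customer ID, parts[1] = rating; the date (if any) is ignored.
            out.add(parts[0] + "," + movieId + "," + parts[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1:", "1488844,3,2005-09-06", "822109,5,2005-05-13");
        for (String csvLine : convert(sample)) {
            System.out.println(csvLine);
        }
    }
}
```

Keeping the conversion separate from the recommender means the recommender itself only ever needs to read the generic CSV form.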

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sisir Koppaka
that would *be* utilize - sorry! I'll start off by implementing the distributed Netflix read-in, if that's OK by you.

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sisir Koppaka
Thanks, this is what I wanted to know. So, now, there would be a separate example that reads-in the Netflix dataset in a distributed way, that would be utilize the RBM implementation. Would that be right? The datastore I was referring to in the proposal was based on mahout.classifier.bayes.datasto

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sean Owen
Reusing code is fine, in principle. The code you mention, however, will not help you much. It is non-distributed and has nothing to do with Hadoop. You might reuse a bit of code to parse the input files, that's about it. Which data store are you referring to... if I understand right, you are imple

Re: Reg. Netflix Prize Apache Mahout GSoC Application

2010-04-04 Thread Sisir Koppaka
Thanks Robin, Ted, Jake and Sean for your feedback. I've refined my proposal, added in a milestone timeline, with design details, and have submitted it at the GSoC site. The title of the proposal is *Restricted Boltzmann Machines on the Netflix Dataset*. Please do give me your feedback on the propos