[jira] [Commented] (MAHOUT-586) Redo RecommenderEvaluator for modularity

2011-03-23 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010571#comment-13010571 ] Lance Norskog commented on MAHOUT-586: -- About the TODOs. Yes, I agree. I would prefer

[jira] [Issue Comment Edited] (MAHOUT-586) Redo RecommenderEvaluator for modularity

2011-03-23 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010566#comment-13010566 ] Lance Norskog edited comment on MAHOUT-586 at 3/24/11 5:54 AM: -

[jira] [Commented] (MAHOUT-586) Redo RecommenderEvaluator for modularity

2011-03-23 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010566#comment-13010566 ] Lance Norskog commented on MAHOUT-586: -- (The MAHOUT-586 of March 15 has the SamplingD

Re: movielens 1M example

2011-03-23 Thread Lance Norskog
There are other datasets on Infochimps and Factual which are freely downloadable. http://www.infochimps.com/ http://www.factual.com/ On Wed, Mar 23, 2011 at 1:20 PM, Grant Ingersoll wrote: > I've asked them in the past as well if it could be scripted and > the answer was no. > > >

[jira] [Updated] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-03-23 Thread Timothy Potter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated MAHOUT-588: -- Attachment: MAHOUT-588.patch Patch file for trunk > Benchmark Mahout's clustering performance

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-03-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010530#comment-13010530 ] Hudson commented on MAHOUT-542: --- Integrated in Mahout-Quality #690 (See [https://hudson.apa

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-23 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010506#comment-13010506 ] Dmitriy Lyubimov commented on MAHOUT-633: - {quote}It would be great if we also had

[jira] [Assigned] (MAHOUT-570) Make the retrieval of candidate items for the most-similar-items computation customizable

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter reassigned MAHOUT-570: - Assignee: Sebastian Schelter > Make the retrieval of candidate items for the most

[jira] [Assigned] (MAHOUT-558) Extend ItembasedRecommender to offer different "exclusion modes" when computing most similar items to a collection of input items

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter reassigned MAHOUT-558: - Assignee: Sebastian Schelter > Extend ItembasedRecommender to offer different "ex

[jira] [Assigned] (MAHOUT-572) Non-distributed implementation of ALS-WR matrix factorization

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter reassigned MAHOUT-572: - Assignee: Sebastian Schelter > Non-distributed implementation of ALS-WR matrix fa

[jira] [Issue Comment Edited] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-23 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010497#comment-13010497 ] Dmitriy Lyubimov edited comment on MAHOUT-633 at 3/23/11 10:39 PM: -

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-23 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010497#comment-13010497 ] Dmitriy Lyubimov commented on MAHOUT-633: - if you want to be true to the Hadoop co

[jira] [Updated] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-542: -- Fix Version/s: 0.5 Status: Patch Available (was: In Progress) > MapReduce i

[jira] [Updated] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-542: -- Resolution: Fixed Status: Resolved (was: Patch Available) > MapReduce implemen

[jira] [Work started] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAHOUT-542 started by Sebastian Schelter. > MapReduce implementation of ALS-WR > -- > > Key: MAHOUT-542 >

Re: General question about FileSystem.makeQualified()

2011-03-23 Thread Sean Owen
OK. I am in particular looking at TasteHadoopUtils.readItemIDIndexMap(). Would this ever be fed a composite path like that? I think you're also suggesting that it never hurts to qualify the path. So, the utility class SequenceFileIterable ought to do this. Well I'd rather err on the side of not b

Re: GSOC

2011-03-23 Thread Harsh
Well, the way I saw it, if we simply do a normal word count, we won't get the full picture, as the pronouns present in the paragraph won't add up to increase the importance of a particular word. Suppose we are talking about a person X and after the first sentence X is referred to by "He" or "She", t
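The point about pronouns can be illustrated with a toy count. This is only a sketch under loud assumptions: the class `PronounAwareCount`, the fixed pronoun list, and the rule "a pronoun refers to the most recently mentioned known entity" are all hypothetical stand-ins and are far cruder than a real coreference resolver such as the one in the Stanford tools. It is not code from the thread or from Mahout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PronounAwareCount {
    // Assumption: only these two pronouns are handled.
    static final Set<String> PRONOUNS = Set.of("he", "she");

    // Counts lower-cased tokens. When resolvePronouns is true, each pronoun is
    // counted as the most recently seen entity from `entities` instead of as itself.
    static Map<String, Integer> count(String text, Set<String> entities, boolean resolvePronouns) {
        Map<String, Integer> counts = new HashMap<>();
        String lastEntity = null;
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String key = token;
            if (entities.contains(token)) {
                lastEntity = token;
            } else if (resolvePronouns && lastEntity != null && PRONOUNS.contains(token)) {
                key = lastEntity;
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; "alice" plays the role of person X.
        String text = "Alice wrote the patch. She tested it. She committed it.";
        Set<String> entities = Set.of("alice");
        System.out.println(count(text, entities, false).get("alice")); // prints 1
        System.out.println(count(text, entities, true).get("alice"));  // prints 3
    }
}
```

With plain counting the person is mentioned once; with the pronouns folded in, the same person accounts for three mentions, which is the effect described above.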

Re: movielens 1M example

2011-03-23 Thread Grant Ingersoll
I've asked them in the past as well if it could be scripted and the answer was no. On Mar 23, 2011, at 2:37 PM, Sean Owen wrote: > I might err on the side of not doing so. I know from talking to the > GroupLens guys a while ago, they (rightly) don't want the data > redistributed fre

[jira] [Commented] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010310#comment-13010310 ] Sebastian Schelter commented on MAHOUT-633: --- looks very useful to me! It would

Re: General question about FileSystem.makeQualified()

2011-03-23 Thread Sebastian Schelter
Those code pieces are from me; they were necessary to make combined paths like this work on S3 for me: Path combined = new Path(pathA + "," + pathB) It's been a quick (and somewhat ugly) workaround. If someone knows a better solution, I'd be happy to see it refactored. --sebastian On 23
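One way to picture the problem with comma-joined paths is to qualify each component separately and then rejoin them. The sketch below uses only `java.net.URI` from the JDK rather than Hadoop's `Path` and `FileSystem` classes, so it is an analogy and not the Mahout workaround itself. The class `QualifyCombinedPaths`, its two helpers, and the `s3n://bucket/` default filesystem are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class QualifyCombinedPaths {

    // Hypothetical helper: returns the path unchanged if it already carries a
    // scheme, otherwise resolves it against the default filesystem URI.
    static String qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri.toString();
        }
        return defaultFs.resolve(path).toString();
    }

    // Splits a comma-joined path list, qualifies each part, and rejoins the result.
    static String qualifyAll(String combined, URI defaultFs) {
        List<String> qualified = new ArrayList<>();
        for (String part : combined.split(",")) {
            qualified.add(qualify(part.trim(), defaultFs));
        }
        return String.join(",", qualified);
    }

    public static void main(String[] args) {
        // Assumption: an S3-style default filesystem; the bucket name is made up.
        URI defaultFs = URI.create("s3n://bucket/");
        System.out.println(qualifyAll("/data/a,/data/b", defaultFs));
        // prints s3n://bucket/data/a,s3n://bucket/data/b
    }
}
```

Treating the joined string as a single path would attach the scheme and authority only to the first component; qualifying per component avoids that, at the cost of the extra lines the thread is asking about.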

General question about FileSystem.makeQualified()

2011-03-23 Thread Sean Owen
I'm seeing a lot of code that goes out of its way to make a Path in Hadoop fully-qualified. It ends up taking a few lines of code. I suspect some of it is spurious. I'm trying to confirm my understanding of when you would need a fully-qualified path. This seems to be necessary in general when send

[jira] [Updated] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-23 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-633: - Attachment: MAHOUT-633.patch > Add SequenceFileIterable; put Iterable stuff in one place > --

Re: GSOC

2011-03-23 Thread Ted Dunning
On Wed, Mar 23, 2011 at 11:18 AM, Harsh wrote: > And, to say in strict terms, it is more a computation theory work. It isn't > totally Mahout-ish, as Ted said. The reason I wanted to do it with Mahout is > this could be generalized and taken and implemented to cloud. This would > improve the searc

[jira] [Created] (MAHOUT-633) Add SequenceFileIterable; put Iterable stuff in one place

2011-03-23 Thread Sean Owen (JIRA)
Add SequenceFileIterable; put Iterable stuff in one place - Key: MAHOUT-633 URL: https://issues.apache.org/jira/browse/MAHOUT-633 Project: Mahout Issue Type: Improvement Compo

Re: movielens 1M example

2011-03-23 Thread Sean Owen
I might err on the side of not doing so. I know from talking to the GroupLens guys a while ago, they (rightly) don't want the data redistributed freely as they have certain obligations to notify users of terms and such. A shell script may or may not be against the letter of the license, but seems a

movielens 1M example

2011-03-23 Thread Sebastian Schelter
Hi there, now that MAHOUT-542 is being committed, I'd like to add a shell script to our examples that downloads the MovieLens 1M dataset and factorizes it. Are there any legal issues I have to consider? --sebastian

[jira] [Commented] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-03-23 Thread Timothy Potter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010243#comment-13010243 ] Timothy Potter commented on MAHOUT-588: --- Szymon, I'll create a patch for the follow

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-03-23 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010212#comment-13010212 ] Ted Dunning commented on MAHOUT-542: Progress always sounds good to me. It is even be

Re: GSOC

2011-03-23 Thread Ted Dunning
Another important question is whether this is something that is Mahout-ish. Mahout is a project that supports scalable data mining. That currently includes a mature recommendation framework, less mature clustering and classification tools and a smattering of other tools. What you are proposing s

Re: GSOC

2011-03-23 Thread Ted Dunning
Let's take this back to the mailing list so all can see. If you are familiar with the Stanford parser, then this seems like a feasible project for you to accomplish. I would expect that very similar results could be achieved using simple word or phrase counts, possibly with the addition of a chun

[jira] [Commented] (MAHOUT-590) add TSV (Tab Separate Value) input file support to SequenceFilesFromDirectory

2011-03-23 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010182#comment-13010182 ] Hudson commented on MAHOUT-590: --- Integrated in Mahout-Quality #688 (See [https://hudson.apa

[jira] [Commented] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-03-23 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010173#comment-13010173 ] Grant Ingersoll commented on MAHOUT-588: See https://cwiki.apache.org/confluence/d

[jira] [Commented] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-03-23 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010172#comment-13010172 ] Grant Ingersoll commented on MAHOUT-588: Tim or Szymon, can you put the code to b

[jira] [Resolved] (MAHOUT-590) add TSV (Tab Separate Value) input file support to SequenceFilesFromDirectory

2011-03-23 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-590. - Resolution: Fixed Fix Version/s: 0.5 Patch committed. > add TSV (Tab Separate Value) inpu

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-03-23 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010033#comment-13010033 ] Sebastian Schelter commented on MAHOUT-542: --- I think so too, we should commit th