[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-08 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763378#action_12763378 ] Isabel Drost commented on MAHOUT-138: - From the classes above, I worked through up to

[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763387#action_12763387 ] Sean Owen commented on MAHOUT-186: -- I think I agree, if you're suggesting this is probably

[jira] Updated: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-186: - Attachment: MAHOUT-186.patch Classifier PriorityQueue returns erroneous results

[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763388#action_12763388 ] Sean Owen commented on MAHOUT-138: -- You are most welcome to convert

[jira] Issue Comment Edited: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-08 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763378#action_12763378 ] Isabel Drost edited comment on MAHOUT-138 at 10/8/09 12:15 AM:

Re: [jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-08 Thread Grant Ingersoll
On Oct 8, 2009, at 4:13 AM, deneche abdelhakim wrote: There is also a main() method in: ./examples/src/main/java/org/apache/mahout/ga/watchmaker/cd/CDGA.java I should be able to post a patch Saturday concerning CDInfosTool and CDGA. Personally, I'd just commit. The goal here is to

[jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-08 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763455#action_12763455 ] Isabel Drost commented on MAHOUT-138: - Sean: sure, trying to get to it as soon as I

Re: [jira] Commented: (MAHOUT-138) Convert main() methods to use Commons CLI for argument processing

2009-10-08 Thread Isabel Drost
On Thu, 8 Oct 2009 07:52:21 -0300 Grant Ingersoll gsing...@apache.org wrote: Personally, I'd just commit. The goal here is to convert to CLI. It doesn't take a whole lot of review. Only thing I would suggest is we be consistent about what we name arguments, so have a look at the other

Example Datasets

2009-10-08 Thread Robin Anil
We need a central place for all sample datasets used for examples and unit tests? I am against putting it in the repo. Any suggestions? Robin

Re: Example Datasets

2009-10-08 Thread Robin Anil
Take a look at this repo: http://fimi.cs.helsinki.fi/data/ I am specifically talking about the retail and accidents datasets. I am using a modified (comma-separated) version of them for FPGrowth testing. The webdocs dataset looks good enough to be used for parallel FPGrowth testing.
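
A minimal Java sketch of the kind of modification described above. It assumes the FIMI files store one transaction per line as whitespace-separated integer item ids, as retail.dat does, and it rewrites each line with commas instead. The class and method names (FimiToCsv, toCsvLine, convert) are hypothetical. This is not the conversion actually used for the FPGrowth tests.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical converter: FIMI-style space-separated transactions -> comma-separated lines.
public class FimiToCsv {

    // Turns one whitespace-separated transaction ("0 1 2") into "0,1,2".
    // Blank or whitespace-only lines become the empty string.
    public static String toCsvLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return String.join(",", trimmed.split("\\s+"));
    }

    // Copies every non-blank transaction from in to out, one per line.
    // Both streams are closed when the method returns.
    public static void convert(Reader in, Writer out) throws IOException {
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String csv = toCsvLine(line);
                if (!csv.isEmpty()) {
                    writer.write(csv);
                    writer.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FimiToCsv <input.dat> <output.csv>");
            System.exit(1);
        }
        // Assumes plain ASCII input, which holds for files of integer item ids.
        convert(Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.US_ASCII),
                Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.US_ASCII));
    }
}
```

Run as `java FimiToCsv retail.dat retail.csv`, this would write each basket of the input file as one comma-separated line and skip empty lines. Treating any run of whitespace as a single separator keeps it tolerant of trailing spaces at the end of a transaction.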

Re: Example Datasets

2009-10-08 Thread Sean Owen
Several data sets I use have distribution clauses that forbid or complicate redistribution, so not sure I can do that. Of course we should check that on any other data set. On Thu, Oct 8, 2009 at 1:09 PM, Robin Anil robin.a...@gmail.com wrote: We need a central place for all sample datasets used

[jira] Commented: (MAHOUT-186) Classifier PriorityQueue returns erroneous results

2009-10-08 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763491#action_12763491 ] Robin Anil commented on MAHOUT-186: --- Reply to Ted: I ran the code, and this is the result

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-10-08 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157: -- Attachment: MAHOUT-157-Oct-8.TestedMapReducePipeline.patch Tested the Map/Reduce Pipeline. The result

Re: Example Datasets

2009-10-08 Thread Ted Dunning
For redistributable data, we should definitely lock down a version in our distribution or an associated one. This is true if only to make sure that we don't get surprised by somebody rearranging their web site. For non-redistributable but available data, I think having a download procedure that

LDA for multi label classification was: Mahout Book

2009-10-08 Thread Robin Anil
Posting to the dev list. Great paper, thanks! It looks like L-LDA could be used to create some interesting examples. The paper shows that L-LDA could be used to create a word-tag model for accurate tag prediction given a document of words. I will finish reading it and report how much work is needed to