Re: Random Selection Algorithm Problem in org.apache.mahout.clustering.kmeans.RandomSeedGenerator

2011-12-15 Thread Ted Dunning
Lijie, you are correct. This code is in error. The mailing list lost your coloring, but your point is still there. I think that the code should be this instead. Ironically, the comment in the original code describes what the code does accurately. int itemsSeenSoFar = 0; for (Pair record : ne

Random Selection Algorithm Problem in org.apache.mahout.clustering.kmeans.RandomSeedGenerator

2011-12-15 Thread Lijie Xu
Hi, I'm now reading the source code of "org.apache.mahout.clustering.kmeans.RandomSeedGenerator". There may be a problem in function "buildRandom" which aims to select the random k centroid vectors from streaming records. I'm wondering whether this algorithm is correct and I think the right al
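Lijie's report and Ted's reply concern picking k seed vectors uniformly at random from a stream of records, which is the job of reservoir sampling. Below is a minimal sketch of the standard technique (Algorithm R), assuming generic in-memory items and a caller-supplied java.util.Random. The class and method names are hypothetical, and this is not the corrected RandomSeedGenerator code from the thread. The key invariant is that once the reservoir is full, the n-th item seen is kept with probability k/n and, if kept, overwrites a uniformly chosen slot.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public final class ReservoirSample {

  // Hypothetical helper (not Mahout's API): returns up to k items chosen
  // uniformly at random from the stream using Algorithm R.
  public static <T> List<T> sample(Iterator<T> stream, int k, Random random) {
    List<T> reservoir = new ArrayList<>(k);
    int itemsSeenSoFar = 0;
    while (stream.hasNext()) {
      T item = stream.next();
      itemsSeenSoFar++;
      if (reservoir.size() < k) {
        // Fill the reservoir with the first k items.
        reservoir.add(item);
      } else {
        // j is uniform in [0, itemsSeenSoFar), so j < k happens with
        // probability k / itemsSeenSoFar; then slot j is replaced.
        int j = random.nextInt(itemsSeenSoFar);
        if (j < k) {
          reservoir.set(j, item);
        }
      }
    }
    return reservoir;
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      data.add(i);
    }
    System.out.println(sample(data.iterator(), 5, new Random(42)));
  }
}
```

A common mistake with this pattern is drawing the replacement index from the reservoir size k instead of from the count of items seen so far, which keeps late items far too often.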

[jira] [Commented] (MAHOUT-825) Canopies grouping records outside t1

2011-12-15 Thread Paritosh Ranjan (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170751#comment-13170751 ] Paritosh Ranjan commented on MAHOUT-825: :). Now I think that stop distance > T1 i

[jira] [Commented] (MAHOUT-825) Canopies grouping records outside t1

2011-12-15 Thread Jeff Eastman (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170702#comment-13170702 ] Jeff Eastman commented on MAHOUT-825: - Hmn, recent major reformatting has invalidated
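MAHOUT-825 is about canopy membership relative to the T1 threshold. For reference, here is a minimal sketch of the classic canopy procedure with fixed centers and Euclidean distance (T1 > T2): take a point off the candidate list as a center, put every remaining candidate within T1 into its canopy, and remove candidates within T2 from the list. The class and method names are hypothetical and this is not Mahout's CanopyClusterer. In this fixed-center form every member is within T1 of its center by construction; in general, an implementation that recomputes a canopy's center as the mean of its members can leave some members farther than T1 from the final center.

```java
import java.util.ArrayList;
import java.util.List;

public final class CanopySketch {

  // Euclidean distance between two points of equal dimension.
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Hypothetical helper: returns, for each canopy, the indices of its
  // members; the first index of each canopy is its center.
  static List<List<Integer>> canopies(List<double[]> points, double t1, double t2) {
    if (t1 <= t2) {
      throw new IllegalArgumentException("t1 must be greater than t2");
    }
    List<Integer> remaining = new ArrayList<>();
    for (int i = 0; i < points.size(); i++) {
      remaining.add(i);
    }
    List<List<Integer>> result = new ArrayList<>();
    while (!remaining.isEmpty()) {
      int centerIdx = remaining.remove(0);
      double[] center = points.get(centerIdx);
      List<Integer> members = new ArrayList<>();
      members.add(centerIdx);
      // Loose threshold: candidates within t1 join this canopy.
      for (int i : remaining) {
        if (distance(center, points.get(i)) < t1) {
          members.add(i);
        }
      }
      // Tight threshold: candidates within t2 can no longer start a canopy.
      remaining.removeIf(i -> distance(center, points.get(i)) < t2);
      result.add(members);
    }
    return result;
  }

  public static void main(String[] args) {
    List<double[]> pts = new ArrayList<>();
    pts.add(new double[] {0.0});
    pts.add(new double[] {1.0});
    pts.add(new double[] {10.0});
    System.out.println(canopies(pts, 3.0, 1.5));
  }
}
```

Points between T2 and T1 of a center stay on the candidate list, which is why canopies may overlap.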

[jira] [Commented] (MAHOUT-904) SplitInput should support randomizing the input

2011-12-15 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170688#comment-13170688 ] jirapos...@reviews.apache.org commented on MAHOUT-904: --

Re: Review Request: Support for Randomizing Input in SplitInput Class

2011-12-15 Thread Raphael Cendrillon
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3092/ --- (Updated 2011-12-16 02:01:25.825802) Review request for mahout, Ted Dunning, lan

[jira] [Commented] (MAHOUT-904) SplitInput should support randomizing the input

2011-12-15 Thread Raphael Cendrillon (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170677#comment-13170677 ] Raphael Cendrillon commented on MAHOUT-904: --- At the moment this is written only

[jira] [Commented] (MAHOUT-904) SplitInput should support randomizing the input

2011-12-15 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170671#comment-13170671 ] jirapos...@reviews.apache.org commented on MAHOUT-904: --
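MAHOUT-904 asks for SplitInput to randomize its input before splitting it into training and test sets. A minimal in-memory sketch of a seeded random split follows; the class, method, and parameter names are hypothetical and this is not SplitInput's API. A real implementation over large files would need a streaming or external-shuffle approach rather than holding every record in a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public final class RandomSplit {

  // Hypothetical helper: shuffles a copy of the records with a seeded
  // Random, then returns [testSet, trainingSet], where the test set holds
  // round(records.size() * testFraction) records.
  static <T> List<List<T>> split(List<T> records, double testFraction, long seed) {
    if (testFraction < 0.0 || testFraction > 1.0) {
      throw new IllegalArgumentException("testFraction must be in [0, 1]");
    }
    List<T> shuffled = new ArrayList<>(records);
    Collections.shuffle(shuffled, new Random(seed));
    int testSize = (int) Math.round(shuffled.size() * testFraction);
    List<List<T>> parts = new ArrayList<>();
    parts.add(new ArrayList<>(shuffled.subList(0, testSize)));
    parts.add(new ArrayList<>(shuffled.subList(testSize, shuffled.size())));
    return parts;
  }

  public static void main(String[] args) {
    List<String> lines = List.of("a", "b", "c", "d", "e");
    System.out.println(split(lines, 0.4, 7L));
  }
}
```

Using a fixed seed keeps the split reproducible across runs, which matters when comparing models trained on the same partition.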

Build failed in Jenkins: Mahout-Quality #1255

2011-12-15 Thread Apache Jenkins Server
See -- [...truncated 166065 lines...] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.022 sec Running org.apache.mahout.classifier.sgd.ModelSerializerTest Tests run: 4, Failures: 0, Error

Backlog JIRAs NPE?

2011-12-15 Thread Jeff Eastman
Everything on that page is broken for me. The other releases seem to work. Is it repeatable? Here's an example: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+MAHOUT+AND+fixVersion+%3D+Backlog+AND+resolution+%3D+Unresolved+AND+priority+%3D+Major+ORDER+

[jira] [Updated] (MAHOUT-825) Canopies grouping records outside t1

2011-12-15 Thread Jeff Eastman (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-825: Issue Type: Improvement (was: Bug) Changing this from a defect to an improvement. I'd still like t

Release 0.6 And Beyond

2011-12-15 Thread Jeff Eastman
I've just added a new 0.7 release to the JIRA and would like to encourage new JIRA reporters to consider this release vs. 0.6 going forward. We are targeting a New Year's code freeze for 0.6 and I suggest that we concentrate on closing existing defects for the remainder of the year. In addition

[jira] [Updated] (MAHOUT-840) Decision Forests should support Regression problems

2011-12-15 Thread Ikumasa Mukai (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ikumasa Mukai updated MAHOUT-840: - Attachment: MAHOUT-840-additional.patch Hi Hakim-san. I made a patch for making the decisionTre

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170243#comment-13170243 ] Sean Owen commented on MAHOUT-906: -- OK, shall I wait for a complete patch?

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Anatoliy Kats (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170217#comment-13170217 ] Anatoliy Kats commented on MAHOUT-906: -- Also, I feel like my class names are unnecess

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Anatoliy Kats (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170213#comment-13170213 ] Anatoliy Kats commented on MAHOUT-906: -- Ah, I see what you mean. That would be neat

[jira] [Updated] (MAHOUT-927) FPG saves a mapping from feature to mining group, when this can be calculated on the fly

2011-12-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-927: -- Status: Patch Available (was: Open) > FPG saves a mapping from feature to mining group, when

[jira] [Updated] (MAHOUT-927) FPG saves a mapping from feature to mining group, when this can be calculated on the fly

2011-12-15 Thread tom pierce (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tom pierce updated MAHOUT-927: -- Attachment: MAHOUT-927.patch This patch assumes MAHOUT-920 and MAHOUT-921 have already been applied.

[jira] [Created] (MAHOUT-927) FPG saves a mapping from feature to mining group, when this can be calculated on the fly

2011-12-15 Thread tom pierce (Created) (JIRA)
FPG saves a mapping from feature to mining group, when this can be calculated on the fly - Key: MAHOUT-927 URL: https://issues.apache.org/jira/browse/MAHOUT-927

[jira] [Updated] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-906: - Attachment: MAHOUT-906.patch This is a sketch of what I had in mind. It is lacking the implementation but

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170154#comment-13170154 ] Sean Owen commented on MAHOUT-906: -- In both cases you have data to put into a model and s

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Anatoliy Kats (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170151#comment-13170151 ] Anatoliy Kats commented on MAHOUT-906: -- I don't quite see how. We are not exactly sp

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170141#comment-13170141 ] Sean Owen commented on MAHOUT-906: -- Yes that's a good start. Do you think it's possible a

[jira] [Updated] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Anatoliy Kats (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoliy Kats updated MAHOUT-906: - Attachment: MAHOUT-906.patch Passes GenericRecommenderIRStatsEvaluatorImplTest, didn't run the en

[jira] [Updated] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Anatoliy Kats (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoliy Kats updated MAHOUT-906: - Status: Patch Available (was: Open) > Allow collaborative filtering evaluators to use custom

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170119#comment-13170119 ] Sean Owen commented on MAHOUT-906: -- OK, sounds like you want to replace more logic, but t

Re: Comparison of execution times for recommenders

2011-12-15 Thread Sean Owen
My guess is that some defaults changed somewhere... but I can't think of anything relevant to these implementations. Alejandro, are you able to run a quick profile of both and point out where the new bottleneck is? Then we would have better ideas. Also I suggest you use the latest code

Re: Comparison of execution times for recommenders

2011-12-15 Thread Alejandro Bellogin Kouki
Thanks Sebastian. This is the code I used for Mahout-0.3 and Mahout-0.5: int N = 1000; long totalTime = 0L; int n = 0; Recommender rec = null; final DataModel train = new FileDataModel(new File(trainFile)); ItemSimilarity sim = new PearsonCorrelationSimi

Re: Comparison of execution times for recommenders

2011-12-15 Thread Sebastian Schelter
Hi Alejandro, you have to provide a detailed description of your benchmark. There is no such thing as "the efficiency" of a recommender. Mahout offers a wide variety of implementations and components that can be glued together to form a recommender. There are a lot of knobs to adjust that offer t

Comparison of execution times for recommenders

2011-12-15 Thread Alejandro Bellogin Kouki
Hi all, some months ago I performed some efficiency comparisons between the execution times of one implementation of mine and user- and item-based CF recommenders in Mahout. At that time, I was using Mahout-0.3 and I obtained some decent values, taking into account that I was measuring the av
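Because the thread turns on how the timings were taken, here is a minimal sketch of a per-user latency harness with a warm-up pass, so JIT compilation is not counted in the measured numbers. The class and method names are hypothetical and the task is generic. In a Taste setup the task could wrap a call such as `Recommender.recommend(long userID, int howMany)`, which declares the checked `TasteException` and would therefore need a try/catch inside the lambda.

```java
import java.util.function.LongConsumer;

public final class AverageLatency {

  // Hypothetical harness: runs the task warmupRounds times over all user
  // IDs without timing, then once more per user ID while timing, and
  // returns the mean wall-clock time per call in milliseconds.
  static double averageMillis(long[] userIds, LongConsumer task, int warmupRounds) {
    for (int r = 0; r < warmupRounds; r++) {
      for (long id : userIds) {
        task.accept(id);
      }
    }
    long totalNanos = 0L;
    for (long id : userIds) {
      long start = System.nanoTime();
      task.accept(id);
      totalNanos += System.nanoTime() - start;
    }
    return userIds.length == 0 ? 0.0 : totalNanos / 1e6 / userIds.length;
  }

  public static void main(String[] args) {
    long[] users = {1L, 2L, 3L};
    System.out.println(averageMillis(users, id -> Math.sqrt(id), 2));
  }
}
```

Reporting the data set, similarity, neighborhood size, number of users timed, and warm-up policy alongside such numbers is what makes results from different Mahout versions comparable.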

[jira] [Commented] (MAHOUT-906) Allow collaborative filtering evaluators to use custom logic in splitting data set

2011-12-15 Thread Anatoliy Kats (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170023#comment-13170023 ] Anatoliy Kats commented on MAHOUT-906: -- OK, I think I got it (again). We need to fact