[jira] [Commented] (MAHOUT-843) Top Down Clustering

2011-11-29 Thread Paritosh Ranjan (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159884#comment-13159884 ] Paritosh Ranjan commented on MAHOUT-843: Another reminder to review this patch.

[jira] [Commented] (MAHOUT-897) New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

2011-11-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159878#comment-13159878 ] jirapos...@reviews.apache.org commented on MAHOUT-897: --

[jira] [Commented] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159877#comment-13159877 ] Dmitriy Lyubimov commented on MAHOUT-817: - rolling back solution for now. There ar

[jira] [Updated] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-817: Attachment: (was: SSVD-PCA options.pdf) > Add PCA options to SSVD code > --

[jira] [Updated] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-817: Attachment: (was: SSVD-PCA options.pdf) > Add PCA options to SSVD code > --

[jira] [Updated] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-817: Attachment: (was: SSVD-PCA options.pdf) > Add PCA options to SSVD code > --

Re: Review Request: New implementation for LDA: Collapsed Variational Bayes (0th derivative approximation), with map-side model caching

2011-11-29 Thread Jake Mannix
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2944/ --- (Updated 2011-11-30 07:35:21.576144) Review request for mahout and Ted Dunning.

[jira] [Updated] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-817: Attachment: (was: SSVD-PCA options.pdf) > Add PCA options to SSVD code > --

[jira] [Updated] (MAHOUT-840) Decision Forests should support Regression problems

2011-11-29 Thread Ikumasa Mukai (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ikumasa Mukai updated MAHOUT-840: - Attachment: regression.patch Hi. I made a new patch! > 1. (int) on a double means Math.floor(dou

Re: Soliciting SSVD documentation review

2011-11-29 Thread Dmitriy Lyubimov
ok thanks. I will file an issue for default p. also i updated the docs re: --reduceTasks. it would be nice if you could log time for map and reduce phases for all tasks (it is reported in MR web ui at namenode:50030 by default) in each case if you think there's a performance issue. It would at le

Re: Soliciting SSVD documentation review

2011-11-29 Thread Nathan Halko
Thanks for the heads up about numReduceTasks. I haven't changed the parameters much from the defaults yet, so this is probably my problem. By slave I mean machine: I'm running an m1.small as master and either m1.smalls or m1.larges as slaves (datanode, tasktracker, child). p depends mostly on the

[jira] [Updated] (MAHOUT-903) Slope one doesn't write, read diff counts resulting in no recs

2011-11-29 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-903: - Attachment: MAHOUT-903.patch > Slope one doesn't write, read diff counts resulting in no recs > -

[jira] [Updated] (MAHOUT-903) Slope one doesn't write, read diff counts resulting in no recs

2011-11-29 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-903: - Status: Patch Available (was: Open) > Slope one doesn't write, read diff counts resulting in no recs

[jira] [Created] (MAHOUT-903) Slope one doesn't write, read diff counts resulting in no recs

2011-11-29 Thread Sean Owen (Created) (JIRA)
Slope one doesn't write, read diff counts resulting in no recs -- Key: MAHOUT-903 URL: https://issues.apache.org/jira/browse/MAHOUT-903 Project: Mahout Issue Type: Bug Com

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2011-11-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159493#comment-13159493 ] jirapos...@reviews.apache.org commented on MAHOUT-880: --

[jira] [Issue Comment Edited] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

2011-11-29 Thread Dmitriy Lyubimov (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159469#comment-13159469 ] Dmitriy Lyubimov edited comment on MAHOUT-797 at 11/29/11 7:38 PM: -

Re: Soliciting SSVD documentation review

2011-11-29 Thread Dmitriy Lyubimov
PPS also make sure you specify numReduceTasks. The default is, I believe, 1, which will not scale at the multiplication steps at all. On Tue, Nov 29, 2011 at 10:15 AM, Dmitriy Lyubimov wrote: > PS actually i think it should scale horizontally a little better than > vertically but that's just a guess. > > On

[jira] [Commented] (MAHOUT-797) MapReduce SSVD: provide alternative B-pipeline per B=R' ^{-1} Y'A

2011-11-29 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159469#comment-13159469 ] Dmitriy Lyubimov commented on MAHOUT-797: - so what i think is that using R'^-1YA r

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2011-11-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159440#comment-13159440 ] jirapos...@reviews.apache.org commented on MAHOUT-880: --

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2011-11-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159439#comment-13159439 ] jirapos...@reviews.apache.org commented on MAHOUT-880: -- bq. On 201

[jira] [Updated] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-817: Attachment: SSVD-PCA options.pdf Actually, propagating median thru power iterations is not

Re: Soliciting SSVD documentation review

2011-11-29 Thread Dmitriy Lyubimov
PS actually i think it should scale horizontally a little better than vertically but that's just a guess. On Tue, Nov 29, 2011 at 10:10 AM, Dmitriy Lyubimov wrote: > On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko > wrote: >> >> The docs look great Dmitriy. Has anyone considered giving oversampl

Re: Soliciting SSVD documentation review

2011-11-29 Thread Dmitriy Lyubimov
On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko wrote: > The docs look great Dmitriy. Has anyone considered giving oversampling > parameter p a default value? Say p = 25. Slightly high but I imagine most > use cases are noisy and could benefit from the larger value. Yes that's a good idea that di

Re: Soliciting SSVD documentation review

2011-11-29 Thread Dmitriy Lyubimov
On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko wrote: > > The docs look great Dmitriy. Has anyone considered giving oversampling > ssvd over lanczos which is promising. Trying to scale out horizontally but > not seeing any difference between using one slave or many slaves. Any > ideas? (I won't

Re: Soliciting SSVD documentation review

2011-11-29 Thread Nathan Halko
The docs look great Dmitriy. Has anyone considered giving oversampling parameter p a default value? Say p = 25. Slightly high but I imagine most use cases are noisy and could benefit from the larger value. I have been testing ssvd and lanczos svd on Amazon EMR. Seeing about a 15x speedup in ssv

Re: [jira] [Commented] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Dmitriy Lyubimov
If you are looking for areas where new algorithms are missing, I think Ted recently published a list of things sought after. I myself would very much like to see SVM things done at scale. Another feature request from me is hierarchical clustering, if you like a new challenge. The challenge for

[jira] [Commented] (MAHOUT-901) KnnItemBasedRecommender is not working properly

2011-11-29 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159266#comment-13159266 ] Hudson commented on MAHOUT-901: --- Integrated in Mahout-Quality #1211 (See [https://builds.ap

[jira] [Resolved] (MAHOUT-868) Rename build*.sh examples to be more indicative of what they actually do, i.e. classify-20newsgroups.sh

2011-11-29 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-868. Resolution: Fixed > Rename build*.sh examples to be more indicative of what they actual

[jira] [Commented] (MAHOUT-344) Minhash based clustering

2011-11-29 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159249#comment-13159249 ] Grant Ingersoll commented on MAHOUT-344: Ankur, do you have a reference for this i

[jira] [Created] (MAHOUT-902) TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap

2011-11-29 Thread Sebastian Schelter (Created) (JIRA)
TanimotoCoefficientSimilarity should return Double.NaN for two items that have zero overlap --- Key: MAHOUT-902 URL: https://issues.apache.org/jira/browse/MAHOUT-902

Re: [jira] [Commented] (MAHOUT-817) Add PCA options to SSVD code

2011-11-29 Thread Grant Ingersoll
On Nov 28, 2011, at 2:25 PM, Raphael Cendrillon wrote: > Hi Ted, > > I think the difficulty I have is in identifying areas to contribute that the > community will find useful. > > If I understand correctly at this stage the major algorithms are in place and > the focus is on polishing the ex

[jira] [Commented] (MAHOUT-901) KnnItemBasedRecommender is not working properly

2011-11-29 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159200#comment-13159200 ] Sean Owen commented on MAHOUT-901: -- Thanks, though this is identical to the current test

[jira] [Issue Comment Edited] (MAHOUT-901) KnnItemBasedRecommender is not working properly

2011-11-29 Thread Georgi Stanev (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159196#comment-13159196 ] Georgi Stanev edited comment on MAHOUT-901 at 11/29/11 11:20 AM: ---

[jira] [Updated] (MAHOUT-901) KnnItemBasedRecommender is not working properly

2011-11-29 Thread Georgi Stanev (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Georgi Stanev updated MAHOUT-901: - Attachment: KnnItemBasedRecommende_newTest.java A JUnit test based on the original JUnit test. Un

[jira] [Resolved] (MAHOUT-901) KnnItemBasedRecommender is not working properly

2011-11-29 Thread Sean Owen (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-901. -- Resolution: Fixed Thanks, I've committed, with minor style and formatting changes. This does seem to w

[jira] [Commented] (MAHOUT-901) KnnItemBasedRecommender is not working properly

2011-11-29 Thread Georgi Stanev (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159161#comment-13159161 ] Georgi Stanev commented on MAHOUT-901: -- About the exception line, a user is removed i

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

2011-11-29 Thread jirapos...@reviews.apache.org (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159155#comment-13159155 ] jirapos...@reviews.apache.org commented on MAHOUT-880: --