Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Pat Ferrel
facepalm, missed that. Thanks. On Jun 10, 2014, at 4:29 PM, Ted Dunning (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027208#comment-14027208 ] Ted Dunning commented on MAHOUT-146

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027208#comment-14027208 ] Ted Dunning commented on MAHOUT-1464: - Matrix and Vector already have something that

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027202#comment-14027202 ] Pat Ferrel commented on MAHOUT-1464: OK, good to know. So the fix above for rows is n

[jira] [Comment Edited] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027159#comment-14027159 ] Pat Ferrel edited comment on MAHOUT-1464 at 6/10/14 11:09 PM: -

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027182#comment-14027182 ] Ted Dunning commented on MAHOUT-1464: - I don't think that numNonZero can be trusted h

Re: TreeBasedRecommenders(Deprecated?)

2014-06-10 Thread Pat Ferrel
There are simple ways to do this without maintaining a separate recommender. First you can simply cluster the input matrix of users by items. Then recommend items closest to the centroid of the cluster the user’s couple of items were in. But this seems dubious for several reasons. Better yet (m

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027159#comment-14027159 ] Pat Ferrel commented on MAHOUT-1464: I think the same thing is happening with number

Re: Time series anomaly detection MAHOUT-1423

2014-06-10 Thread Ted Dunning
Have you looked at the code? This might also help: http://info.mapr.com/resources_ebook_anewlook_anomalydetection.html?cid=blog http://berlinbuzzwords.de/session/deep-learning-high-performance-time-series-databases On Tue, Jun 10, 2014 at 2:28 AM, matteo poletti wrote: > Hi everybody, > >

Re: TreeBasedRecommenders(Deprecated?)

2014-06-10 Thread Ted Dunning
Sahil, You say: Also the use of item-based collaborative filtering recommender turns out to be time consuming. In my experience, item-based systems tend to be the fastest ones. Perhaps we mean different things. What I mean is similar to the approach where indicator behaviors are computed and

[jira] [Commented] (MAHOUT-1572) blockify() to detect (naively) the data sparsity in the loaded data

2014-06-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026865#comment-14026865 ] Hudson commented on MAHOUT-1572: SUCCESS: Integrated in Mahout-Quality #2649 (See [https

[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026831#comment-14026831 ] ASF GitHub Bot commented on MAHOUT-1529: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1571) Functional Views are not serialized as dense/sparse correctly

2014-06-10 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026832#comment-14026832 ] Hudson commented on MAHOUT-1571: SUCCESS: Integrated in Mahout-Quality #2648 (See [https

[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026818#comment-14026818 ] ASF GitHub Bot commented on MAHOUT-1529: Github user avati commented on the pull

[jira] [Updated] (MAHOUT-1572) blockify() to detect (naively) the data sparsity in the loaded data

2014-06-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1572: - Resolution: Fixed Status: Resolved (was: Patch Available) > blockify() to detec

[jira] [Commented] (MAHOUT-1572) blockify() to detect (naively) the data sparsity in the loaded data

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026809#comment-14026809 ] ASF GitHub Bot commented on MAHOUT-1572: Github user asfgit closed the pull reque

[jira] [Updated] (MAHOUT-1571) Functional Views are not serialized as dense/sparse correctly

2014-06-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1571: - Resolution: Fixed Status: Resolved (was: Patch Available) > Functional Views ar

[jira] [Commented] (MAHOUT-1571) Functional Views are not serialized as dense/sparse correctly

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026766#comment-14026766 ] ASF GitHub Bot commented on MAHOUT-1571: Github user asfgit closed the pull reque

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026742#comment-14026742 ] ASF GitHub Bot commented on MAHOUT-1464: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026738#comment-14026738 ] ASF GitHub Bot commented on MAHOUT-1464: Github user sscdotopen commented on the

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026719#comment-14026719 ] ASF GitHub Bot commented on MAHOUT-1464: Github user dlyubimov closed the pull re

Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Sebastian Schelter
Hi Pat, We truncate the indicators to the top-k and you don't want the self-comparison in there. So I don't see a reason to not exclude it as early as possible. --sebastian On 06/10/2014 05:28 PM, Pat Ferrel wrote: Still getting the wrong values with non-boolean input so I’ll continue to loo
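
Editorial sketch of the point made in the message above: if each item's indicators are truncated to the top-k, a self-comparison left in the A’A matrix would use up one of the k slots. The helper names `cooccurrence` and `top_k_indicators` are hypothetical, and raw cooccurrence counts stand in for whatever score the actual Mahout code uses; this is an illustration under those assumptions, not the project's implementation.

```python
def cooccurrence(a):
    """Compute A'A for a users-by-items matrix given as a list of rows (hypothetical helper)."""
    n_items = len(a[0])
    return [
        [sum(row[i] * row[j] for row in a) for j in range(n_items)]
        for i in range(n_items)
    ]


def top_k_indicators(item_item, k):
    """Keep the k highest-scoring other items per row, skipping the diagonal entry."""
    result = []
    for i, row in enumerate(item_item):
        # Exclude the self-comparison (j == i) before truncating to k entries.
        candidates = [(score, j) for j, score in enumerate(row) if j != i and score != 0]
        # Sort by descending score; break ties by the lower item index.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        result.append([(j, score) for score, j in candidates[:k]])
    return result


# Three users, three items, boolean interactions.
a = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
ata = cooccurrence(a)
indicators = top_k_indicators(ata, 1)
```

In this toy input the diagonal of A’A holds the largest value in row 0, so without the early exclusion the single top-1 slot for item 0 would go to item 0 itself.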

Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Pat Ferrel
Still getting the wrong values with non-boolean input so I’ll continue to look at it. Another question is: computeIndicators seems to exclude self-comparison during A’A and, of course, not for B’A. Since this returns the indicator matrix for the general case shouldn’t it include those values? Seem

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026549#comment-14026549 ] ASF GitHub Bot commented on MAHOUT-1464: Github user pferrel commented on the pul

Re: [mahout] MAHOUT-1464 Cooccurrence Analysis on Spark (#8)

2014-06-10 Thread Pat Ferrel
Go ahead and hit the button. Still have a bit more to do here. On Jun 9, 2014, at 6:47 PM, Dmitriy Lyubimov wrote: you can close -- but since i originated the PR, it is easier for me (I have access to the "close" button on it while everyone else would have to use "close apache/mahout#8" commit

Re: TreeBasedRecommenders(Deprecated?)

2014-06-10 Thread Sahil Sharma
Hi, One place where tree-based recommenders (that is, those using hierarchical clustering) might be useful is a cold start problem. That is, suppose a user has bought only a few items (say 2 or 3). It's kind of hard to capture that user's interests using a user-based collaborative filtering recommender.

Re: TreeBasedRecommenders(Deprecated?)

2014-06-10 Thread Sebastian Schelter
Hi Sahil, don't worry, you're not breaking any rules. We removed the tree-based recommenders because we have never heard of anyone using them over the years. --sebastian On 06/10/2014 09:01 AM, Sahil Sharma wrote: Hi, Firstly I apologize if I'm breaking certain rules by mailing this way, I

Time series anomaly detection MAHOUT-1423

2014-06-10 Thread matteo poletti
Hi everybody, We are three students at TU Berlin currently enrolled in a class given by Sebastian Schelter on scalable data processing. In the next weeks we'll work on a project related to Mahout. We would like to work on time series anomaly detection referring to this issue: https://issues.ap

Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-10 Thread Sebastian Schelter
Oh good catch! I had an extra binarize method before, so that the data was already binary. I merged that into the downsample code and must have overlooked that thing. You are right, numNonZeros is the way to go! On 06/10/2014 01:11 AM, Ted Dunning wrote: Sounds like a very plausible root caus
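
Editorial sketch of why the distinction in the message above matters for non-boolean input: on binary data the sum of a row equals its number of non-zero entries, but once the values are ratings or weights the two diverge. The function names are hypothetical, and the assumption that the earlier code effectively relied on summed values is ours; the thread excerpt does not show the actual downsample code.

```python
def interaction_count_by_sum(row):
    """Count interactions by summing values; only valid when the row is binary."""
    return sum(row)


def num_non_zeros(row):
    """Count interactions as the number of non-zero entries, whatever their magnitude."""
    return sum(1 for value in row if value != 0)


binary_row = [1, 0, 1, 1]
rating_row = [5, 0, 3, 1]

# Both measures agree on binary input.
binary_agrees = interaction_count_by_sum(binary_row) == num_non_zeros(binary_row)
# On non-boolean input the sum overstates the number of interactions.
rating_sum = interaction_count_by_sum(rating_row)
rating_nnz = num_non_zeros(rating_row)
```

Both rows describe three interactions, but only the non-zero count reports three for the rating row.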

TreeBasedRecommenders(Deprecated?)

2014-06-10 Thread Sahil Sharma
Hi, Firstly I apologize if I'm breaking certain rules by mailing this way, I'm new to this and would appreciate any help I could get. I was just playing around with the tree-based Recommender (which seems to be deprecated in the current version "for the lack of use"). Why was it deprecated?