Re: co-occurrence paper and code

2014-08-07 Thread Ted Dunning
Speaking from experience, I think that expressing the threshold as a confidence has some attraction, but can be a bit of a difficult interface. For instance, the equivalent of a 5 standard deviation threshold is either 0.999 or 0.001 (or did I get those right? Can you tell?). In either c

[jira] [Commented] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2014-08-07 Thread Harish Kayarohanam (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090078#comment-14090078 ] Harish Kayarohanam commented on MAHOUT-1601: I am doing analytics for github

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089966#comment-14089966 ] ASF GitHub Bot commented on MAHOUT-1603: Github user pferrel commented on the pul

Re: co-occurrence paper and code

2014-08-07 Thread Dmitriy Lyubimov
If exploration and bootstrap are concerns, in my case its saturation is achieved by a different methodology. I want this threshold to be (1) of course optional, and (2) expressed as a confidence level, %, just to understand the ballpark in each case. OK, I think I understand the code to convert c
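
For readers following the thread, here is a minimal sketch of turning a confidence level into a cutoff on the raw LLR score. It assumes the score for a 2x2 co-occurrence table is asymptotically chi-squared with one degree of freedom under independence; the function name `llr_threshold` is hypothetical and is not taken from the Mahout code base.

```python
from statistics import NormalDist


def llr_threshold(confidence):
    """Return the LLR cutoff for a given confidence level (e.g. 0.95).

    Assumption: the LLR score follows a chi-squared distribution with
    one degree of freedom when the two items are independent, so its
    critical value is the square of the two-sided normal quantile.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided normal quantile for the requested confidence.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * z


print(llr_threshold(0.95))
print(llr_threshold(0.99))
```

Passing no confidence at all (and skipping the filter) would keep the threshold optional, as requested above.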

Re: [jira] [Commented] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2014-08-07 Thread Ted Dunning
Please! On Thu, Aug 7, 2014 at 11:52 AM, Frank Rosner (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089600#comment-14089600 > ] > > Frank Rosner commented on MAHOUT-1601: > -

Re: co-occurrence paper and code

2014-08-07 Thread Ted Dunning
Yes. This is a good thresholding to do. Typically I have done this by simply providing a threshold on the LLR score itself. It is convenient to restate the score itself as the signed square root of the score since that lets you add information about whether the cooccurrence is more or less commo
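
As an illustration of the score being discussed, the following is a sketch of the log-likelihood ratio for a 2x2 co-occurrence table and its signed square root, using the common entropy-based formulation. The function names are hypothetical; `k11` is the count of both events together, `k12` and `k21` are the counts of one event without the other, and `k22` is the count of neither.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Rounding can push the result slightly below zero, so clamp it.
    return max(0.0, 2.0 * (row + col - mat))


def signed_root_llr(k11, k12, k21, k22):
    """Square root of the LLR score, negative when the co-occurrence
    is less common than independence would predict."""
    root = math.sqrt(llr(k11, k12, k21, k22))
    # Same comparison as k11/(k11+k12) < k21/(k21+k22), without division.
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        return -root
    return root
```

A threshold can then be applied either to `llr(...)` directly or to the signed root, which also separates "more common than expected" from "less common than expected".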

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089760#comment-14089760 ] ASF GitHub Bot commented on MAHOUT-1603: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089745#comment-14089745 ] ASF GitHub Bot commented on MAHOUT-1603: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089741#comment-14089741 ] ASF GitHub Bot commented on MAHOUT-1603: Github user pferrel commented on the pul

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089737#comment-14089737 ] ASF GitHub Bot commented on MAHOUT-1603: Github user pferrel commented on the pul

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089713#comment-14089713 ] ASF GitHub Bot commented on MAHOUT-1603: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089649#comment-14089649 ] ASF GitHub Bot commented on MAHOUT-1603: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089645#comment-14089645 ] ASF GitHub Bot commented on MAHOUT-1603: Github user dlyubimov commented on the p

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089638#comment-14089638 ] ASF GitHub Bot commented on MAHOUT-1603: Github user pferrel commented on the pul

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089609#comment-14089609 ] ASF GitHub Bot commented on MAHOUT-1603: Github user dlyubimov commented on the p

Re: co-occurrence paper and code

2014-08-07 Thread Dmitriy Lyubimov
On Thu, Aug 7, 2014 at 11:34 AM, Ted Dunning wrote: > > > Can you say a bit more about what you are trying to do? > Thank you. I would like to customize co-oc code not to just yank top N scored co-occurrences, but also make sure that all of them satisfy rejection of coincidence with a given con
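
A rough sketch of the behaviour described here: keep at most N co-occurrences per item, but only those whose score also clears a cutoff. This is not the Mahout co-occurrence code; the function name, the `(item, score)` pair layout, and the example scores are all made up for illustration.

```python
import heapq


def top_n_above_threshold(scored, n, threshold=None):
    """Return up to n (item, score) pairs with the highest scores.

    If threshold is given, pairs scoring below it are dropped first,
    so fewer than n pairs may come back. threshold=None gives plain top-N.
    """
    if threshold is None:
        candidates = scored
    else:
        candidates = [(item, score) for item, score in scored if score >= threshold]
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Hypothetical scores for one item's co-occurrence candidates.
scored = [("a", 12.0), ("b", 2.0), ("c", 7.5), ("d", 4.0)]
print(top_n_above_threshold(scored, 3, threshold=5.0))
```

Filtering before the top-N selection means a sparse item can end up with fewer than N neighbours, which is the point of the rejection test.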

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089599#comment-14089599 ] ASF GitHub Bot commented on MAHOUT-1603: Github user pferrel commented on the pul

[jira] [Commented] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2014-08-07 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089600#comment-14089600 ] Frank Rosner commented on MAHOUT-1601: -- Ok. I will try to add some unit tests for th

[jira] [Commented] (MAHOUT-1593) cluster-reuters.sh does not work complaining java.lang.IllegalStateException

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089585#comment-14089585 ] ASF GitHub Bot commented on MAHOUT-1593: Github user FRosner commented on the pul

Re: co-occurrence paper and code

2014-08-07 Thread Ted Dunning
On Wed, Aug 6, 2014 at 5:07 PM, Dmitriy Lyubimov wrote: > On Wed, Aug 6, 2014 at 5:04 PM, Ted Dunning wrote: > > > On Wed, Aug 6, 2014 at 6:01 PM, Dmitriy Lyubimov > > wrote: > > > > > > LLR is a classic test. > > > > > > > > > What i meant here it doesn't produce a p-value. or does it? > > > >
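
On the p-value question, a small sketch of how an LLR score could be mapped to an approximate p-value. It assumes the usual large-sample result that the score for a 2x2 table is chi-squared with one degree of freedom under independence; the function name `llr_p_value` is hypothetical, and the approximation is known to be loose for very small counts.

```python
import math


def llr_p_value(llr_score):
    """Approximate p-value for an LLR score from a 2x2 table.

    For a chi-squared variable with one degree of freedom, the upper
    tail probability P(X > x) equals erfc(sqrt(x / 2)).
    """
    if llr_score < 0:
        raise ValueError("LLR score must be non-negative")
    return math.erfc(math.sqrt(llr_score / 2.0))


print(llr_p_value(3.841459))
```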

[jira] [Commented] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2014-08-07 Thread Harish Kayarohanam (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089450#comment-14089450 ] Harish Kayarohanam commented on MAHOUT-1601: I was digging in to find which

Re: co-occurrence paper and code

2014-08-07 Thread Pat Ferrel
Thinking a bit more about the use of LLR only for similarity. Imagine the case where you are doing text analysis and have TF-IDF weights in the input matrix. LLR has one trait that makes me wonder about settling on it alone for general similarity and it’s more an observation since I have no data

[jira] [Commented] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2014-08-07 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089344#comment-14089344 ] Frank Rosner commented on MAHOUT-1601: -- I added some JavaDoc to DummySimilarity. It

[jira] [Commented] (MAHOUT-1601) Add javadoc for the classes - as there is no clue what the class is for .

2014-08-07 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089330#comment-14089330 ] Frank Rosner commented on MAHOUT-1601: -- I will add some javadoc, if you are OK with

[jira] [Updated] (MAHOUT-1605) Make VisualizerTest locale independent

2014-08-07 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Rosner updated MAHOUT-1605: - Status: Patch Available (was: Open) > Make VisualizerTest locale independent > --

[jira] [Commented] (MAHOUT-1605) Make VisualizerTest locale independent

2014-08-07 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089277#comment-14089277 ] ASF GitHub Bot commented on MAHOUT-1605: GitHub user FRosner opened a pull reques

[jira] [Updated] (MAHOUT-1605) Make VisualizerTest locale independent

2014-08-07 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Rosner updated MAHOUT-1605: - Description: h5. Problem When trying to build Mahout on a machine with a locale that uses a dif

[jira] [Created] (MAHOUT-1605) Make VisualizerTest locale independent

2014-08-07 Thread Frank Rosner (JIRA)
Frank Rosner created MAHOUT-1605: Summary: Make VisualizerTest locale independent Key: MAHOUT-1605 URL: https://issues.apache.org/jira/browse/MAHOUT-1605 Project: Mahout Issue Type: Test

[jira] [Commented] (MAHOUT-1600) Algorithms for computing correlation and covariance

2014-08-07 Thread Nagamallikarjuna (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089109#comment-14089109 ] Nagamallikarjuna commented on MAHOUT-1600: -- Hi Ted, I have a plan for adding tw