[jira] Resolved: (MAHOUT-577) RowSimilarityJob hangs during CooccurrencesMapper

2011-01-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-577. -- Resolution: Not A Problem So the conclusion here is that it's more or less working as intended? Yes I

[jira] Updated: (MAHOUT-603) Standardize implementation of log-likelihood

2011-01-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-603: - Attachment: MAHOUT-603.patch > Standardize implementation of log-likelihood > ---

[jira] Updated: (MAHOUT-603) Standardize implementation of log-likelihood

2011-01-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-603: - Status: Patch Available (was: Open) This patch does the trick, and has some small code formatting / styl

[jira] Created: (MAHOUT-603) Standardize implementation of log-likelihood

2011-01-30 Thread Sean Owen (JIRA)
Standardize implementation of log-likelihood Key: MAHOUT-603 URL: https://issues.apache.org/jira/browse/MAHOUT-603 Project: Mahout Issue Type: Improvement Components: Collaborative Filte

[jira] Commented: (MAHOUT-602) "Partial Implementation" throws exceptions

2011-01-30 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988673#comment-12988673 ] Lance Norskog commented on MAHOUT-602: -- Scenario: I attempted to follow the tutorial

[jira] Updated: (MAHOUT-602) "Partial Implementation" throws exceptions

2011-01-30 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-602: - Attachment: partialImp_fullKDD_errors.log > "Partial Implementation" throws exceptions >

problems with quick start page

2011-01-30 Thread Ted Dunning
I just took a look at our quick start pages and they are woefully old. Anybody want to take a whack at improving them?

[jira] Created: (MAHOUT-602) "Partial Implementation" throws exceptions

2011-01-30 Thread Lance Norskog (JIRA)
"Partial Implementation" throws exceptions -- Key: MAHOUT-602 URL: https://issues.apache.org/jira/browse/MAHOUT-602 Project: Mahout Issue Type: Bug Environment: Macos X java version "1.6.0_22"

Build failed in Hudson: Mahout-Quality #598

2011-01-30 Thread Apache Hudson Server
See Changes: [tdunning] Relaxed test slightly. [tdunning] MAHOUT-600 - Improves floating point comparison in logNormalize test. Also improves javadoc for normalize. -- [...truncated 1543

Re: [jira] Updated: (MAHOUT-593) Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure compatibility with current Mahout dependencies.

2011-01-30 Thread Dmitriy Lyubimov
Ted,
> One way around all of this is for you to post a test case for in-memory SVD based on the test case from commons-math.  Then all of us can together tweak mahout-math eigen and svd decompositions to match the desired behavior.
>
> That gets rid of the entire dependency.
* there's alrea

Build failed in Hudson: Mahout-Quality #597

2011-01-30 Thread Apache Hudson Server
See -- [...truncated 1554 lines...] A examples/src/main/java/org/apache/mahout/classifier/sgd/SimpleCsvExamples.java A examples/src/main/java/org/apache/mahout/classifier/sgd/PrintR

Re: Question about log-likelihood formulation

2011-01-30 Thread Sean Owen
Great, that's the best solution -- standardize on what you have in LogLikelihood. The other implementations should just use it. I have enough of a whiff of the intuition here to understand the other formulation and why it's a little more straightforward. I'll file a JIRA with patch and submit. On

Re: Question about log-likelihood formulation

2011-01-30 Thread Ted Dunning
This is close. The formulation here is definitely based on a classical frequentist analysis. But the analysis is of two multinomial distributions (binomial in this case). Both distributions describe people liking or not liking item A. One distribution is in the case of people who like B and the
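
As a rough illustration of the two-binomial comparison described in this message, the Python sketch below computes a log-likelihood ratio (G²) from a 2x2 table of counts. The function names and the count layout (k11 = like both A and B, k12 = like A but not B, k21 = like B but not A, k22 = like neither) are assumptions made for this example; this is not Mahout's LogLikelihood code.

```python
import math


def x_log_x(x):
    """Return x * ln(x), using the convention that 0 * ln(0) is 0."""
    return 0.0 if x == 0 else x * math.log(x)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of non-negative counts.

    Hypothetical layout (an assumption of this sketch):
      k11: like A and B      k12: like A, not B
      k21: like B, not A     k22: like neither
    """
    n = k11 + k12 + k21 + k22
    cells = x_log_x(k11) + x_log_x(k12) + x_log_x(k21) + x_log_x(k22)
    rows = x_log_x(k11 + k12) + x_log_x(k21 + k22)
    cols = x_log_x(k11 + k21) + x_log_x(k12 + k22)
    # Clamp at zero so floating-point round-off cannot yield a tiny negative.
    return max(0.0, 2.0 * (cells + x_log_x(n) - rows - cols))


if __name__ == "__main__":
    # Liking A is independent of liking B: the statistic is zero.
    print(log_likelihood_ratio(1, 1, 1, 1))
    # Liking A coincides exactly with liking B: the statistic is large.
    print(log_likelihood_ratio(10, 0, 0, 10))
```

A score near zero means the two binomial distributions look alike, so liking B says little about liking A; larger scores indicate a stronger association.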

Re: Question about log-likelihood formulation

2011-01-30 Thread Ted Dunning
Taking the easy question first:
On Sun, Jan 30, 2011 at 12:12 PM, Sean Owen wrote:
> PS why would it be desirable to map log(0) to 0? the limit is negative infinity.
The limit of interest is x log x as x approaches 0. But this limit isn't quite direct because x approaches 0 and log x beco
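
The point about the limit can be checked numerically. The Python sketch below is an illustration only (the helper name `x_log_x` is an assumption, not a reference to any particular code base): log x grows without bound in magnitude as x shrinks, but the product x log x still tends to 0, which is why an entropy-style sum can treat the 0 log 0 term as 0.

```python
import math


def x_log_x(x):
    """Return x * ln(x) for x >= 0, defining the value at x == 0 as 0."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return 0.0 if x == 0 else x * math.log(x)


if __name__ == "__main__":
    # log x heads toward negative infinity, yet x * log x shrinks toward 0.
    for x in (1e-1, 1e-3, 1e-6, 1e-12):
        print(f"x={x:g}  log x={math.log(x):.3f}  x log x={x_log_x(x):.3e}")
    print(f"x=0  x log x={x_log_x(0)}")
```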

Re: Question about log-likelihood formulation

2011-01-30 Thread Sean Owen
This is an interesting topic. The code I quoted was just what's in the code base now. I thought it had made sense to me -- maybe it's working differently but in a valid way? I reverse-engineered the logic, I think, from the code. In looking at item-item similarity, we're comparing the likelihood

Re: [jira] Updated: (MAHOUT-593) Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure compatibility with current Mahout dependencies.

2011-01-30 Thread Ted Dunning
On Sat, Jan 29, 2011 at 11:01 PM, Dmitriy Lyubimov (JIRA) wrote:
> * apache commons dependencies is a mess. math module depends on 2.1 but core module depends on 1.2 so when run, there are all sorts of linkage errors because of classes being occasionally picked up from either 1.2 or 2.1.
I s

[jira] Commented: (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-01-30 Thread Timothy Potter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988611#action_12988611 ] Timothy Potter commented on MAHOUT-588: --- Hi Sean, Will definitely look into updating

[jira] Commented: (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-01-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988608#action_12988608 ] Sean Owen commented on MAHOUT-588: -- (By the way the S3 size limit is now 5TB) I think thi

[jira] Updated: (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-01-30 Thread Timothy Potter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated MAHOUT-588: -- Attachment: distcp_large_to_s3_failed.log seq2sparse_small_failed.log

Build failed in Hudson: Mahout-Quality #596

2011-01-30 Thread Apache Hudson Server
See Changes: [ssc] MAHOUT-601 Syntax error when running build-reuters.sh script -- [...truncated 1539 lines...] A examples/src/main/java/org/apache/mahout/classifier/bayes/SplitBaye

[jira] Updated: (MAHOUT-601) Syntax error when running build-reuters.sh script

2011-01-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-601: - Resolution: Fixed Status: Resolved (was: Patch Available) > Syntax error when running build-reut

[jira] Commented: (MAHOUT-601) Syntax error when running build-reuters.sh script

2011-01-30 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988599#action_12988599 ] Sebastian Schelter commented on MAHOUT-601: --- Patch is committed. Thank you. > Syn

[jira] Commented: (MAHOUT-371) Proposal to implement Distributed SVD++ Recommender using Hadoop

2011-01-30 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988589#action_12988589 ] Sean Owen commented on MAHOUT-371: -- I don't believe the code here was in a finished state,

[jira] Updated: (MAHOUT-601) Syntax error when running build-reuters.sh script

2011-01-30 Thread Frank Scholten (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-601: -- Status: Patch Available (was: Open) Added patch which changes shebang line to use bash. > Syn

[jira] Updated: (MAHOUT-601) Syntax error when running build-reuters.sh script

2011-01-30 Thread Frank Scholten (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-601: -- Attachment: MAHOUT-601.patch > Syntax error when running build-reuters.sh script >

[jira] Created: (MAHOUT-601) Syntax error when running build-reuters.sh script

2011-01-30 Thread Frank Scholten (JIRA)
Syntax error when running build-reuters.sh script - Key: MAHOUT-601 URL: https://issues.apache.org/jira/browse/MAHOUT-601 Project: Mahout Issue Type: Bug Components: Clustering Af

[jira] Commented: (MAHOUT-371) Proposal to implement Distributed SVD++ Recommender using Hadoop

2011-01-30 Thread steve (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988558#action_12988558 ] steve commented on MAHOUT-371: -- I'm trying to execute this on an amazon EC2 cluster, but i rec