Build failed in Jenkins: Mahout-Quality #2073

2013-06-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2073/changes Changes: [jmannix] Fixes MAHOUT-1147. Just had to set the MODEL_PATHS on the doc-topic inference job -- [...truncated 4965 lines...] Running

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680237#comment-13680237 ] Hudson commented on MAHOUT-1147: Integrated in Mahout-Quality #2073 (See

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680238#comment-13680238 ] Ted Dunning commented on MAHOUT-975: If you see a significant, low-effort improvement,

[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-975: --- Fix Version/s: (was: 0.8) Backlog Bug in Gradient Machine - Computation

[jira] [Updated] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1030: Fix Version/s: (was: 0.8) 1.0 I'm going to push this. I know that

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680392#comment-13680392 ] Grant Ingersoll commented on MAHOUT-1214: - Any update on this for applying

Re: 0.8 progress

2013-06-11 Thread Grant Ingersoll
I pushed M-1030 and M-1233. If we can get M-833 and M-1214 in by Thursday, I can roll an RC on Thursday. -Grant On Jun 11, 2013, at 8:56 AM, Grant Ingersoll gsing...@apache.org wrote: Down to 4 issues! I would say what they are, but JIRA is flaking out again. My instinct is that 1030 and

[jira] [Updated] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1030: Fix Version/s: 0.9 Regression: Clustered Points Should be

Re: 0.8 progress

2013-06-11 Thread Suneel Marthi
Grant, M-1030: This was caused by the refactoring of the clustering code after the 0.7 release. I feel we would be cutting it close by rushing this for 0.8; I suggest that we defer this to the backlog (or the next release). Suneel From: Grant Ingersoll

[jira] [Resolved] (MAHOUT-1233) Problem in processing datasets as a single chunk vs many chunks in HADOOP mode in mostly all the clustering algos

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1233. - Resolution: Incomplete Please reopen if you have a repeatable test case, as I am not

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-11 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680407#comment-13680407 ] Yiqun Hu commented on MAHOUT-1214: -- The example has been verified but not as a junit

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-11 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680405#comment-13680405 ] Yiqun Hu commented on MAHOUT-1214: -- According to the 3 pieces of feedback from Robin, we are

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Yexi Jiang (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680476#comment-13680476 ] Yexi Jiang commented on MAHOUT-975: --- There are multiple problems (not only bugs) with

Re: Welcome new committers Gokhan Capan and Stevo Slavic

2013-06-11 Thread Dmitriy Lyubimov
Congratulations! On Mon, Jun 10, 2013 at 10:22 PM, Dan Filimon dangeorge.fili...@gmail.com wrote: Congratulations to the both of you! :) It's great to have you on board! On Tue, Jun 11, 2013 at 3:58 AM, Stevo Slavić ssla...@gmail.com wrote: Thanks Grant, Suneel and rest of the team,

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680631#comment-13680631 ] Ted Dunning commented on MAHOUT-975: {quote} 1) The GradientMachine is a special case

[jira] [Created] (MAHOUT-1253) Add experiment tools for StreamingKMeans

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1253: --- Summary: Add experiment tools for StreamingKMeans Key: MAHOUT-1253 URL: https://issues.apache.org/jira/browse/MAHOUT-1253 Project: Mahout Issue Type:

[jira] [Created] (MAHOUT-1254) Final round of cleanup for StreamingKMeans

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1254: --- Summary: Final round of cleanup for StreamingKMeans Key: MAHOUT-1254 URL: https://issues.apache.org/jira/browse/MAHOUT-1254 Project: Mahout Issue Type:

[jira] [Created] (MAHOUT-1255) Change BallKMeans weighting to use log(weight)

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1255: --- Summary: Change BallKMeans weighting to use log(weight) Key: MAHOUT-1255 URL: https://issues.apache.org/jira/browse/MAHOUT-1255 Project: Mahout Issue Type:

[jira] [Created] (MAHOUT-1256) Improve the CSV handling code to get vectors

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1256: --- Summary: Improve the CSV handling code to get vectors Key: MAHOUT-1256 URL: https://issues.apache.org/jira/browse/MAHOUT-1256 Project: Mahout Issue Type:

[jira] [Commented] (MAHOUT-1255) Change BallKMeans weighting to use log(weight)

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680645#comment-13680645 ] Ted Dunning commented on MAHOUT-1255: - I know how the testing inspired this change,

Re: 0.8 progress

2013-06-11 Thread Dan Filimon
Sorry to rain on everyone's party, but I opened a few more issues I need to take care of before 0.8 final that I had forgotten about: M-1253 to M-1256. I have code for all of these (that I tested; incidentally, that's the code I used for the experiments in the talk :), just need to merge it in and I

Re: 0.8 progress

2013-06-11 Thread Robin Anil
About review: you can send it my way. Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. On Tue, Jun 11, 2013 at 3:36 PM, Dan Filimon dangeorge.fili...@gmail.com wrote: Sorry to rain on everyone's party, but I opened a few more issues I need to take care of before 0.8 final that I had

[jira] [Commented] (MAHOUT-1253) Add experiment tools for StreamingKMeans

2013-06-11 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680651#comment-13680651 ] Robin Anil commented on MAHOUT-1253: Please also add it to the

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Yexi Jiang (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680673#comment-13680673 ] Yexi Jiang commented on MAHOUT-975: --- The size of goodLabels in updateRanking is always 1

Jenkins build is back to normal : Mahout-Quality #2074

2013-06-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Mahout-Quality/2074/

Build failed in Jenkins: mahout-nightly » Mahout Integration #1259

2013-06-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/mahout-nightly/org.apache.mahout$mahout-integration/1259/ -- [INFO] [INFO]

Build failed in Jenkins: mahout-nightly #1259

2013-06-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/mahout-nightly/1259/changes Changes: [jmannix] Fixes MAHOUT-1147. Just had to set the MODEL_PATHS on the doc-topic inference job -- [...truncated 4023 lines...] Downloading:

PCA in mahout

2013-06-11 Thread DB Tsai
Hi folks, I'm trying to use Mahout's PCA implementation based on SSVD in our application. I understand that, in order to avoid densifying the sparse input, Mahout provides an option to pass the column means into the algorithm as a parameter. However, a lot of the time, the scale of each
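
The point about not densifying a sparse input can be illustrated with a small sketch. It relies only on the identity (A - 1 mu^T) Omega = A Omega - 1 (mu^T Omega), so the column means are applied as one shared correction instead of being subtracted from every entry. This is not Mahout's code: `project_centered`, the dict-based sparse rows, and the matrix `omega` are hypothetical names chosen for illustration, and the sketch covers mean-centering only, not column scaling.

```python
def project_centered(sparse_rows, mu, omega):
    """Compute (A - 1*mu^T) * omega without forming the dense centered matrix.

    sparse_rows: list of {column_index: value} dicts (hypothetical sparse format)
    mu:          list of column means, length n
    omega:       n x k dense matrix as a list of lists
    """
    k = len(omega[0])
    # mu^T * omega is computed once; it is the same correction for every row.
    shift = [sum(mu[j] * omega[j][c] for j in range(len(mu))) for c in range(k)]
    result = []
    for row in sparse_rows:
        y = [-s for s in shift]
        # Only the stored (non-zero) entries of the row are touched.
        for j, v in row.items():
            for c in range(k):
                y[c] += v * omega[j][c]
        result.append(y)
    return result


rows = [{0: 2.0}, {1: 4.0, 2: 6.0}]   # 2 x 3 sparse matrix
mu = [1.0, 2.0, 3.0]                  # its column means
omega = [[1, 0], [0, 1], [1, 1]]      # 3 x 2 projection matrix
projected = project_centered(rows, mu, omega)
```

The per-row cost depends on the number of non-zero entries rather than on the full column count, which is the reason for passing the means separately.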

In-Mapper combiner design pattern

2013-06-11 Thread DB Tsai
Hi, Recently we started to use the in-mapper combiner design pattern in our Hadoop-based algorithms at Alpine Data Labs; those algorithms include variable selection using info gain, decision tree, naive Bayes model and SVM, and we found that we can get a 20~40% performance speedup without doing
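
For readers unfamiliar with the in-mapper combiner pattern named in this post, here is a minimal sketch in plain Python rather than Hadoop code. The mapper keeps partial aggregates in memory and emits them once when the input split is finished, instead of emitting one pair per input token. The class name, the `emit` callback, and the word-count task are all assumptions made for illustration; they are not taken from the post or from any Hadoop API.

```python
from collections import defaultdict


class InMapperCombiningMapper:
    """Word-count style mapper that aggregates in memory before emitting."""

    def __init__(self, emit):
        self.emit = emit                 # stand-in for the framework's output collector
        self.partial = defaultdict(int)  # per-split partial counts

    def map(self, record):
        # Update local state instead of emitting (token, 1) for every token.
        for token in record.split():
            self.partial[token] += 1

    def close(self):
        # Called once at the end of the split: emit one pair per distinct key.
        for key, count in self.partial.items():
            self.emit(key, count)
        self.partial.clear()


def run_split(records):
    """Drive the mapper over one input split and collect what it emits."""
    out = []
    mapper = InMapperCombiningMapper(lambda k, v: out.append((k, v)))
    for record in records:
        mapper.map(record)
    mapper.close()
    return out


emitted = run_split(["a b a", "b a c"])
```

Here six input tokens produce three emitted pairs. The trade-off is memory: the in-memory map must fit in the mapper's heap, so a real job would typically flush it when it grows past some threshold.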

[jira] [Commented] (MAHOUT-1175) IllegalStateException and FileNotFoundException occurs when running mahout inbuilt mapreduce implementation of frequent pattern mining.

2013-06-11 Thread Paul R. Brown (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680937#comment-13680937 ] Paul R. Brown commented on MAHOUT-1175: --- FWIW, I'm experiencing the same issues