[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680977#comment-13680977 ] Ted Dunning commented on MAHOUT-975: The ability to pass in a collection of good label

[jira] [Commented] (MAHOUT-1175) IllegalStateException and FileNotFoundException occur when running mahout inbuilt mapreduce implementation of frequent pattern mining.

2013-06-11 Thread Paul R. Brown (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680937#comment-13680937 ] Paul R. Brown commented on MAHOUT-1175: --- FWIW, I'm experiencing the same issues (Fi

In-Mapper combiner design pattern

2013-06-11 Thread DB Tsai
Hi, We recently started using the in-mapper combiner design pattern in our Hadoop-based algorithms at Alpine Data Labs; those algorithms include variable selection using information gain, decision trees, naive Bayes models, and SVMs, and we found that we get a 20-40% performance speedup without doing to
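
For readers unfamiliar with the in-mapper combiner pattern mentioned in this message, here is a minimal sketch in plain Python. It is not the Alpine Data Labs code and does not use the Hadoop API: the class `InMapperCombiningMapper`, its `map`/`cleanup` methods, and the token-counting workload are all hypothetical stand-ins. The idea it illustrates is that a mapper keeps partial aggregates in memory and emits them once per input split, instead of emitting one pair per input item.

```python
from collections import defaultdict


class InMapperCombiningMapper:
    """Hypothetical mapper: aggregates in memory, emits only at cleanup."""

    def __init__(self):
        # Partial sums per key, held for the lifetime of one input split.
        self.partial = defaultdict(int)

    def map(self, record):
        # Nothing is emitted here; counts are folded into the in-memory table.
        for token in record.split():
            self.partial[token] += 1

    def cleanup(self):
        # One (key, partial_sum) pair per distinct key seen in the split.
        yield from sorted(self.partial.items())


def run_split(records):
    """Drive the hypothetical mapper over one split and collect its output."""
    mapper = InMapperCombiningMapper()
    for record in records:
        mapper.map(record)
    return list(mapper.cleanup())


def naive_map(records):
    """Baseline for comparison: emit (token, 1) for every token."""
    return [(token, 1) for record in records for token in record.split()]


records = ["a b a", "b a c"]
combined = run_split(records)
naive = naive_map(records)
print(len(naive), "pairs without combining,", len(combined), "with")
```

The trade-off in this sketch is memory: the table grows with the number of distinct keys in a split, so a real job would likely need to flush it when it gets too large.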

PCA in mahout

2013-06-11 Thread DB Tsai
Hi folks, I'm trying to use Mahout's PCA implementation based on SSVD in our application. I understand that, to avoid densifying the sparse input, Mahout provides an option to pass the column means into the algorithm as a parameter. However, a lot of the time, the scale of each axis
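
The point about not densifying the input rests on a simple identity: for a sparse matrix A with column means mu, the centered product (A - 1 mu^T) x equals A x - (mu . x) 1, so the centered matrix never has to be formed. The sketch below checks that identity in plain Python. It is not Mahout's SSVD code; the function names and the dict-per-row sparse layout are assumptions made for illustration only.

```python
def centered_matvec(rows, mean, x):
    """Compute (A - 1 mean^T) x using only the stored entries of A.

    rows: list of {column_index: value} dicts (hypothetical sparse layout).
    """
    # The mean contributes the same scalar shift to every output row.
    shift = sum(m * xi for m, xi in zip(mean, x))
    return [sum(v * x[j] for j, v in row.items()) - shift for row in rows]


def dense_centered_matvec(rows, mean, x):
    """Reference version: subtracts the mean from every cell, zeros included."""
    n = len(mean)
    return [
        sum((row.get(j, 0.0) - mean[j]) * x[j] for j in range(n))
        for row in rows
    ]


# Two sparse rows over three columns; the column means are [1, 2, 3].
rows = [{0: 2.0}, {1: 4.0, 2: 6.0}]
mean = [1.0, 2.0, 3.0]
x = [1.0, 0.5, 2.0]

sparse_result = centered_matvec(rows, mean, x)
dense_result = dense_centered_matvec(rows, mean, x)
print(sparse_result, dense_result)
```

The sparse version touches only the three stored entries, while the reference version visits all six cells.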

Build failed in Jenkins: mahout-nightly #1259

2013-06-11 Thread Apache Jenkins Server
See Changes: [jmannix] Fixes MAHOUT-1147. Just had to set the MODEL_PATHS on the doc-topic inference job -- [...truncated 4023 lines...] Downloading: http://repo.maven.apache.org/maven2/org/apa

Build failed in Jenkins: mahout-nightly » Mahout Integration #1259

2013-06-11 Thread Apache Jenkins Server
See -- [INFO] [INFO]

Jenkins build is back to normal : Mahout-Quality #2074

2013-06-11 Thread Apache Jenkins Server
See

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Yexi Jiang (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680673#comment-13680673 ] Yexi Jiang commented on MAHOUT-975: --- The size of goodLabels in updateRanking is always 1

[jira] [Commented] (MAHOUT-1253) Add experiment tools for StreamingKMeans

2013-06-11 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680651#comment-13680651 ] Robin Anil commented on MAHOUT-1253: Please also add it to the examples/bin/clust

Re: 0.8 progress

2013-06-11 Thread Robin Anil
About review: you can send it my way Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. On Tue, Jun 11, 2013 at 3:36 PM, Dan Filimon wrote: > Sorry to rain on everyone's party, but I opened a few more issues I need to > take care of before 0.8 final that I had forgotten about. > M-1253 to

Re: 0.8 progress

2013-06-11 Thread Dan Filimon
Sorry to rain on everyone's party, but I opened a few more issues I need to take care of before 0.8 final that I had forgotten about. M-1253 to M-1256. I have code for all of these (that I tested, incidentally, that's the code I used for the experiments in the talk :), just need to merge it in and I wa

[jira] [Commented] (MAHOUT-1255) Change BallKMeans weighting to use log(weight)

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680645#comment-13680645 ] Ted Dunning commented on MAHOUT-1255: - I know how the testing inspired this change, b

[jira] [Created] (MAHOUT-1256) Improve the CSV handling code to get vectors

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1256: --- Summary: Improve the CSV handling code to get vectors Key: MAHOUT-1256 URL: https://issues.apache.org/jira/browse/MAHOUT-1256 Project: Mahout Issue Type: Impro

[jira] [Created] (MAHOUT-1255) Change BallKMeans weighting to use log(weight)

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1255: --- Summary: Change BallKMeans weighting to use log(weight) Key: MAHOUT-1255 URL: https://issues.apache.org/jira/browse/MAHOUT-1255 Project: Mahout Issue Type: Imp

[jira] [Created] (MAHOUT-1254) Final round of cleanup for StreamingKMeans

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1254: --- Summary: Final round of cleanup for StreamingKMeans Key: MAHOUT-1254 URL: https://issues.apache.org/jira/browse/MAHOUT-1254 Project: Mahout Issue Type: Improve

[jira] [Created] (MAHOUT-1253) Add experiment tools for StreamingKMeans

2013-06-11 Thread Dan Filimon (JIRA)
Dan Filimon created MAHOUT-1253: --- Summary: Add experiment tools for StreamingKMeans Key: MAHOUT-1253 URL: https://issues.apache.org/jira/browse/MAHOUT-1253 Project: Mahout Issue Type: Improveme

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680631#comment-13680631 ] Ted Dunning commented on MAHOUT-975: {quote} 1) The GradientMachine is a special case

Re: Welcome new committers Gokhan Capan and Stevo Slavic

2013-06-11 Thread Dmitriy Lyubimov
congratulations! On Mon, Jun 10, 2013 at 10:22 PM, Dan Filimon wrote: > Congratulations to the both of you! :) > It's great to have you on board! > > > On Tue, Jun 11, 2013 at 3:58 AM, Stevo Slavić wrote: > > > Thanks Grant, Suneel and rest of the team, > > > > I'm a Java software developer and

[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Yexi Jiang (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680476#comment-13680476 ] Yexi Jiang commented on MAHOUT-975: --- There are multiple problems (not only bugs) with th

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-11 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680405#comment-13680405 ] Yiqun Hu commented on MAHOUT-1214: -- According to the three pieces of feedback from Robin, we are impr

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-11 Thread Yiqun Hu (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680407#comment-13680407 ] Yiqun Hu commented on MAHOUT-1214: -- The example has been verified but not as a junit tes

[jira] [Resolved] (MAHOUT-1233) Problem in processing datasets as a single chunk vs many chunks in HADOOP mode in mostly all the clustering algos

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1233. - Resolution: Incomplete Please reopen if you have a repeatable test case, as I am not sur

Re: 0.8 progress

2013-06-11 Thread Suneel Marthi
Grant, M-1030: This was caused by the refactoring of the clustering code after the 0.7 release. I feel we would be cutting it close by rushing this for 0.8; I suggest that we defer this to the backlog (or the next release). Suneel From: Grant Ingersoll To: d

Re: 0.8 progress

2013-06-11 Thread Grant Ingersoll
I pushed M-1030 and M-1233. If we can get M-833 and M-1214 in by Thursday, I can roll an RC on Thursday. -Grant On Jun 11, 2013, at 8:56 AM, Grant Ingersoll wrote: > Down to 4 issues! I would say what they are, but JIRA is flaking out again. > > My instinct is that 1030 and 1233 can be push

[jira] [Updated] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1030: Fix Version/s: 0.9 > Regression: Clustered Points Should be WeightedPropertyVectorWrit

[jira] [Commented] (MAHOUT-1214) Improve the accuracy of the Spectral KMeans Method

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680392#comment-13680392 ] Grant Ingersoll commented on MAHOUT-1214: - Any update on this for applying agains

[jira] [Updated] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-06-11 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-1030: Fix Version/s: (was: 0.8) 1.0 I'm going to push this. I know that

Re: 0.8 progress

2013-06-11 Thread Grant Ingersoll
Down to 4 issues! I would say what they are, but JIRA is flaking out again. My instinct is that 1030 and 1233 can be pushed. Suneel has been working hard to get M-833 in. Not sure on M-1214, Robin? -G On Jun 9, 2013, at 6:10 PM, Grant Ingersoll wrote: > > On Jun 9, 2013, at 6:02 PM, Grant

[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient

2013-06-11 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-975: --- Fix Version/s: (was: 0.8) Backlog > Bug in Gradient Machine - Computation