[jira] [Commented] (MAHOUT-1181) Adding StreamingKMeans MapReduce classes

2013-03-29 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617272#comment-13617272 ] Sebastian Schelter commented on MAHOUT-1181: Dan, the code looks very

[jira] [Commented] (MAHOUT-1161) Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception.

2013-03-28 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616390#comment-13616390 ] Sebastian Schelter commented on MAHOUT-1161: I modified the code to al

[jira] [Updated] (MAHOUT-1161) Unable to run CJKAnalyzer for conversion of a sequence file to sparse vector due to instantiation exception.

2013-03-28 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1161: --- Attachment: MAHOUT-1161.patch Ajit, this patch in trunk should fix the issue, can

Re: svn commit: r1461422 - /mahout/trunk/CHANGELOG

2013-03-28 Thread Sebastian Schelter
Hm, we could also have a look at this, but I thought a simple text file might be the easiest solution to use. On 28.03.2013 02:26, Benson Margulies wrote: > Why don't we just use the maven changes plugin? > > > On Wed, Mar 27, 2013 at 2:11 AM, wrote: > >> Author: ssc >> Date: Wed Mar 27 06:1

Re: Mahout Suggestions - Refactoring Effort

2013-03-26 Thread Sebastian Schelter
Totally agree on that. The impact of making Mahout more usable is much higher than that of adding a new algorithm. On 27.03.2013 05:41, Ted Dunning wrote: > It is critically important. > > On Wed, Mar 27, 2013 at 2:14 AM, Marty Kube < > martyk...@beavercreekconsulting.com> wrote: > >> IMHO usabi

[jira] [Resolved] (MAHOUT-1176) Introduce an Changelog file to raise contributors attribution

2013-03-26 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1176. Resolution: Fixed Fix Version/s: 0.8 Assignee: Sebastian Schelter

[jira] [Created] (MAHOUT-1176) Introduce an Changelog file to raise contributors attribution

2013-03-26 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1176: -- Summary: Introduce an Changelog file to raise contributors attribution Key: MAHOUT-1176 URL: https://issues.apache.org/jira/browse/MAHOUT-1176 Project

Introduction of a changelog file to raise attribution

2013-03-26 Thread Sebastian Schelter
In response to the current discussion about raising attribution for contributors, I suggest we introduce a CHANGELOG file similar to the one used in Giraph [1]. For every commit, we add a single line with the id and name of the jira, the name of the committer and potentially the name of the contrib

Re: Mahout Suggestions - Refactoring Effort

2013-03-26 Thread Sebastian Schelter
Hi Gokhan, I like the idea, but I'm not sure whether it's completely feasible for all parts of Mahout. A lot of jobs need a little more than a matrix, for example an additional dictionary for text-based stuff. In the collaborative filtering code, we already have a common input format: All recommend

[jira] [Resolved] (MAHOUT-1173) Reactivate checkstyle

2013-03-26 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1173. Resolution: Fixed Reworked the sources and committed the changes, only 3

Re: Call to action – Mahout needs your help

2013-03-26 Thread Sebastian Schelter
Hi Mike, > Regarding attribution, I saw it mentioned elsewhere in this thread and I > noticed it myself so I thought I'd throw in my 2 cents. While it seems like > a small thing, I wonder whether instituting the Hadoopish "Contributed by > so-and-so" in commit messages to assign credit for patches

[jira] [Commented] (MAHOUT-1173) Reactivate checkstyle

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613535#comment-13613535 ] Sebastian Schelter commented on MAHOUT-1173: Agreed. I would commit

[jira] [Updated] (MAHOUT-1173) Reactivate checkstyle

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1173: --- Attachment: mahout-checkstyle.xml I recreated the checkstyle file. I started with

[jira] [Created] (MAHOUT-1173) Reactivate checkstyle

2013-03-25 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1173: -- Summary: Reactivate checkstyle Key: MAHOUT-1173 URL: https://issues.apache.org/jira/browse/MAHOUT-1173 Project: Mahout Issue Type: Improvement

Re: changes without JIRA's

2013-03-25 Thread Sebastian Schelter
I guess this refers to the cleanups I've done in the last days. In the future, I will create a Jira for each and attach a patch. On 25.03.2013 16:31, Ted Dunning wrote: > I would like it if all changes to the code be accompanied by a JIRA that > describes the problem being solved and that the comm

[jira] [Updated] (MAHOUT-1172) Replace org.apache.mahout.cf.taste.common.TopK with Lucene's PriorityQueue

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1172: --- Issue Type: Improvement (was: Bug) > Repl

[jira] [Updated] (MAHOUT-1172) Replace org.apache.mahout.cf.taste.common.TopK with Lucene's PriorityQueue

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1172: --- Fix Version/s: 0.8 > Replace org.apache.mahout.cf.taste.common.TopK w

[jira] [Resolved] (MAHOUT-1172) Replace org.apache.mahout.cf.taste.common.TopK with Lucene's PriorityQueue

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1172. Resolution: Fixed > Replace org.apache.mahout.cf.taste.common.TopK w

[jira] [Commented] (MAHOUT-1025) Update documentation for LDA before the release.

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612455#comment-13612455 ] Sebastian Schelter commented on MAHOUT-1025: The issue here is that h

[jira] [Updated] (MAHOUT-1172) Replace org.apache.mahout.cf.taste.common.TopK with Lucene's PriorityQueue

2013-03-25 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1172: --- Attachment: MAHOUT-1172.patch > Replace org.apache.mahout.cf.taste.common.T

[jira] [Created] (MAHOUT-1172) Replace org.apache.mahout.cf.taste.common.TopK with Lucene's PriorityQueue

2013-03-25 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1172: -- Summary: Replace org.apache.mahout.cf.taste.common.TopK with Lucene's PriorityQueue Key: MAHOUT-1172 URL: https://issues.apache.org/jira/browse/MAHOUT
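
For readers unfamiliar with the technique behind this issue, here is a minimal sketch of keeping the top k scores with a bounded heap. It is an illustration only: it uses java.util.PriorityQueue from the JDK rather than Lucene's PriorityQueue, and the class and method names (TopKSketch, topK) are hypothetical, not taken from the Mahout patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not the code from MAHOUT-1172.
public final class TopKSketch {
    private TopKSketch() {}

    /** Returns the k largest scores in descending order, using a min-heap bounded to k entries. */
    public static List<Double> topK(double[] scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // The heap's head is always the smallest of the scores retained so far.
        PriorityQueue<Double> heap = new PriorityQueue<>(k);
        for (double score : scores) {
            if (heap.size() < k) {
                heap.add(score);
            } else if (score > heap.peek()) {
                // The new score beats the weakest retained one, so replace it.
                heap.poll();
                heap.add(score);
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

The heap never holds more than k entries, so memory stays bounded no matter how many candidates are scanned.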

Re: Call to action – Mahout needs your help

2013-03-25 Thread Sebastian Schelter
Hi, throwing in my 2 cents here: I think you made a very good point in stating that it is not clear whether Mahout is a library or a standalone program to interact with via the command line. IMO, it's first and foremost a library (similar to Lucene), and this should also be reflected in

Re: Call to action – Mahout needs your help

2013-03-24 Thread Sebastian Schelter
Hi Grant, what would/could such a scale-back look like? Best, Sebastian On 24.03.2013 18:30, Grant Ingersoll wrote: > Personally, I think the bigger issue is that most of the committers (me > included) are not very active, so we either need to identify other committers > sooner rather than lat

Checkstyle

2013-03-24 Thread Sebastian Schelter
Why was checkstyle removed from our pom? Is there a particular reason for that? I would suggest reintroducing it and making the build fail on violations to increase code quality. Best, Sebastian

Re: increase in PMD warnings

2013-03-24 Thread Sebastian Schelter
Guess I'm responsible for those warnings, let me have a look. On 24.03.2013 12:07, Ted Dunning wrote: > Build #1920 [1] showed a sharply increased number of PMD warnings recently. > > > The report that shows the new warnings [2] indicates that the new warnings > seem to be primarily unused impor

[jira] [Resolved] (MAHOUT-1169) Multithreaded recommendation computation from a factorization

2013-03-21 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1169. Resolution: Fixed > Multithreaded recommendation computation fro

[jira] [Updated] (MAHOUT-1169) Multithreaded recommendation computation from a factorization

2013-03-21 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1169: --- Fix Version/s: 0.8 > Multithreaded recommendation computation fro

Re: https://issues.apache.org/jira/browse/MAHOUT-1168

2013-03-21 Thread Sebastian Schelter
Hi Benson, I concur with what you say, CTR is just fine for these kinds of changes. On 21.03.2013 08:39, Benson Margulies wrote: > So, I'm going to adopt the following increment on this little effort, and > see if anyone objects. > > My view is that CTR is plenty good for plumbing like this; it i

[jira] [Updated] (MAHOUT-1169) Multithreaded recommendation computation from a factorization

2013-03-20 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1169: --- Attachment: MAHOUT-1169.patch > Multithreaded recommendation computation fro

[jira] [Created] (MAHOUT-1169) Multithreaded recommendation computation from a factorization

2013-03-20 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1169: -- Summary: Multithreaded recommendation computation from a factorization Key: MAHOUT-1169 URL: https://issues.apache.org/jira/browse/MAHOUT-1169 Project

[jira] [Updated] (MAHOUT-1167) Parallel item similarity precomputation on a single machine

2013-03-20 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1167: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Paral

Re: Discussion Of ML environment/MR, Mahout

2013-03-19 Thread Sebastian Schelter
aka regularization rate? > On Mar 19, 2013 8:10 AM, "Sebastian Schelter" wrote: > >> Played a little more with the code, it works astonishingly well. I was >> totally off in my expectations. >> >> I was able to run an iteration of ALS (two map-onl

Re: Discussion Of ML environment/MR, Mahout

2013-03-19 Thread Sebastian Schelter
at 7:41 PM, Sebastian Schelter wrote: > >> Hadoop has to reschedule every iteration as separate job, reread the >> input data from disk and write the iterations result to HDFS. In fact an >> ALS iteration always includes twice of these things as it needs two M/R >> jobs

[jira] [Commented] (MAHOUT-1166) Multithreaded version of distributed ALS

2013-03-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605114#comment-13605114 ] Sebastian Schelter commented on MAHOUT-1166: It's not dense unfo

[jira] [Updated] (MAHOUT-1167) Parallel item similarity precomputation on a single machine

2013-03-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1167: --- Status: Patch Available (was: Open) > Parallel item similarity precomputat

[jira] [Updated] (MAHOUT-1167) Parallel item similarity precomputation on a single machine

2013-03-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1167: --- Attachment: MAHOUT-1167.patch Patch for the parallel precomputation. Also

[jira] [Created] (MAHOUT-1167) Parallel item similarity precomputation on a single machine

2013-03-18 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1167: -- Summary: Parallel item similarity precomputation on a single machine Key: MAHOUT-1167 URL: https://issues.apache.org/jira/browse/MAHOUT-1167 Project

[jira] [Resolved] (MAHOUT-1166) Multithreaded version of distributed ALS

2013-03-16 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1166. Resolution: Fixed > Multithreaded version of distributed

[jira] [Work started] (MAHOUT-1166) Multithreaded version of distributed ALS

2013-03-16 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on MAHOUT-1166 started by Sebastian Schelter. > Multithreaded version of distributed ALS > > >

[jira] [Updated] (MAHOUT-1166) Multithreaded version of distributed ALS

2013-03-16 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1166: --- Attachment: MAHOUT-1166.patch > Multithreaded version of distributed

[jira] [Created] (MAHOUT-1166) Multithreaded version of distributed ALS

2013-03-16 Thread Sebastian Schelter (JIRA)
Sebastian Schelter created MAHOUT-1166: -- Summary: Multithreaded version of distributed ALS Key: MAHOUT-1166 URL: https://issues.apache.org/jira/browse/MAHOUT-1166 Project: Mahout Issue

Re: Discussion Of ML environment/MR, Mahout

2013-03-16 Thread Sebastian Schelter
matrix of Netflix in ~40 seconds using 23 machines (Netflix consists of 23 64MB blocks). Indeed it follows that ALS is fast enough on Hadoop. On 14.03.2013 17:02, Sean Owen wrote: > On Wed, Mar 13, 2013 at 7:41 PM, Sebastian Schelter wrote: > >> Hadoop has to reschedule every ite

[jira] [Updated] (MAHOUT-1165) TreeVisualizer does not show info of CategoricalNode correctly

2013-03-16 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1165: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you for

Re: Adding a new command-line program

2013-03-14 Thread Sebastian Schelter
Adding to the .props file should be enough, afaik. You should debug the first lines of MahoutDriver. Best, Sebastian On 14.03.2013 21:39, Ted Dunning wrote: > I was unable to answer this off the cuff in direct email. > > Anybody else remember the answer? > > On Wed, Mar 13, 2013 at 12:44 PM, D

Re: Discussion Of ML environment/MR, Mahout

2013-03-13 Thread Sebastian Schelter
I ran the current ALS code on Netflix some months ago, but did not do a thorough benchmark. I can do this next week, when I have access to a cluster again, and give some numbers, so we can compare them to GraphLab (and to your implementation, if you want). I have done lots of experiments using gra

[jira] [Updated] (MAHOUT-1074) FPGrowthDriver only supports input from local file

2013-03-13 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1074: --- Resolution: Fixed Status: Resolved (was: Patch Available

Re: mahout collections updates

2013-03-12 Thread Sebastian Schelter
s[n] * x.getQuick(n); } return sum; } } On 12.03.2013 23:18, Jake Mannix wrote: > Ok now that's even weirder than I thought! You're still calling vector > methods, you're not doing direct array access or anything... > > On Tuesday, March 12, 2013, Sebastian Schelte

Re: Discussion Of ML environment/MR, Mahout

2013-03-12 Thread Sebastian Schelter
s that their programming and execution model is a much better fit for the problem. > > I'm still skeptical that it's unsuitable for many things, even if you can > surely imagine a better ideal framework for any given problem. > > > > On Mon, Mar 11, 2013 at

Re: mahout collections updates

2013-03-12 Thread Sebastian Schelter
ually unsurprisingly fast. > > > On Tue, Mar 12, 2013 at 9:56 PM, Jake Mannix wrote: > >> But then where does it slow down? It just wraps a double[] >> >> On Tuesday, March 12, 2013, Sebastian Schelter wrote: >> >>> I looked into DenseVector and it doesn

[jira] [Resolved] (MAHOUT-1064) Weird behavior of vector dumper

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1064. Resolution: Fixed > Weird behavior of vector dum

[jira] [Resolved] (MAHOUT-1082) driver seqdirectory fails with param -filter set

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1082. Resolution: Fixed > driver seqdirectory fails with param -filter

[jira] [Updated] (MAHOUT-1140) Uniform random sampling problem in RandomSeedGenerator.java

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1140: --- Resolution: Duplicate Status: Resolved (was: Patch Available) this has

[jira] [Resolved] (MAHOUT-1123) Support Lucene 3.6 analyzers for vectorization

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1123. Resolution: Won't Fix Mahout's Lucene version has been updated to 4

Re: mahout collections updates

2013-03-12 Thread Sebastian Schelter
I looked into DenseVector and it doesn't use any primitive collections, so ignore my last mail :) On 12.03.2013 22:16, Sebastian Schelter wrote: > As a sidenote: I was kinda shocked recently, that switching from > DenseVector's dot() method to a direct dot product computation gave

Re: mahout collections updates

2013-03-12 Thread Sebastian Schelter
As a side note: I was kinda shocked recently that switching from DenseVector's dot() method to a direct dot product computation gave a 3x increase in performance in org.apache.mahout.cf.taste.hadoop.als.RecommenderJob. It seems like we really have a performance problem for some use cases. On 12.03
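
To illustrate what a "direct dot product computation" might look like, here is a minimal sketch over plain double[] arrays. It is an assumption about the general shape of such a loop, not the code used in RecommenderJob; the class name DirectDot and its method are hypothetical, and no Mahout API is involved.

```java
// Hypothetical illustration; not the code from Mahout's RecommenderJob.
public final class DirectDot {
    private DirectDot() {}

    /** Computes the dot product of two equally sized arrays with a plain indexed loop. */
    public static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Cardinality mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int n = 0; n < a.length; n++) {
            // Direct array access: no virtual calls or iterator objects per element.
            sum += a[n] * b[n];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] userFeatures = {1.0, 2.0, 3.0};
        double[] itemFeatures = {4.0, 5.0, 6.0};
        System.out.println(dot(userFeatures, itemFeatures));
    }
}
```

Whether a loop like this actually beats a library call depends on the JVM and the data, so any speedup should be measured rather than assumed.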

[jira] [Resolved] (MAHOUT-1141) Driver for cvb0_local does not warn about missing maxIterations command line parameter

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1141. Resolution: Fixed Problem was a failing cast, switched to parsing the args from

[jira] [Resolved] (MAHOUT-934) Deploy sgd classifier trained model in an application

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-934. --- Resolution: Invalid Please ask such questions on the mailing list

[jira] [Commented] (MAHOUT-998) Fix up the cluster-reuters.sh script

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600375#comment-13600375 ] Sebastian Schelter commented on MAHOUT-998: --- any details available on this

[jira] [Commented] (MAHOUT-668) Adding knn support to Mahout classifiers

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600362#comment-13600362 ] Sebastian Schelter commented on MAHOUT-668: --- Moving this to the bac

[jira] [Updated] (MAHOUT-668) Adding knn support to Mahout classifiers

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-668: -- Fix Version/s: Backlog > Adding knn support to Mahout classifi

[jira] [Updated] (MAHOUT-1075) ClusterDumper output file should be optional

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1075: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you for

[jira] [Resolved] (MAHOUT-1131) Can't execute alternative FPG implementation from command line

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1131. Resolution: Fixed Thank you for your contribution! > Ca

Re: Discussion Of ML environment/MR, Mahout

2013-03-12 Thread Sebastian Schelter
Do you mean separate project as sub-project of Mahout or as an incubating Apache project or external project? On 12.03.2013 11:25, Sean Owen wrote: > YARN + Spark sounds interesting indeed. Could I suggest this sounds like > definitely a separate project vs yet another experiment in this code bas

Re: Review Request: Basic Iterable for OpenKeyTypeValueTypeHashMap

2013-03-12 Thread Sebastian Schelter
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9867/#review17717 --- Ship it! Ship It! - Sebastian Schelter On March 12, 2013, 4:40

[jira] [Resolved] (MAHOUT-1130) Wrong logic in org.apache.mahout.clustering.kmeans.RandomSeedGenerator

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1130. Resolution: Fixed Assignee: Sebastian Schelter > Wrong logic

Re: Discussion Of ML environment/MR, Mahout

2013-03-12 Thread Sebastian Schelter
experiment that we will > cut away if it leads to a mess. > > On Mon, Mar 11, 2013 at 2:39 PM, Sebastian Schelter wrote: > >> That's a tough question. I'd say we should only consider a) or c) as it >> makes no sense to depend on some research prototype system that

[jira] [Updated] (MAHOUT-1104) Improve Javadoc for AbstractVectorClassifier

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1104: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you very

[jira] [Resolved] (MAHOUT-1081) Precision of float to 3 digits after decimal - loss of information

2013-03-12 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1081. Resolution: Invalid There is no loss of precision. You are only talking about the

[jira] [Updated] (MAHOUT-1130) Wrong logic in org.apache.mahout.clustering.kmeans.RandomSeedGenerator

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1130: --- Attachment: MAHOUT-1130.patch Patch that makes the RandomSeedGenerator use

Re: Discussion Of ML environment/MR, Mahout

2013-03-11 Thread Sebastian Schelter
feel a) is not really > an option and in a way is equivalent to c) since it involves unspecified > amount of waiting for unspecified things. > > > > On Mon, Mar 11, 2013 at 1:54 PM, Sebastian Schelter wrote: > >> I spent the last months working on the Stratosphere sy

Re: Discussion Of ML environment/MR, Mahout

2013-03-11 Thread Sebastian Schelter
43, Dmitriy Lyubimov wrote: > On Mon, Mar 11, 2013 at 1:24 PM, Sebastian Schelter wrote: > >> Ideally, as implementor of a machine learning library wouldn't want to >> think about how to most efficiently execute joins. It's data dependent >> anyway in most cas

Re: Discussion Of ML environment/MR, Mahout

2013-03-11 Thread Sebastian Schelter
The GraphLab guys benchmark their ALS implementation against an old version of ours and describe in detail why they can achieve a 40x to 60x performance improvement. Most of the overhead is attributed to Hadoop and its programming model. It's in the left column of page 724 in http://vldb.org/pvldb/

Re: Discussion Of ML environment/MR, Mahout

2013-03-11 Thread Sebastian Schelter
> Anyway, what i am > saying, isn't it more or less truthful to say that in pragmatic ways ALS > stuff in Mahout is lagging for the very reason of Mahout being constrained > to MR? Definitely.

Re: Discussion Of ML environment/MR, Mahout

2013-03-11 Thread Sebastian Schelter
Ideally, as the implementor of a machine learning library, you wouldn't want to think about how to execute joins most efficiently. It's data-dependent anyway in most cases. You would want to have an optimizer similar to the ones used in databases that takes your MapReduce data flow and figures out the best

[jira] [Commented] (MAHOUT-1130) Wrong logic in org.apache.mahout.clustering.kmeans.RandomSeedGenerator

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599193#comment-13599193 ] Sebastian Schelter commented on MAHOUT-1130: This thing is really, re

[jira] [Updated] (MAHOUT-1041) Support for PMML

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1041: --- Affects Version/s: (was: Backlog) > Support for P

[jira] [Updated] (MAHOUT-1041) Support for PMML

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1041: --- Fix Version/s: Backlog > Support for P

[jira] [Updated] (MAHOUT-1041) Support for PMML

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1041: --- Affects Version/s: (was: 1.0) Backlog > Support

[jira] [Resolved] (MAHOUT-1085) mahout-kmeans, the chance to pick new element

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1085. Resolution: Duplicate work on this has already been started in MAHOUT-1130

[jira] [Resolved] (MAHOUT-1150) ARFF Integration does not support quoted identifiers

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1150. Resolution: Fixed thank you for the contribution > A

[jira] [Commented] (MAHOUT-1158) Migrate/transform IDs from alphanumeric to Long

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599039#comment-13599039 ] Sebastian Schelter commented on MAHOUT-1158: Hi Adam, I suggest to op

Re: Spectral fixes

2013-03-11 Thread Sebastian Schelter
Hi Shannon, I think most jobs don't delete their temporary files. Having a command-line flag should be fine. On 11.03.2013 18:11, Shannon Quinn wrote: > I have a load of fixes in the pipeline for the spectral clustering > algorithms. The work on Eigencuts is extensive and still ongoing, so > whi

[jira] [Closed] (MAHOUT-1144) Wrong normalization in SVD++

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter closed MAHOUT-1144. -- Assignee: Sebastian Schelter (was: Sean Owen) > Wrong normalization in

[jira] [Commented] (MAHOUT-1144) Wrong normalization in SVD++

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598992#comment-13598992 ] Sebastian Schelter commented on MAHOUT-1144: It's a typo, Zeno&

[jira] [Updated] (MAHOUT-1093) CrossFoldLearner trains in all folds if trackign key is negative

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1093: --- Resolution: Fixed Assignee: Sebastian Schelter Status: Resolved (was

[jira] [Resolved] (MAHOUT-1066) How to generate sparsed Vectors from the specified dictionary.

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1066. Resolution: Invalid This should be asked as a question on the mailing list

[jira] [Updated] (MAHOUT-1069) Multi-target, side-info aware, SGD-based recommender algorithms, examples, and tools to run

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1069: --- Affects Version/s: (was: 0.8) Backlog > Multi-tar

[jira] [Resolved] (MAHOUT-1119) code bug in org.apache.mahout.text.SequenceFilesFromDirectory

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1119. Resolution: Fixed Assignee: Sebastian Schelter thank you for pointing us to

[jira] [Resolved] (MAHOUT-1095) using mahout(0.7) model

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1095. Resolution: Cannot Reproduce > using mahout(0.7) mo

[jira] [Resolved] (MAHOUT-1000) Implementation of Single Sample T-Test using Map Reduce/Mahout

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1000. Resolution: Fixed Closing this as it hasn't been picked up for several m

[jira] [Updated] (MAHOUT-1031) Drop empty vectors in encoding pipeline

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1031: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Drop em

[jira] [Commented] (MAHOUT-1098) ColumnMeansJob broken

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598883#comment-13598883 ] Sebastian Schelter commented on MAHOUT-1098: Why is this issue still

[jira] [Updated] (MAHOUT-1102) Mahout build fails for default profile if hadoop.version is passed as argument

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1102: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) M

[jira] [Updated] (MAHOUT-1018) LDADriver fails with -overwrite option under Hadoop 0.23

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1018: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) M

[jira] [Updated] (MAHOUT-1111) Logging bindings not working in current trunk as of github 2012-November-9 18:41

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1111: --- Resolution: Fixed Assignee: Sebastian Schelter Status: Resolved (was

[jira] [Resolved] (MAHOUT-1061) mapreduce split causes ClassNotFound exception

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1061. Resolution: Fixed Assignee: Sebastian Schelter thank you very much for your

[jira] [Updated] (MAHOUT-1076) Matrix Multiplication output to user specified directory

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1076: --- Resolution: Fixed Status: Resolved (was: Patch Available) thank you for

[jira] [Resolved] (MAHOUT-1146) Cardinality exception bug in 'cross' method of AbstractVector class.

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter resolved MAHOUT-1146. Resolution: Fixed Assignee: Sebastian Schelter thank you very much for

[jira] [Updated] (MAHOUT-1019) VectorDistanceSimilarityJob

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1019: --- Resolution: Fixed Status: Resolved (was: Patch Available

[jira] [Updated] (MAHOUT-1157) AbstractCluster.formatVector iteration bug.

2013-03-11 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1157: --- Resolution: Fixed Status: Resolved (was: Patch Available
