Re: Time series anomaly detection MAHOUT-1423

2014-06-10 Thread Ted Dunning
Have you looked at the code? This might also help: http://info.mapr.com/resources_ebook_anewlook_anomalydetection.html?cid=blog http://berlinbuzzwords.de/session/deep-learning-high-performance-time-series-databases On Tue, Jun 10, 2014 at 2:28 AM, matteo poletti wrote: > Hi everybody, > >

Re: TreeBasedRecommenders(Deprecated?)

2014-06-10 Thread Ted Dunning
Sahil, You say: Also the use of item-based collaborative filtering recommender turns out to be time consuming. In my experience, item-based systems tend to be the fastest ones. Perhaps we mean different things. What I mean is similar to the approach where indicator behaviors are computed and

Re: Talking about optimization

2014-06-09 Thread Ted Dunning
ing to do e2e local math optimizer > > and how long it will take. maybe, both is possible -- faster local gains > we > > know current algorithms need, and longer term overhaul that included e2e > > in-core plans. > > > > > > On Mon, Jun 9, 2014 at 4:16 PM

Re: Talking about optimization

2014-06-09 Thread Ted Dunning
On Mon, Jun 9, 2014 at 4:12 PM, Dmitriy Lyubimov wrote: > One open question i had was,well, since the outlined approach avoids > logical plans still for in-core matrix, it would be difficult to optimize > the expressions in this context (i.e. figure out best-working _output_ > structure). > This

Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-09 Thread Ted Dunning
Sounds like a very plausible root cause. On Mon, Jun 9, 2014 at 4:03 PM, Pat Ferrel (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025893#comment-14025893 > ] > > Pat Ferrel

Talking about optimization

2014-06-09 Thread Ted Dunning
So D's disparagement of my half-assed fix for our egregious mis-performance has led me to thinking about what a real optimizer would do. I think we all agree that we are attempting to maximize performance for a flow of operations given pre-existing layouts. Transformations include combining and

[jira] [Resolved] (MAHOUT-1577) FindBugs and PMD settings unrealistic

2014-06-08 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1577. - Resolution: Fixed Made a big step on PMD. Findbugs will be more intransigent because most of

[jira] [Created] (MAHOUT-1577) FindBugs and PMD settings unrealistic

2014-06-08 Thread Ted Dunning (JIRA)
Ted Dunning created MAHOUT-1577: --- Summary: FindBugs and PMD settings unrealistic Key: MAHOUT-1577 URL: https://issues.apache.org/jira/browse/MAHOUT-1577 Project: Mahout Issue Type: Bug

[jira] [Resolved] (MAHOUT-1576) Do a quick style pass to knock down some accumulated warnings

2014-06-08 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1576. - Resolution: Fixed Closing since tests pass. > Do a quick style pass to knock down s

[jira] [Created] (MAHOUT-1576) Do a quick style pass to knock down some accumulated warnings

2014-06-08 Thread Ted Dunning (JIRA)
Ted Dunning created MAHOUT-1576: --- Summary: Do a quick style pass to knock down some accumulated warnings Key: MAHOUT-1576 URL: https://issues.apache.org/jira/browse/MAHOUT-1576 Project: Mahout

Re: [jira] [Commented] (MAHOUT-1575) Conjugate gradient assumes best case scenario for convergence

2014-06-07 Thread Ted Dunning
> > > > Key: MAHOUT-1575 > > URL: https://issues.apache.org/jira/browse/MAHOUT-1575 > > Project: Mahout > > Issue Type: Bug > >Reporter: Ted Dunning > > > > The c

[jira] [Resolved] (MAHOUT-1575) Conjugate gradient assumes best case scenario for convergence

2014-06-07 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1575. - Resolution: Fixed Committed small patch to allow 2 extra passes > Conjugate gradient assu
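For context on the "2 extra passes" fix: in exact arithmetic, conjugate gradient on an n x n symmetric positive definite system converges in at most n iterations, but floating-point rounding can leave a small residual after n steps. The following is a minimal sketch of that idea, not Mahout's solver; the function name, the `extra_passes` parameter, and the assumption that the original iteration cap was exactly n are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in a]


def conjugate_gradient(a: Matrix, b: Vector,
                       extra_passes: int = 2, tol: float = 1e-10) -> Vector:
    """Solve a x = b for symmetric positive definite a.

    Hypothetical sketch: the cap is n + extra_passes iterations rather
    than the best-case n, so rounding error gets a little slack.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - a x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(n + extra_passes):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The tolerance check stops the loop early on well-conditioned problems, so the extra passes only cost time when the residual is still too large after n steps.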

[jira] [Created] (MAHOUT-1575) Conjugate gradient assumes best case scenario for convergence

2014-06-07 Thread Ted Dunning (JIRA)
Ted Dunning created MAHOUT-1575: --- Summary: Conjugate gradient assumes best case scenario for convergence Key: MAHOUT-1575 URL: https://issues.apache.org/jira/browse/MAHOUT-1575 Project: Mahout

[jira] [Commented] (MAHOUT-1574) SparseRowMatrix needs performance improvement for times()

2014-06-06 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020702#comment-14020702 ] Ted Dunning commented on MAHOUT-1574: - {quote} (1) first, an added test is fai

[jira] [Reopened] (MAHOUT-1574) SparseRowMatrix needs performance improvement for times()

2014-06-06 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning reopened MAHOUT-1574: - Assignee: Ted Dunning Oops. Regressed with new code in terms of matrix size checking

[jira] [Resolved] (MAHOUT-1574) SparseRowMatrix needs performance improvement for times()

2014-06-06 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1574. - Resolution: Fixed Committed trivial fix. > SparseRowMatrix needs performance improvement

[jira] [Resolved] (MAHOUT-1574) SparseRowMatrix needs performance improvement for times()

2014-06-06 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning resolved MAHOUT-1574. - Resolution: Fixed Committed fixes and tests. > SparseRowMatrix needs performance improvem

[jira] [Created] (MAHOUT-1574) SparseRowMatrix needs performance improvement for times()

2014-06-06 Thread Ted Dunning (JIRA)
Ted Dunning created MAHOUT-1574: --- Summary: SparseRowMatrix needs performance improvement for times() Key: MAHOUT-1574 URL: https://issues.apache.org/jira/browse/MAHOUT-1574 Project: Mahout

Re: SparkBindings on a real cluster

2014-06-04 Thread Ted Dunning
Great list of issues. On Wed, Jun 4, 2014 at 12:59 AM, Sebastian Schelter wrote: > Hi, > > I did some experimentation with the spark bindings on a real cluster > yesterday, as I had to run some experiments for a paper (unrelated to > Mahout) that I'm currently writing. The experiment basicall

[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations

2014-06-03 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017105#comment-14017105 ] Ted Dunning commented on MAHOUT-1529: - I am not sold but I don't th

[jira] [Commented] (MAHOUT-1567) Add online sparse dictionary learning (dimensionality reduction)

2014-06-02 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015387#comment-14015387 ] Ted Dunning commented on MAHOUT-1567: - A sequential implementation would stil

[jira] [Commented] (MAHOUT-1567) Add online sparse dictionary learning (dimensionality reduction)

2014-06-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015006#comment-14015006 ] Ted Dunning commented on MAHOUT-1567: - Here are three possible pages that might

[jira] [Commented] (MAHOUT-1567) Add online sparse dictionary learning (dimensionality reduction)

2014-05-31 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014851#comment-14014851 ] Ted Dunning commented on MAHOUT-1567: - Mairal's method is one of the

[jira] [Commented] (MAHOUT-1567) Add online sparse dictionary learning

2014-05-31 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014782#comment-14014782 ] Ted Dunning commented on MAHOUT-1567: - Maciej, Can you say a bit more about

Re: Itemsimilairty

2014-05-29 Thread Ted Dunning
’s SSVD > PCA, Transpose, multiply, etc. > > On May 29, 2014, at 2:32 PM, Ted Dunning wrote: > > Pat > > I would like to see the co and cross occurrence code separated out a bit > so that they take drm args. > > Sent from my iPhone > > > On May 29, 2014, a

Re: do we really need scala still

2014-05-29 Thread Ted Dunning
g for—a recommender service. > > Is anyone else interested in CLI, drivers, read/write in the import/export > sense? Or a new architecture for the recommenders? If so, maybe a separate > thread? > > On May 29, 2014, at 7:03 AM, Ted Dunning wrote: > > Andrew, > >

Re: Scala code in Mahout

2014-05-29 Thread Ted Dunning
Yes. On Thu, May 29, 2014 at 7:25 AM, Ray wrote: > Just to verify: Is Mahout now accepting code comprised entirely of Scala? > > Ray >

Re: do we really need scala still

2014-05-29 Thread Ted Dunning
Andrew, Sebastian and I were talking yesterday and guessing that you would be interested in this soon. Glad to know the world is as expected. Yes. This needs to happen at least at a very conceptual level. For instance, for classifiers, I think that we need to have something like: - progres

Re: do we really need scala still

2014-05-28 Thread Ted Dunning
+1 Let's use a successful scala model as a suggestion about where to go. It seems plausible that Java could emulate the building of a lazy DSL logical plan and then poke it in plausible ways with the addition of a wrapper layer. But that only helps if the Scala layer succeeds. On Tue, May 27,

Re: Git Migration

2014-05-26 Thread Ted Dunning
With git you can push or pull to multiple repos. To merge the pull request, you can pull from a specific branch from a specific source repo where the pull request came from. Then you push to git-wip at apache making sure that the Fixes #... message is on the merge. Then magic intervenes and all

Re: do we really need scala still

2014-05-25 Thread Ted Dunning
to adapt to java 8 .** > **So it's better to be prepared for it. Since we can do everything with > java 8 , than why to be dependent on scala.** > **Atleast we should start discussing it now to be well prepared for > future. * > > On 05/25/2014 07:14 PM, Ted Dunning wrote:

Re: do we really need scala still

2014-05-25 Thread Ted Dunning
Scala allows lazy evaluation and operator overloading and lambdas which can be serialized. Java 8 provides lambdas that I am not sure if you can serialize. Also, Spark already uses Scala which has been around for a decade. Java8 isn't here yet. On Sun, May 25, 2014 at 12:48 PM, bandi shankar

Re: Hadoop 2 support in a real release?

2014-05-23 Thread Ted Dunning
> On Fri, May 23, 2014 at 4:43 PM, Sebastian Schelter < > > ssc.o...@googlemail.com > > > wrote: > > > > > Big +1 > > > Am 23.05.2014 15:33 schrieb "Ted Dunning" : > > > > > > > What do folks think about spinning out a new ve

Re: Hadoop 2 support in a real release?

2014-05-23 Thread Ted Dunning
May 23, 2014 at 4:43 PM, Sebastian Schelter < > ssc.o...@googlemail.com > > wrote: > > > Big +1 > > Am 23.05.2014 15:33 schrieb "Ted Dunning" : > > > > > What do folks think about spinning out a new version of 0.9 that only > > > changes which

Hadoop 2 support in a real release?

2014-05-23 Thread Ted Dunning
What do folks think about spinning out a new version of 0.9 that only changes which version of Hadoop the build uses? There have been quite a few questions lately on this topic. My suggestion would be that we use minor version numbering to maintain this and the normal 0.9 release simultaneously i

Re: Git Migration

2014-05-22 Thread Ted Dunning
I will check as soon as I can. That won't be soon since I am currently on a plane stuck by weather in the wrong place. No idea when I will be able to touch the internet again. Sent from my iPhone > On May 22, 2014, at 13:55, Dmitriy Lyubimov wrote: > > At this point I simply would like everyo

Re: Git Migration

2014-05-22 Thread Ted Dunning
Wow. Quick. We also need web site updates quickly. Sent from my iPhone > On May 22, 2014, at 13:14, Dmitriy Lyubimov wrote: > > Hi, > > (1) git migration of the project is now complete. Any volunteers to verify > per INFRA-? If you do, please report back to the issue. > > (2) Anybody

[jira] [Commented] (MAHOUT-1490) Data frame R-like bindings

2014-05-21 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005279#comment-14005279 ] Ted Dunning commented on MAHOUT-1490: - {quote} > (5) or compress wheneve

[jira] [Commented] (MAHOUT-1490) Data frame R-like bindings

2014-05-21 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005255#comment-14005255 ] Ted Dunning commented on MAHOUT-1490: - (5) or compress whenever there is dange

Re: consensus statement?

2014-05-21 Thread Ted Dunning
Very good description of benefits. On Wed, May 21, 2014 at 5:26 AM, Gokhan Capan wrote: > I want to express my opinions for the vision, too. I tried to capture those > words from various discussions in the dev-list, and hope that most, of them > support the common sense of excitement the new Ma

[jira] [Commented] (MAHOUT-1490) Data frame R-like bindings

2014-05-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004130#comment-14004130 ] Ted Dunning commented on MAHOUT-1490: - D, Many algorithms can handle del

Re: [jira] [Commented] (MAHOUT-1490) Data frame R-like bindings

2014-05-19 Thread Ted Dunning
On Mon, May 19, 2014 at 11:08 AM, Dmitriy Lyubimov (JIRA) wrote: > [~avati] do you think you could perhaps explain (or reference principled > foundation publication) of the algorithm that is happening here? One of the most commonly effective compression techniques is dictionary + run-length. Fo
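The "dictionary + run-length" technique named in this message can be sketched as follows. This is an illustrative toy under stated assumptions, not the H2O or Mahout implementation: the function names and the list-of-pairs run format are hypothetical, and values are assumed to be hashable.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dict_rle_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[Tuple[int, int]]]:
    """Encode a column as (dictionary, runs).

    dictionary lists distinct values in first-seen order; each run is
    a (code, run_length) pair, where code indexes into dictionary.
    """
    dictionary: List[Hashable] = []
    codes: Dict[Hashable, int] = {}
    runs: List[Tuple[int, int]] = []
    for value in values:
        code = codes.get(value)
        if code is None:
            # First time we see this value: assign the next small integer.
            code = len(dictionary)
            codes[value] = code
            dictionary.append(value)
        if runs and runs[-1][0] == code:
            # Same code as the previous element: extend the current run.
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return dictionary, runs


def dict_rle_decode(dictionary: Sequence[Hashable],
                    runs: Sequence[Tuple[int, int]]) -> List[Hashable]:
    """Invert dict_rle_encode."""
    out: List[Hashable] = []
    for code, length in runs:
        out.extend([dictionary[code]] * length)
    return out
```

The dictionary step replaces repeated values with small integer codes, and the run-length step collapses consecutive repeats, so the scheme pays off most on sorted or low-cardinality columns.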

Re: consensus statement?

2014-05-18 Thread Ted Dunning
On Sun, May 18, 2014 at 11:33 AM, Sebastian Schelter wrote: > I suggest we start with a specific draft that someone prepares (maybe Ted > as he started the thread) This is a good strategy, and I am happy to start the discussion, but I wonder if it might help build consensus if somebody else sta

Re: consensus statement?

2014-05-18 Thread Ted Dunning
On Sun, May 18, 2014 at 10:44 AM, Pat Ferrel wrote: > I think Ted’s intent was to find a simple consensus statement that > addresses where the project is going in a general way. Indeed. And I would emphasize the word consensus. I am trying to build a bit of consensus here. I really think tha

Re: Weighted average of precission, recall and F1 score

2014-05-18 Thread Ted Dunning
It sounds like a reasonable addition. Can you file a JIRA? On Sun, May 18, 2014 at 3:19 AM, Karol Grzegorczyk wrote: > Hi, > > To get more statistics on classification results, in our project we > slightly extended ResultAnalyzer and ConfusionMatrix classes, adding > calculation of weighted av
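For readers unfamiliar with the metric under discussion, here is a minimal sketch of support-weighted precision, recall, and F1 computed from a confusion matrix. It assumes rows are actual classes and columns are predicted classes, and it weights each class by its share of actual instances; whether the proposed ResultAnalyzer and ConfusionMatrix changes use the same conventions is not stated in the thread, and the function name is hypothetical.

```python
from typing import List, Tuple


def weighted_scores(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1), each averaged over classes
    with weights proportional to the number of actual instances.

    Assumes confusion[i][j] counts items of actual class i that were
    predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    w_precision = w_recall = w_f1 = 0.0
    for k in range(n):
        true_pos = confusion[k][k]
        support = sum(confusion[k])                        # actual class k
        predicted = sum(confusion[i][k] for i in range(n))  # predicted as k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        weight = support / total
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1
```

With this weighting, weighted recall equals overall accuracy, which is a quick sanity check on any implementation.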

Re: Exploring moving Mahout to git as main repo

2014-05-16 Thread Ted Dunning
infra? I’m super grateful for the guys who work > on all this but is it becoming unmaintainable? With every new thing we ask > for (git + svn integration? plus running git repos) they can probably do it > but it adds more moving parts until some day... > > > On May 16, 2014, at

Re: VOTE: moving commits to git-wp.o.a & github PR features.

2014-05-16 Thread Ted Dunning
+1 This is a good move. Several of the projects I have been involved in (Drill, Spark for instance) use git and the results are very positive. The benefits are slightly technical (git works better for me than SVN), but majorly social (use of git as primary with github integration is a big deal,

Re: Exploring moving Mahout to git as main repo

2014-05-16 Thread Ted Dunning
On Fri, May 9, 2014 at 4:17 PM, Pat Ferrel wrote: > But pull requests would just be for contributors, right? Committers should > be able to push to the master directly, right? > Not necessarily. Pull requests are so lovely nice to work with no matter who you are. I can easily imagine using the

[jira] [Commented] (MAHOUT-1490) Data frame R-like bindings

2014-05-16 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000510#comment-14000510 ] Ted Dunning commented on MAHOUT-1490: - This column orientation sounds like a

Re: Exploring moving Mahout to git as main repo

2014-05-16 Thread Ted Dunning
.. yes. It would be nice to use the pretty green button on the github site, but it isn't much more work to do the merges manually. For merges that are clean, there are 3 or 4 commands required to merge the PR and github provides them. Doing the JIRA is additional effort. > > On May 1

Re: consensus statement?

2014-05-16 Thread Ted Dunning
On Wed, May 7, 2014 at 9:54 AM, Pat Ferrel wrote: > Seems like the vision comes from feature champions. I may not use Mahout > in the same way you do but I rely on your code. Maybe I serve a different > user type than you. I don’t see a problem with that, do you? > Sounds good to me.

consensus statement?

2014-05-06 Thread Ted Dunning
I have been involved in side conversations to try to build a bit of unity among our community and would like to propose this as a statement of what we are doing: Apache Mahout is moving immediately to a faster execution model. The first of these is Spark. Outside contributions are always encourag

[jira] [Commented] (MAHOUT-1547) Implementation of Support Vector Machines (Sequential Minimal Optimazation technique) on Hadoop

2014-05-06 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990453#comment-13990453 ] Ted Dunning commented on MAHOUT-1547: - There are few projects left (if any) w

Re: Where the users are

2014-05-05 Thread Ted Dunning
I subscribe to mahout and other topics on SO and try to redirect people to the mailing lists after giving some level of answer. On Mon, May 5, 2014 at 6:02 AM, Andrew Musselman wrote: > Good point. > > > On May 4, 2014, at 6:25 PM, Pat Ferrel wrote: > > > > Stackoverflow. > > > > Not everyon

Re: Broken build

2014-05-04 Thread Ted Dunning
I am rarely able to diagnose Jenkins problems without tearing the job down and adding bits one at a time. Most of the problems are environmental and thus hard to understand from the point of view of our builds. This may be an exception because of the recent changes, but the comment somebody made

Re: [jira] [Updated] (MAHOUT-1428) Recommending already consumed items

2014-05-04 Thread Ted Dunning
The submit patch link is mis-named. Look under "More". Also, you have to be logged in to JIRA to do this. On Sat, May 3, 2014 at 4:40 PM, dodi hakim wrote: > Sorry, where is it? is it in the submit patch link? it's gone now > > > On 4 May 2014 00:27, Sebastian Schelter wrote: > > > You hav

[jira] [Commented] (MAHOUT-1541) Create CLI Driver for Spark Cooccurrence Analysis

2014-05-03 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988743#comment-13988743 ] Ted Dunning commented on MAHOUT-1541: - {quote} BTW do we really need to sup

[jira] [Commented] (MAHOUT-1529) Finalize abstraction of distributed logical plans from backend operations

2014-05-03 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988621#comment-13988621 ] Ted Dunning commented on MAHOUT-1529: - {quote} BTW I truly envy the Spark pro

[jira] [Commented] (MAHOUT-1532) Add solve() function to the Scala DSL

2014-05-02 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988395#comment-13988395 ] Ted Dunning commented on MAHOUT-1532: - So the matrix has to be square and have a

[jira] [Commented] (MAHOUT-1532) Add solve() function to the Scala DSL

2014-05-02 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987575#comment-13987575 ] Ted Dunning commented on MAHOUT-1532: - Can somebody clarify what kind of solve(

[jira] [Commented] (MAHOUT-1532) Add solve() function to the Scala DSL

2014-05-01 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986480#comment-13986480 ] Ted Dunning commented on MAHOUT-1532: - QR is definitely the way to go where it w

Re: Straw poll re: H2O ?

2014-04-30 Thread Ted Dunning
Wed, Apr 30, 2014 at 4:09 PM, Ted Dunning > wrote: > > > I should add that the way that the compression is done is pretty cool for > > speed. The basic idea is that byte code engineering is used to directly > > inject the decompression and compression code into the user

Re: Straw poll re: H2O ?

2014-04-30 Thread Ted Dunning
Ted Dunning > wrote: > > > Inline > > > > > > On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov > > wrote: > > > > > On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning > > > wrote: > > > > > > > > > > > My motiv

Re: Straw poll re: H2O ?

2014-04-30 Thread Ted Dunning
include statements that something crushes > something without providing a link to a published analysis of what it is > something that crushes something another and due to what something. > > > On Wed, Apr 30, 2014 at 4:16 PM, Ted Dunning > wrote: > > > It seems to me t

Re: Straw poll re: H2O ?

2014-04-30 Thread Ted Dunning
It seems to me that Sebastian and Ellen have hit on the right tack. Let's get back to work making something cool here. Let's build this community up instead of having endlessly divisive discussions. Let's get back to the Apache emphasis on do-acracy. On Wed, Apr 30, 2014 at 11:36 AM, Ellen Fr

Re: Straw poll re: H2O ?

2014-04-30 Thread Ted Dunning
Inline On Wed, Apr 30, 2014 at 8:25 PM, Dmitriy Lyubimov wrote: > On Wed, Apr 30, 2014 at 7:06 AM, Ted Dunning > wrote: > > > > > My motivation to accept comes from the fact that they have machine > learning > > codes that are as fast as what google has internal

Re: Helping out on spark efforts

2014-04-30 Thread Ted Dunning
On Wed, Apr 30, 2014 at 9:24 PM, Dmitriy Lyubimov wrote: > On Wed, Apr 30, 2014 at 11:42 AM, Dmitriy Lyubimov >wrote: > > > I also would suggest to take some guinea pigs to validate stuff. > > > > E.g. if i may make a suggestion, let's see how we'd do a categorical > > variable vectorization int

Re: Helping out on spark efforts

2014-04-30 Thread Ted Dunning
+1 for foundations first. There are bunches of algorithms just behind that. K-means. SGD+Adagrad regression. Autoencoders. K-sparse encoding. Lots of stuff. On Wed, Apr 30, 2014 at 4:52 PM, Sebastian Schelter wrote: > I think you should concentrate on MAHOUT-1490, that is a highly import

Re: Straw poll re: H2O ?

2014-04-30 Thread Ted Dunning
The motivation to contribute comes from h2o. My motivation to accept comes from the fact that they have machine learning codes that are as fast as what google has internally. They completely crush all of the spark efforts on speed. The sample bit was more to experiment with ways to bring the mah

Re: Future of Frequent Pattern Mining

2014-04-28 Thread Ted Dunning
One thought is to extract the code, publish on github with warnings about no support. Then if there are requests, we can point them to the GH archive and tell them to go for it. On Mon, Apr 28, 2014 at 10:03 AM, Suneel Marthi wrote: > +100 to purging this from the codebase. This stuff uses t

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2014-04-27 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982499#comment-13982499 ] Ted Dunning commented on MAHOUT-1236: - I would close this for now since the m

[jira] [Commented] (MAHOUT-1500) H2O integration

2014-04-27 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982497#comment-13982497 ] Ted Dunning commented on MAHOUT-1500: - [~dlie...@gmail.com]'s comments hav

[jira] [Commented] (MAHOUT-1469) Streaming KMeans fails when executed in MapReduce mode and REDUCE_STREAMING_KMEANS is set to true

2014-04-27 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982470#comment-13982470 ] Ted Dunning commented on MAHOUT-1469: - [~arapmv] There is a known bottle nec

[jira] [Commented] (MAHOUT-1252) Add support for Finite State Transducers (FST) as a DictionaryType.

2014-04-27 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982466#comment-13982466 ] Ted Dunning commented on MAHOUT-1252: - I completely agree with Suneel on

[jira] [Commented] (MAHOUT-1500) H2O integration

2014-04-27 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982250#comment-13982250 ] Ted Dunning commented on MAHOUT-1500: - The h2o integration work has been progres

Still no word on CMS

2014-04-23 Thread Ted Dunning
Just figured I would say this before anybody asked.

Re: [jira] [Commented] (MAHOUT-1468) Creating a new page for StreamingKMeans documentation on mahout website

2014-04-22 Thread Ted Dunning
at 11:36 PM, Sebastian Schelter wrote: > Do we have any information when the CMS is going to work again? > > On 04/23/2014 08:29 AM, Ted Dunning (JIRA) wrote: > >> >> [ https://issues.apache.org/jira/browse/MAHOUT-1468?page= >> com.atlassian.jira.plugin.

[jira] [Commented] (MAHOUT-1468) Creating a new page for StreamingKMeans documentation on mahout website

2014-04-22 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977891#comment-13977891 ] Ted Dunning commented on MAHOUT-1468: - Maxim, Nice work. Good descrip

[jira] [Commented] (MAHOUT-1518) Preprocessing for collaborative filtering with the Scala DSL

2014-04-22 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977656#comment-13977656 ] Ted Dunning commented on MAHOUT-1518: - For what it is worth, if data frames a

Re: [jira] [Commented] (MAHOUT-1518) Preprocessing for collaborative filtering with the Scala DSL

2014-04-22 Thread Ted Dunning
Well if it is easy to convert I suppose it only costs memory. Sent from my iPhone > On Apr 22, 2014, at 11:32, "Dmitriy Lyubimov (JIRA)" wrote: > > >[ > https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommen

[jira] [Commented] (MAHOUT-1518) Preprocessing for collaborative filtering with the Scala DSL

2014-04-22 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976752#comment-13976752 ] Ted Dunning commented on MAHOUT-1518: - [~ssc] I would hope that the data frame

[jira] [Commented] (MAHOUT-1485) Clean up Recommender Overview page

2014-04-22 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976738#comment-13976738 ] Ted Dunning commented on MAHOUT-1485: - [~ssc] Could be. [~yash...@gmail.com]

[jira] [Commented] (MAHOUT-1489) Interactive Scala & Spark Bindings Shell & Script processor

2014-04-21 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976354#comment-13976354 ] Ted Dunning commented on MAHOUT-1489: - [~dlie...@gmail.com] Why do we even nee

[jira] [Commented] (MAHOUT-1489) Interactive Scala & Spark Bindings Shell & Script processor

2014-04-21 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976261#comment-13976261 ] Ted Dunning commented on MAHOUT-1489: - For new modules that really can't m

CMS should be back

2014-04-21 Thread Ted Dunning
Although they say "basic service".

[jira] [Commented] (MAHOUT-1520) Fix links in Mahout website documentation

2014-04-21 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975558#comment-13975558 ] Ted Dunning commented on MAHOUT-1520: - The problem is not that the links

[jira] [Commented] (MAHOUT-1485) Clean up Recommender Overview page

2014-04-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975219#comment-13975219 ] Ted Dunning commented on MAHOUT-1485: - Yash, This document is describing the

[jira] [Commented] (MAHOUT-1518) Preprocessing for collaborative filtering with the Scala DSL

2014-04-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975201#comment-13975201 ] Ted Dunning commented on MAHOUT-1518: - [~pfarrell] What do you think of the

[jira] [Commented] (MAHOUT-1518) Preprocessing for collaborative filtering with the Scala DSL

2014-04-20 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975164#comment-13975164 ] Ted Dunning commented on MAHOUT-1518: - I would be dubious of file format

Re: Site publishing

2014-04-20 Thread Ted Dunning
> - More > 2. >*#asfinfra IRC Bot* @infrabot <https://twitter.com/infrabot> Apr > 16<https://twitter.com/infrabot/status/456383615065001984> > >Correction: The server that is down is the *buildbot* master, > *buildbot* slaves are offline inc.

Re: Site publishing

2014-04-20 Thread Ted Dunning
rabot <https://twitter.com/infrabot> We currently have an issue with one of our servers that hosts build slaves. Including the CMS. We are investigating. -- On Sun, Apr 20, 2014 at 8:18 AM, Ted Dunning wrote: > > The build-bot machine has had miserable stability lately.

Re: Site publishing

2014-04-20 Thread Ted Dunning
The build-bot machine has had miserable stability lately. On Sun, Apr 20, 2014 at 8:14 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Filed https://issues.apache.org/jira/browse/INFRA-7604 > > > > > On Sun, Apr 20, 2014 at 8:09 AM, Andrew Musselman < > andrew.mussel...@gmail.com>

[jira] [Commented] (MAHOUT-1468) Creating a new page for StreamingKMeans documentation on mahout website

2014-04-16 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972316#comment-13972316 ] Ted Dunning commented on MAHOUT-1468: - Maxim, That is fantastic. > Creatin

Re: Tackling the "legacy dilemma"

2014-04-15 Thread Ted Dunning
Manoj, Sounds like a fair trade there. Hopefully, you would consider upgrading if we get Andy's code ported to the DSL or if we incorporate the h2o random forest implementation. On Tue, Apr 15, 2014 at 7:51 PM, Manoj Awasthi wrote: > > * remove Random Forest as we cannot even answer questio

Re: Mahout without a CLI?

2014-04-15 Thread Ted Dunning
Well... I think it is an issue that has to do with figuring out how to *avoid* import and export as much as possible. On Tue, Apr 15, 2014 at 6:36 PM, Pat Ferrel wrote: > Which is why it’s an import/export issue. > > On Apr 15, 2014, at 5:48 PM, Ted Dunning wrote: > > On Tue,

Re: Mahout without a CLI?

2014-04-15 Thread Ted Dunning
On Tue, Apr 15, 2014 at 10:58 AM, Pat Ferrel wrote: > As to the statement "There is not, nor do i think there will be a way to > run this stuff with CLI” seems unduly misleading. Really, does anyone > second this? > > There will be Scala scripts to drive this stuff and yes even from the CLI. > Do

[jira] [Commented] (MAHOUT-1510) Goodbye MapReduce

2014-04-15 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970281#comment-13970281 ] Ted Dunning commented on MAHOUT-1510: - [~yxjiang] Hopefully we can be much

[jira] [Commented] (MAHOUT-1439) Update talks on Mahout

2014-04-14 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969186#comment-13969186 ] Ted Dunning commented on MAHOUT-1439: - @nimartin That would be SOOO hel

Re: Tackling the "legacy dilemma"

2014-04-13 Thread Ted Dunning
On Sun, Apr 13, 2014 at 10:16 AM, Dmitriy Lyubimov wrote: > +1, but more importantly, reject any new author who doesn't agree to > explicitly pledge a multi-year support. > I am a little bit negative about this requirement. My feeling is that it will wind up with accepting naive optimists (the

Re: Tackling the "legacy dilemma"

2014-04-13 Thread Ted Dunning
On Sun, Apr 13, 2014 at 10:16 AM, Dmitriy Lyubimov wrote: > > * move the MR algorithms into a separate maven module > You mean, move them out of mahout-core? So the core is for single machine > stuff only? Plus utils? We probably need to refactor core so there's no > core at all it seems. Our c
