Re: Mahout 1.0 features (revisited)

2014-10-24 Thread Dmitriy Lyubimov
Ah, you've already added links behind the X marks .. :) ok awesome. On Fri, Oct 24, 2014 at 11:23 AM, Dmitriy Lyubimov wrote: > This is so awesome. > > i think i need to re(move) issues links from scala spark bindings page and > just point to this page. Perhaps it is good to

Re: Mahout 1.0 features (revisited)

2014-10-24 Thread Dmitriy Lyubimov
This is so awesome. i think i need to re(move) issues links from scala spark bindings page and just point to this page. Perhaps it is good to add the links to the jiras, at least for WIP entries On Fri, Oct 24, 2014 at 11:19 AM, Andrew Palumbo wrote: > aha.. yes.. Fixed it.. thx. and please do

Re: Mahout 1.0 features (revisited)

2014-10-24 Thread Dmitriy Lyubimov
awesome (typo in "Collaborative filtering" (have i spelled it right?) :) On Fri, Oct 24, 2014 at 10:17 AM, Andrew Palumbo wrote: > ok, committed here: > > http://mahout.apache.org/users/basics/algorithms.html > > thx > > > From: ap@outlook.com > > To: dev@mahout.apache.org > > Subject:

[jira] [Commented] (MAHOUT-1493) Port Naive Bayes to the Spark DSL

2014-10-22 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180766#comment-14180766 ] Dmitriy Lyubimov commented on MAHOUT-1493: -- I am guessing no answer mean

[jira] [Updated] (MAHOUT-1493) Port Naive Bayes to the Spark DSL

2014-10-22 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1493: - Assignee: Andrew Palumbo (was: Sebastian Schelter) > Port Naive Bayes to the Spark

Re: Mahout Vs Spark

2014-10-22 Thread Dmitriy Lyubimov
For the record, this is all a false dilemma (at least w.r.t. spark vs mahout spark bindings). The spark bindings have never been conceived as one vs another. Mahout scala bindings are an on-top add-on to spark that just happens to rely on some of the things in mahout-math. With spark one gets some major t

Re: Any idea which approaches to non-liniear svm are easily parallelizable?

2014-10-22 Thread Dmitriy Lyubimov
a which approaches to non-liniear svm are easily > parallelizable? > > To: dev@mahout.apache.org > > > > Last I heard, the best methods pre-project and do linear SVM. > > > > Beyond that, I would guess that deep learning techniques would subsume > > non-l

Any idea which approaches to non-liniear svm are easily parallelizable?

2014-10-21 Thread Dmitriy Lyubimov
in particular, from libSVM -- http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ? thanks. -d

Re: Upgrade to Spark 1.1.0?

2014-10-21 Thread Dmitriy Lyubimov
rently on linux but I don’t think you need > to delete is anyway. > > On Oct 21, 2014, at 12:27 PM, Dmitriy Lyubimov wrote: > > fwiw i never built spark using maven. Always use sbt assembly. > > On Tue, Oct 21, 2014 at 11:55 AM, Pat Ferrel > wrote: > > > Ok, the m

Re: Upgrade to Spark 1.1.0?

2014-10-21 Thread Dmitriy Lyubimov
On Tue, Oct 21, 2014 at 10:26 AM, Pat Ferrel wrote: > Sorry to hear. I bet you’ll find a way. > > The Spark Jira trail leads to two suggestions: > 1) use spark-submit to execute code with your own entry point (other than > spark-shell) One theory points to not loading all needed Spark classes fro

Re: Upgrade to Spark 1.1.0?

2014-10-21 Thread Dmitriy Lyubimov
on’t show up until runtime, > passing all build tests. > > Dmitriy, have you successfully used any Spark version other than 1.0.1 on > a cluster? If so do you recall the exact order and from what sources you > built? > > > On Oct 21, 2014, at 9:35 AM, Dmitriy Lyubimov wrot

Re: Upgrade to Spark 1.1.0?

2014-10-21 Thread Dmitriy Lyubimov
ctDispatcher.scala:386) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.jav

Re: Upgrade to Spark 1.1.0?

2014-10-20 Thread Dmitriy Lyubimov
it. Can we set up > the build machine to do this? I’d feel better about eyeballing deps if we > could have a TEST_MASTER automatically run during builds at Apache. Maybe > the regular unit tests are OK for building locally ourselves. > > > > > On Oct 20, 2014, at 12:23 PM, Dm

Re: Why is mahout moving to spark?

2014-10-15 Thread Dmitriy Lyubimov
The main reason is that MR is a non-starter for most moderately to highly iterative machine learning methods. The same can be said of Java. It is (strictly) my opinion that Java is a poor fit for encoding math, especially tensor math. On Wed, Oct 15, 2014 at 12:44 PM, thejas prasad wrote: > Hey all, >

Re: QR variations and a better QR for o.a.m.math.decompositions

2014-10-10 Thread Dmitriy Lyubimov
For reference on Mahout's Givens N-way QR, see [2] [2] http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf Sections 4.6 ... 4.7 On Fri, Oct 10, 2014 at 2:10 PM, Dmitriy Lyubimov wrote: > Please look at this survey [1]. I am currently reading t

QR variations and a better QR for o.a.m.math.decompositions

2014-10-10 Thread Dmitriy Lyubimov
Please look at this survey [1]. I am currently reading thru it. Quite comprehensive. Where it fits us: ssvd, dssvd depend on QR accuracy quite a bit. What we have right now in "new" algebra DSL is shown there as algorithm 9, "Cholesky QR". The benefit is that it is extremely easy and paralleliza
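For readers who have not met the "Cholesky QR" named above, here is a minimal textbook sketch in plain Python. It is not Mahout's DSL code, and the function name `cholesky_qr` is made up for illustration. The idea: form the small Gram matrix G = A'A, factor it as G = L L', take R = L', and recover Q = A R^-1 row by row. Only the Gram matrix needs a reduction over the rows of A, which is what makes the scheme easy to parallelize. The price is that forming A'A squares the condition number of A, which is why accuracy is a concern.

```python
import math

def cholesky_qr(a):
    """Thin QR of a tall, full-column-rank matrix `a` (list of rows).

    Illustrative sketch only: R comes from a Cholesky factorization of the
    Gram matrix A^T A, and Q is recovered as A * R^-1.
    """
    m, n = len(a), len(a[0])

    # Gram matrix G = A^T A (n x n); the only step that sums over all rows.
    g = [[sum(a[k][i] * a[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]

    # Cholesky factorization G = L L^T with L lower triangular.
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]

    # R = L^T is upper triangular.
    r = [[l[j][i] for j in range(n)] for i in range(n)]

    # Q = A R^-1: for each row, solve q * R = a_row by substitution.
    q = []
    for row in a:
        q_row = [0.0] * n
        for j in range(n):
            partial = sum(q_row[k] * r[k][j] for k in range(j))
            q_row[j] = (row[j] - partial) / r[j][j]
        q.append(q_row)
    return q, r
```

Each row of Q depends only on the matching row of A and the small matrix R, so the last step is embarrassingly parallel once R is known.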

Re: [jira] [Commented] (MAHOUT-1615) SparkEngine drmFromHDFS returning the same Key for all Key,Vec Pairs for Text-Keyed SequenceFiles

2014-10-10 Thread Dmitriy Lyubimov
i already mentioned that i don't want any whiff of hadoop stuff in math-scala. For the most part, because it is impossible to pinpoint the exact hadoop api version a third party wants to use. There will always be applications claiming incompatibility with this or that in Hadoop with that approach. it might

Re: Interested in developing for mahout

2014-10-02 Thread Dmitriy Lyubimov
Matlab-specific bindings are largely abandoned in favor of a common set of things that tends to be more R-like. The reason is that a lot of things (such as e.g. elementwise tensor operators) in matlab are impossible to implement as Scala operators, or they would have undesired precedence. R "bas

Re: Interested in developing for mahout

2014-10-02 Thread Dmitriy Lyubimov
Lloyd K-means iteration is possible in pure algebraic form. I published a fragment of the code on this list at some point. I can probably do archive search and dig it out. Various algorithm inducing techniques are probably quasi algebraic. Which means there will be some things that cannot be expr

Re: persistence function naming convnentions

2014-09-28 Thread Dmitriy Lyubimov
e(dest: String, schema: Schema = DefaultSchema) > > Once read the DRM is a CheckpointedDrm contained in the IndexedDataset. So > call it import/export or persistence a user can use either the sequence > file or text to read/write DRMs > > Seem reasonable? > > On Sep 26, 2014,

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
On Thu, Sep 25, 2014 at 8:50 AM, Dmitriy Lyubimov wrote: > As for pure scala backend, it already exists and it is called Breeze > project (something MLlib uses internally), supported by David Hall (among > others). It also includes a lot more common non-distributed math than just > al

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
12:25 AM, Ted Dunning wrote: > > > > On Wed, Sep 24, 2014 at 11:09 PM, Dmitriy Lyubimov > > wrote: > > > >> Aggregate is Colt's thing. Colt (aka Mahout-math) establish java-side > >> concept of different function types which are unfortunately not > c

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-25 Thread Dmitriy Lyubimov
4 at 11:09 PM, Dmitriy Lyubimov > wrote: > > > Aggregate is Colt's thing. Colt (aka Mahout-math) establish java-side > > concept of different function types which are unfortunately not > compatible > > with Scala literals. > > > > Dmitriy, > > Is this

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-24 Thread Dmitriy Lyubimov
On Wed, Sep 24, 2014 at 9:15 PM, Saikat Kanjilal wrote: > Shannon/Dmitry, quick question: I want to calculate the scala > equivalent of the Frobenius norm per this API spec in python ( > http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html), > I dug into the mahout-math-
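As a reference point for the question above: the Frobenius norm of a matrix is the square root of the sum of its squared entries. A minimal plain-Python sketch follows. The name `frobenius_norm` is illustrative and is not a Mahout or NumPy function.

```python
import math

def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a list-of-rows matrix."""
    return math.sqrt(sum(x * x for row in matrix for x in row))
```

Because the sum of squares decomposes over rows, the same quantity can be accumulated per row (or per partition of a distributed matrix) and combined with a single final square root.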

Re: Mahout-1539-computation of gaussian kernel between 2 arrays of shapes

2014-09-18 Thread Dmitriy Lyubimov
you want a REALLY-REALLY big matrix? as in distributed matrix? On Thu, Sep 18, 2014 at 12:28 PM, Saikat Kanjilal wrote: > > http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html > I need to implement the above in the scala world and expose a DSL API to > call
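For context, the RBF (Gaussian) kernel referenced in the quoted link is K(x, y) = exp(-gamma * ||x - y||^2), evaluated for every pair of rows from two arrays. The sketch below is a small in-memory illustration in plain Python, not a distributed implementation and not the proposed DSL API. The default of gamma = 1 / number_of_features follows the scikit-learn convention.

```python
import math

def rbf_kernel(xs, ys, gamma=None):
    """Pairwise Gaussian kernel between the rows of xs and the rows of ys.

    Returns a len(xs) x len(ys) list of lists. Illustrative sketch only.
    """
    if gamma is None:
        gamma = 1.0 / len(xs[0])  # scikit-learn style default
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
         for y in ys]
        for x in xs
    ]
```

The result has one entry per pair of rows, so its size grows as the product of the two row counts. That is why the question of whether the inputs are in-core or distributed matters here.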

Re: drmFromHDFS rowLabelBindings question

2014-09-14 Thread Dmitriy Lyubimov
nversion. > > >> > > >> after: > > >> > > >> val rowBindings = d.map(t => (t._1._1.toString, t._2: > > > java.lang.Integer)).toMap > > >> > > >> rowBindings.size is one > > >> > > >>

Re: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Dmitriy Lyubimov
ort > > > > > reduce-like operations, there is no DSL operator for that yet. We > could > > > > > either introduce a reduce/aggregate operator in as engine > > > neutral/close to > > > > > algebraic way as possible, or keep any kind of reduction/aggregate >

Re: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Dmitriy Lyubimov
; > item > > > > similarity jobs where you do math things like B.t %*% A After you do > the > > > > math using the drm contained in the IndexedDataset you assign the > correct > > > > dictionaries to the resulting IndexedDataset to maintain your labels

Re: drmFromHDFS rowLabelBindings question

2014-09-13 Thread Dmitriy Lyubimov
ring, t._2: > > java.lang.Integer)).toMap > >> > >> rowBindings.size is one > >> > >> From: ap@outlook.com > >> To: dev@mahout.apache.org > >> Subject: RE: drmFromHDFS rowLabelBindings question > >> Date: Fri, 12 Sep 2014 15:

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
just using it in the NB implementation so its not a pressing issue. > > > > > > Appreciate it. > > > > > > > Date: Fri, 12 Sep 2014 12:35:21 -0700 > > > > Subject: Re: drmFromHDFS rowLabelBindings question > > > > From: av...@gluster.or

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
Fri, Sep 12, 2014 at 11:57 AM, Dmitriy Lyubimov > > wrote: > > > >> but if you are really compelled that it is something that might be > needed, > >> the best way probably would be indeed create an optional parameter to > >> collect (something like drmLike.col

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
On Fri, Sep 12, 2014 at 11:56 AM, Anand Avati wrote: > On Fri, Sep 12, 2014 at 11:30 AM, Dmitriy Lyubimov > wrote: > > > Actually, as it stands, collect doesn't support labels (either as keys or > > Named Vectors). > > > > There are 2 considerations

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
assigning them to in-core matrix' row labels. (requires a patch of course) On Fri, Sep 12, 2014 at 11:55 AM, Dmitriy Lyubimov wrote: > if you bail out into pure Spark out of algebraic DSL, yes. > > something like drmA.rdd.map(_._1).collect > > On Fri, Sep 12, 2014 at 11:46 AM, And

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
if you bail out into pure Spark out of algebraic DSL, yes. something like drmA.rdd.map(_._1).collect On Fri, Sep 12, 2014 at 11:46 AM, Andrew Palumbo wrote: > Ok thanks- All that I need is a Vector of the String keys of the Drm > (they contain the category labels that I need)- I think i was j

Re: drmFromHDFS rowLabelBindings question

2014-09-12 Thread Dmitriy Lyubimov
Actually, as it stands, collect doesn't support labels (either as keys or Named Vectors). There are 2 considerations: (1) I chose to ignore any use of NamedVectors in DRM since DRM already has row keys, and two different sources have been creating ambiguity of interpretation, so i tailored all the

Re: setting spark config parameters for shell

2014-09-11 Thread Dmitriy Lyubimov
yeah these things need to be tweaked for a particular application. Truth is, i have not yet used the shell for anything formidable. For me at this point it is just a fine concept. I've been doing embedded spark use (at which point one of course has full control over SparkConf stuff). On Thu

Re: setting spark config parameters for shell

2014-09-11 Thread Dmitriy Lyubimov
I remember i had a good answer for this type of thing in context of the shell, but have forgotten the answer... bummer :) In spark, you can just pass them in with -Dname=value. May need tweaking bin/mahout script though. that's what i don't remember. I thought we were setting a reasonable defaul

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

2014-09-11 Thread Dmitriy Lyubimov
11, 2014 at 12:24 AM, Ankit Sharma wrote: > Hi Dmitriy, > > Did you mean something like implementing *"Strassen algorithm"* for matrix > multiplication? > > thanks & best regards, > > Ankit > > On Wed, Sep 10, 2014 at 10:59 PM, Dmitriy Lyubimov >

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

2014-09-10 Thread Dmitriy Lyubimov
oth of them are single-node library so > they seems not possible to use directly. > > > On 09/10/2014 01:29 PM, Dmitriy Lyubimov wrote: > >> The biggest problem today (in my opinion) is mahout-math. >> >> (1) cost/type based optimization of matrix-matrix m

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

2014-09-10 Thread Dmitriy Lyubimov
The biggest problem today (in my opinion) is mahout-math. (1) cost/type based optimization of matrix-matrix multiplication (2) cost/type based optimization of elementwise matrix-matrix operations There is already some work done there, especially in the realm of vector-vector operations, so matrix

[jira] [Comment Edited] (MAHOUT-1490) Data frame R-like bindings

2014-09-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127267#comment-14127267 ] Dmitriy Lyubimov edited comment on MAHOUT-1490 at 9/9/14 5:3

[jira] [Comment Edited] (MAHOUT-1490) Data frame R-like bindings

2014-09-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127267#comment-14127267 ] Dmitriy Lyubimov edited comment on MAHOUT-1490 at 9/9/14 5:3

[jira] [Comment Edited] (MAHOUT-1490) Data frame R-like bindings

2014-09-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127267#comment-14127267 ] Dmitriy Lyubimov edited comment on MAHOUT-1490 at 9/9/14 5:2

[jira] [Commented] (MAHOUT-1490) Data frame R-like bindings

2014-09-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127267#comment-14127267 ] Dmitriy Lyubimov commented on MAHOUT-1490: -- Just so we are clear, i don&#

[jira] [Comment Edited] (MAHOUT-1490) Data frame R-like bindings

2014-09-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127267#comment-14127267 ] Dmitriy Lyubimov edited comment on MAHOUT-1490 at 9/9/14 5:2

[jira] [Commented] (MAHOUT-1610) Tests can be made more robust to pass in Java 8

2014-08-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116102#comment-14116102 ] Dmitriy Lyubimov commented on MAHOUT-1610: -- it seems to be Spark (spur

Re: [jira] [Commented] (MAHOUT-1610) Tests can be made more robust to pass in Java 8

2014-08-29 Thread Dmitriy Lyubimov
it seems to be a (spurious) Spark failure, unable to open broadcast variable handle. It would be nice to understand more why it happened: either we are doing something wrong causing something to race, or it is something bad with this version of Spark (or even build host setup?) causing this. It is u

Re: [jira] [Commented] (MAHOUT-1610) Tests can be made more robust to pass in Java 8

2014-08-28 Thread Dmitriy Lyubimov
Including PRs that may happen on private company github branches to private code once public master is merged to them. Such is unfortunate workings of github. So please don't. On Thu, Aug 28, 2014 at 9:52 AM, Dmitriy Lyubimov wrote: > > > > On Thu, Aug 28, 2014 at 9:48 AM, Ted

Re: [jira] [Commented] (MAHOUT-1610) Tests can be made more robust to pass in Java 8

2014-08-28 Thread Dmitriy Lyubimov
On Thu, Aug 28, 2014 at 9:48 AM, Ted Dunning wrote: > > If you do the commit with the github note "closes #xx", then github does > the right thing. Your commit does the merge. > This is bad advice. Please always specify the github repo. it needs to say "closes apache/mahout #xx" since just

Re: Features by engine page

2014-08-26 Thread Dmitriy Lyubimov
meant to be normal of course On Tue, Aug 26, 2014 at 10:39 AM, Dmitriy Lyubimov wrote: > scala, if you want) to write something like `new > MultivariateUniformDistribution(mu,sigma).sample()`, so i really just dsl- >

Re: Features by engine page

2014-08-26 Thread Dmitriy Lyubimov
even ever tried outside SGD where datapoints are abundant albeit incomplete. On Tue, Aug 26, 2014 at 8:32 AM, Ted Dunning wrote: > On Mon, Aug 25, 2014 at 2:40 PM, Dmitriy Lyubimov > wrote: > > > This work is obviously also interesting in that it > > establishes pr

Re: Features by engine page

2014-08-25 Thread Dmitriy Lyubimov
en missing piece is data prep of course, but i think i > > can eventually contribute a couple tutorials of how to do vectorization > > using SparkQL stuff. > > > > > > > > -d > > > > > > > > > > On Mon, Aug 25, 2014 at 2:19 PM, Pat

Re: Features by engine page

2014-08-25 Thread Dmitriy Lyubimov
MAHOUT-1604 is in development > > I thought SSVD with PCA was working on Spark. > > > On Aug 25, 2014, at 2:15 PM, Dmitriy Lyubimov wrote: > > this is super-cool to hear. > > > On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann > wrote: > > > Hi Andrew, &

Re: Features by engine page

2014-08-25 Thread Dmitriy Lyubimov
this is super-cool to hear. On Mon, Aug 25, 2014 at 1:53 PM, Till Rohrmann wrote: > Hi Andrew, > > I like the overview of the different algorithms. The Flink bindings are > still under development. We hope to finish them in the next couple of > weeks. > > Best regards, > > Till > > > On Mon, Au

Re: help needed to track down an issue

2014-08-15 Thread Dmitriy Lyubimov
hout's own log4j since it only applies for front-end but any standalone spark workers (in case they are running on the same node) will still be using spark's setup. which is good. On Fri, Aug 15, 2014 at 5:17 PM, Dmitriy Lyubimov wrote: > one more bit of info > > i also seem t

Re: help needed to track down an issue

2014-08-15 Thread Dmitriy Lyubimov
2014 at 4:51 PM, Dmitriy Lyubimov > wrote: > > > with spark shell. Trying to clamp down on the INFO logging to console. > > Can't seem to solve it. > > > > Since spark itself seems to add $SPARK_HOME/conf classpath (to verify > what > > mahout ends up with

help needed to track down an issue

2014-08-15 Thread Dmitriy Lyubimov
with spark shell. Trying to clamp down on the INFO logging to console. Can't seem to solve it. Since spark itself seems to add $SPARK_HOME/conf classpath (to verify what mahout ends up with in this case, run bin/mahout -spark classpath | sed 's/:/\n/g'), one obvious solution is to modify spark's c

[jira] [Updated] (MAHOUT-1606) Add rowSums, rowMeans and diagonal extraction operations to distributed matrices

2014-08-15 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1606: - Resolution: Fixed Status: Resolved (was: Patch Available) > Add rowS

[jira] [Updated] (MAHOUT-1606) Add rowSums, rowMeans and diagonal extraction operations to distributed matrices

2014-08-15 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1606: - Status: Patch Available (was: Open) > Add rowSums, rowMeans and diagonal extract

Re: GPU, lapack Matrix adaptations

2014-08-15 Thread Dmitriy Lyubimov
Sorry, this should say it is true that sparse algebra is by far more compelling than dense one; On Fri, Aug 15, 2014 at 9:54 AM, Dmitriy Lyubimov wrote: > As i indicated, i think it is a worthy move. As i said before (including > Spark list), it is true that dense algebra is by fa

Re: GPU, lapack Matrix adaptations

2014-08-15 Thread Dmitriy Lyubimov
/cuda operation for vectors. On Wed, Aug 13, 2014 at 3:39 PM, Anand Avati wrote: > On Fri, Jul 18, 2014 at 12:01 PM, Dmitriy Lyubimov > wrote: > > > On Fri, Jul 18, 2014 at 11:54 AM, Anand Avati wrote: > > > > > On Fri, Jul 18, 2014 at 11:42 AM, Dmitri

Re: LLR and negative correlation

2014-08-13 Thread Dmitriy Lyubimov
Aha. just what was needed. Now nonsensical co-occurrences are filtered out too. Thank you. On Wed, Aug 13, 2014 at 4:59 PM, Ted Dunning wrote: > I use k_11 / (k_11 + k_12) > k_21 / (k_21 + k_22) for the sign. > > > > > On Wed, Aug 13, 2014 at 4:45 PM, Dmitriy Lyubimov
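The sign rule quoted above can be sketched as follows in plain Python. This is an illustration under stated assumptions, not the Mahout code. The function names are made up. The cell layout is assumed to be the usual 2x2 contingency table (k11 = both events, k12 = A without B, k21 = B without A, k22 = neither). The score is the standard G-squared statistic 2 * sum k_ij * ln(k_ij * N / (row_i * col_j)). The comparison k11 / (k11 + k12) versus k21 / (k21 + k22) is done by cross-multiplication so that empty rows do not cause a division by zero.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) test of independence on a 2x2 table."""
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = ((k11, row1, col1), (k12, row1, col2),
             (k21, row2, col1), (k22, row2, col2))
    # Zero cells contribute nothing (limit of k * ln k as k -> 0).
    return 2.0 * sum(k * math.log(k * n / (r * c)) for k, r, c in cells if k > 0)

def signed_llr(k11, k12, k21, k22):
    """LLR made negative when the two events co-occur less than expected."""
    score = llr(k11, k12, k21, k22)
    # Same test as k11 / (k11 + k12) < k21 / (k21 + k22), without dividing.
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        return -score
    return score
```

With this convention, a threshold such as `signed_llr(...) > t` keeps only pairs that are both significantly dependent and positively associated, which matches the stated goal of filtering out negatively correlated co-occurrences.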

Re: LLR and negative correlation

2014-08-13 Thread Dmitriy Lyubimov
perhaps something along the lines p(A and B) > p(notA and notB)? On Wed, Aug 13, 2014 at 4:42 PM, Dmitriy Lyubimov wrote: > Hello, > > i would be grateful for a hint on the following problem here in > cooccurrence analysis. It may be not most practical one but it appeared in >

LLR and negative correlation

2014-08-13 Thread Dmitriy Lyubimov
Hello, i would be grateful for a hint on the following problem here in cooccurrence analysis. It may not be the most practical one but it appeared in the test. The problem is that LLR tests for independence. As such, it would give high scores for negatively correlated events too. E.g. say countA = 91

[jira] [Created] (MAHOUT-1606) Add rowSums, rowMeans and diagonal extraction operations to distributed matrices

2014-08-12 Thread Dmitriy Lyubimov (JIRA)
Dmitriy Lyubimov created MAHOUT-1606: Summary: Add rowSums, rowMeans and diagonal extraction operations to distributed matrices Key: MAHOUT-1606 URL: https://issues.apache.org/jira/browse/MAHOUT-1606

Re: co-occurrence paper and code

2014-08-12 Thread Dmitriy Lyubimov
tributions about which prediction is made. On Mon, Aug 11, 2014 at 5:22 PM, Ted Dunning wrote: > Opportunity in this case does not refer to "Saw something and decided to > buy/click/view/listen". Instead, it refers to the entire process including > the UI and whatever disc

Re: co-occurrence paper and code

2014-08-11 Thread Dmitriy Lyubimov
3 PM, Dmitriy Lyubimov wrote: > No, the question was why total number of trials sent to LLR is considered > to be m where M is m x n is a user/item matrix. > > ok i got it. i made some incorrect assumption about previous code steps, > hence my inference derailed. > > > > &g

Re: co-occurrence paper and code

2014-08-11 Thread Dmitriy Lyubimov
you binarize the original occurrence matrix then it seems to me that the > values in the cooccurrence matrix *are* user counts. > > Perhaps I misunderstand your original question. > > > > On Mon, Aug 11, 2014 at 3:44 PM, Dmitriy Lyubimov > wrote: > > > sorry rather total

Re: co-occurrence paper and code

2014-08-11 Thread Dmitriy Lyubimov
sorry rather total occurrences of a pair should be sum(a_i) + sum(a_j) - a_ij (not 1norm of course) On Mon, Aug 11, 2014 at 3:11 PM, Dmitriy Lyubimov wrote: > Why does the cooccurrence code take the number of users as total interactions? > shouldn't that be 1-norm of the co-occurrence matrix?

Re: co-occurrence paper and code

2014-08-11 Thread Dmitriy Lyubimov
Why does the cooccurrence code take the number of users as total interactions? shouldn't that be 1-norm of the co-occurrence matrix? On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov wrote: > So, compared to original paper [1], similarity is now hardcoded and always > LLR? Do we have a

Re: Upgrade to spark 1.0.x

2014-08-11 Thread Dmitriy Lyubimov
ok. merging and dropping spark 1.0.x branch from apache as well. On Sat, Aug 9, 2014 at 2:06 PM, Peng Cheng wrote: > +1 > > 1.0.0 is recommended. Many release after 1.0.1 has a short test cycle and > 1.0.2 apparently reverted many fix for causing more serious problem. > > > On 14-08-09 04:51 PM

Upgrade to spark 1.0.x

2014-08-08 Thread Dmitriy Lyubimov
Current master is still at Spark 0.9.x. MAHOUT-1603 (PR #40) is making a number of valuable tweaks to enable Spark 1.0.x (and Spark SQL code, by extension. I did a quick test, SQL seems to work for my simple tests in Mahout environment). This squashed PR is pushed to apache/mahout branch spark-1.

[jira] [Commented] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-08 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091158#comment-14091158 ] Dmitriy Lyubimov commented on MAHOUT-1603: -- merged to apache/spark-1.0.x br

Re: co-occurrence paper and code

2014-08-07 Thread Dmitriy Lyubimov
a > threshold, but it is applied to counts on only some similarity classes. > > > > On Wed, Aug 6, 2014 at 5:07 PM, Dmitriy Lyubimov > wrote: > > > On Wed, Aug 6, 2014 at 5:04 PM, Ted Dunning > wrote: > > > > > On Wed, Aug 6, 2014 at 6:01 PM, Dmitriy Ly

Re: co-occurrence paper and code

2014-08-07 Thread Dmitriy Lyubimov
On Thu, Aug 7, 2014 at 11:34 AM, Ted Dunning wrote: > > > Can you say a bit more about what you are trying to do? > Thank you. I would like to customize co-oc code not to just yank top N scored co-occurrences, but also make sure that all of them satisfy rejection of coincidence with a given con

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
on of my own to co-occurrence construction. Would that be reasonable if i do that? On Wed, Aug 6, 2014 at 5:12 PM, Dmitriy Lyubimov wrote: > Asking because i am considering pulling this implementation but for some > (mostly political) reasons people want to try different things here. >

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
restrict ourselves to approaches that work with implicit, count-like > data. > > -s > Am 06.08.2014 16:58 schrieb "Ted Dunning" : > > > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov > > wrote: > > > > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimo

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
On Wed, Aug 6, 2014 at 5:04 PM, Ted Dunning wrote: > On Wed, Aug 6, 2014 at 6:01 PM, Dmitriy Lyubimov > wrote: > > > > LLR is a classic test. > > > > > > What i meant here it doesn't produce a p-value. or does it? > > > > It produces an asy

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
Just was wondering if this was compensated for somewhere else that i don't immediately see. > > > > On Wed, Aug 6, 2014 at 5:21 PM, Dmitriy Lyubimov > wrote: > > > So, compared to original paper [1], similarity is now hardcoded and > always > > LLR? Do we

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
014, at 4:54 PM, Dmitriy Lyubimov wrote: > > is this > >val bcastNumInteractions = > drmBroadcast(drmI.numNonZeroElementsPerColumn) > > any different from just saying `drmI.colSums`? > > > On Wed, Aug 6, 2014 at 4:49 PM, Dmitriy Lyubimov > wrote: > > &g

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
On Wed, Aug 6, 2014 at 4:57 PM, Ted Dunning wrote: > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov > wrote: > > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov > > wrote: > > > > I suppose in that context LLR is considered a distance (higher scores >

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
is this val bcastNumInteractions = drmBroadcast(drmI.numNonZeroElementsPerColumn) any different from just saying `drmI.colSums`? On Wed, Aug 6, 2014 at 4:49 PM, Dmitriy Lyubimov wrote: > > > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov > wrote: > > I suppose i
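One way to see when the two quantities asked about above can differ, sketched in plain Python with made-up helper names (`col_sums`, `nonzeros_per_column`) that only mirror what the DSL names suggest: on a 0/1 interaction matrix, column sums and per-column non-zero counts coincide, but as soon as entries are counts or weights they diverge.

```python
def col_sums(matrix):
    """Sum of each column of a list-of-rows matrix."""
    return [sum(col) for col in zip(*matrix)]

def nonzeros_per_column(matrix):
    """Number of non-zero entries in each column."""
    return [sum(1 for x in col if x != 0) for col in zip(*matrix)]

binary = [[1, 0], [1, 1]]   # 0/1 interactions: both measures agree
counts = [[3, 0], [1, 2]]   # repeated interactions: they differ
```

So the answer depends on whether the input matrix has already been binarized at that point in the pipeline, which the snippet above does not show.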

Re: co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov wrote: I suppose in that context LLR is considered a distance (higher scores mean > more `distant` items, co-occurring by chance only)? > Self-correction on this one -- having given a quick look at llr paper again, it looks like it is actu

co-occurrence paper and code

2014-08-06 Thread Dmitriy Lyubimov
So, compared to original paper [1], similarity is now hardcoded and always LLR? Do we have any plans to parameterize that further? Is there any reason to parameterize it? Also, reading the paper, i am a bit wondering -- similarity and distance are functions that usually are moving into different

CoocurrenceAnalysis[Suite].scala -> math-scala (?)

2014-08-06 Thread Dmitriy Lyubimov
Sorry can't recollect what this discussion ended with. Why are we not moving these files to math-scala? the code seems to be engine-independent.

Re: Requiring Java 1.7 for Mahout

2014-08-06 Thread Dmitriy Lyubimov
0400 > > > > > > > > oracle? > > > > > > > > > Date: Wed, 6 Aug 2014 13:54:43 -0700 > > > > > Subject: Re: Requiring Java 1.7 for Mahout > > > > > From: dlie...@gmail.com > > > > > To: dev@mahout.apache.or

Re: Requiring Java 1.7 for Mahout

2014-08-06 Thread Dmitriy Lyubimov
ng Java 1.7 for Mahout > > > From: dlie...@gmail.com > > > To: dev@mahout.apache.org > > > > > > or testing. > > > > > > > > > On Wed, Aug 6, 2014 at 1:54 PM, Dmitriy Lyubimov > wrote: > > > > > > > My current java is 1.6

Re: Requiring Java 1.7 for Mahout

2014-08-06 Thread Dmitriy Lyubimov
My current java is 1.6.0_38, i have no problem building. On Wed, Aug 6, 2014 at 1:52 PM, Andrew Palumbo wrote: > you're right- my big concern is that on our (probably outdated) building > from source page we have 1.6 listed: > > http://mahout.apache.org/developers/buildingmahout.html > > The ob

Re: Requiring Java 1.7 for Mahout

2014-08-06 Thread Dmitriy Lyubimov
or testing. On Wed, Aug 6, 2014 at 1:54 PM, Dmitriy Lyubimov wrote: > My current java is 1.6.0_38, i have no problem building. > > > On Wed, Aug 6, 2014 at 1:52 PM, Andrew Palumbo wrote: > >> you're right- my big concern is that on our (probably outdated) building &

Re: Requiring Java 1.7 for Mahout

2014-08-06 Thread Dmitriy Lyubimov
the only problem is that we are not really requiring it. We are not using any 1.7 functionality. If people compile (as i do) Mahout, they can compile any bytecode version they want. There are some 1.7 artifact dependencies in H2O but 1.7 would be required at run time only and only if the p

[jira] [Created] (MAHOUT-1603) Tweaks for Spark 1.0.x

2014-08-04 Thread Dmitriy Lyubimov (JIRA)
Dmitriy Lyubimov created MAHOUT-1603: Summary: Tweaks for Spark 1.0.x Key: MAHOUT-1603 URL: https://issues.apache.org/jira/browse/MAHOUT-1603 Project: Mahout Issue Type: Task

[jira] [Commented] (MAHOUT-1500) H2O integration

2014-07-31 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081093#comment-14081093 ] Dmitriy Lyubimov commented on MAHOUT-1500: -- The reason additional revie

[jira] [Comment Edited] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078693#comment-14078693 ] Dmitriy Lyubimov edited comment on MAHOUT-1599 at 7/30/14 12:3

[jira] [Comment Edited] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078693#comment-14078693 ] Dmitriy Lyubimov edited comment on MAHOUT-1599 at 7/30/14 12:3

[jira] [Commented] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078693#comment-14078693 ] Dmitriy Lyubimov commented on MAHOUT-1599: -- ah.sorry. there's n

[jira] [Resolved] (MAHOUT-1596) support for rbind() operator on DRMs

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov resolved MAHOUT-1596. -- Resolution: Fixed > support for rbind() operator on D

[jira] [Commented] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078099#comment-14078099 ] Dmitriy Lyubimov commented on MAHOUT-1599: -- Not sure I can connect a s

[jira] [Comment Edited] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078099#comment-14078099 ] Dmitriy Lyubimov edited comment on MAHOUT-1599 at 7/29/14 6:1

[jira] [Comment Edited] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078061#comment-14078061 ] Dmitriy Lyubimov edited comment on MAHOUT-1599 at 7/29/14 5:4

[jira] [Commented] (MAHOUT-1599) Add rand() operator to math-scala

2014-07-29 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078061#comment-14078061 ] Dmitriy Lyubimov commented on MAHOUT-1599: -- there's a bunch of epheme
