Re: spark naive bayes exception

2015-04-03 Thread Dmitriy Lyubimov
saw a lot of these, some still bewildering, but they all related to non-local mode (different classpaths on backend and front end). On Fri, Apr 3, 2015 at 1:39 PM, Andrew Palumbo ap@outlook.com wrote: Has anybody seen an exception like this when running a spark job? the job completes but

Re: Fwd: IM addresses?

2015-04-02 Thread Dmitriy Lyubimov
i guess i can run it on the phone. On Thu, Apr 2, 2015 at 10:30 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: signed in but can't connect, probably because of filtering. On Thu, Apr 2, 2015 at 8:10 AM, Andrew Palumbo ap@outlook.com wrote: it looks like only admins can invite. On 04

Re: Fwd: IM addresses?

2015-04-02 Thread Dmitriy Lyubimov
signed in but can't connect, probably because of filtering. On Thu, Apr 2, 2015 at 8:10 AM, Andrew Palumbo ap@outlook.com wrote: it looks like only admins can invite. On 04/02/2015 11:05 AM, Andrew Palumbo wrote: I'll try to invite someone, i'm on it now. On 04/02/2015 10:59 AM, Pat

Re: [mahout] MAHOUT-1655 (#86)

2015-04-01 Thread Dmitriy Lyubimov
Pat, duplication of my email to your PR is coincidental. It is not about your PR. Sorry. I was looking at the master log and posting to @dev. On Wed, Apr 1, 2015 at 1:31 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: yeah. https://github.com/apache/mahout/commits/master. This link is MASTER

Re: [mahout] MAHOUT-1655 (#86)

2015-04-01 Thread Dmitriy Lyubimov
On Wed, Apr 1, 2015 at 1:01 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Actually, if git pull brings merge (somebody pushed something while you were doing changelog etc.) there'd be merge. I'd try to rebase in this case (if it works) to avoid merge, if possible. or re-do the whole
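
The rebase-instead-of-merge idea in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: the repositories (`seed`, `remote.git`, `alice`, `bob`) are hypothetical throwaway directories, not the Mahout repository, and the demo identity is made up.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Seed repository with one commit, published as a bare "remote".
git init -q "$work/seed"
(cd "$work/seed" && echo base > base.txt && git add base.txt && git commit -q -m "base")
git clone -q --bare "$work/seed" "$work/remote.git"

git clone -q "$work/remote.git" "$work/alice"
git clone -q "$work/remote.git" "$work/bob"

# Bob pushes while Alice is still preparing her own commit.
(cd "$work/bob" && echo b > bob.txt && git add bob.txt \
  && git commit -q -m "bob's change" && git push -q origin HEAD)

# Alice commits locally, then pulls with --rebase so her commit is replayed
# on top of Bob's instead of producing a merge commit.
(cd "$work/alice" && echo a > alice.txt && git add alice.txt \
  && git commit -q -m "alice's change" && git pull -q --rebase)

merges=$(cd "$work/alice" && git rev-list --merges --count HEAD)
commits=$(cd "$work/alice" && git rev-list --count HEAD)
echo "merge commits: $merges, total commits: $commits"
```

With a plain `git pull` the same sequence would normally leave a merge commit in Alice's history, which is the thing the message wants to keep out of the master log.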

Re: [mahout] MAHOUT-1655 (#86)

2015-04-01 Thread Dmitriy Lyubimov
, 2015, at 11:53 AM, Dmitriy Lyubimov notificati...@github.com wrote: yeah. https://github.com/apache/mahout/commits/master. we should not see merged master commits there (clear sign of not squashing your personal PR history! ) On Wed, Apr 1, 2015 at 11:26 AM, Suneel Marthi notificati

Re: [mahout] MAHOUT-1655 (#86)

2015-04-01 Thread Dmitriy Lyubimov
happen; at least to me. On Wed, Apr 1, 2015 at 12:58 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Pat, actually i did not say I noticed problems in your commits. It was somebody else :) On Wed, Apr 1, 2015 at 12:41 PM, Pat Ferrel p...@occamsmachete.com wrote: Here is my history dump

Re: [mahout] MAHOUT-1655 (#86)

2015-04-01 Thread Dmitriy Lyubimov
in the master log is cleaned up. IMO there is no problem here. On Apr 1, 2015, at 1:05 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Wed, Apr 1, 2015 at 1:01 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Actually, if git pull brings merge (somebody pushed something while you were doing

[jira] [Assigned] (MAHOUT-1641) Add conversion from a RDD[(String, String)] to a Drm[Int]

2015-03-31 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov reassigned MAHOUT-1641: Assignee: Dmitriy Lyubimov Add conversion from a RDD[(String, String)] to a Drm

Re: [jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-31 Thread Dmitriy Lyubimov
FYI t-digest is now also part of spark classpath, part of stream-lib. On Tue, Mar 31, 2015 at 4:05 PM, Suneel Marthi (JIRA) j...@apache.org wrote: [

[jira] [Updated] (MAHOUT-1660) Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

2015-03-30 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-1660: - Affects Version/s: (was: 0.10.0) 0.10.1

[jira] [Commented] (MAHOUT-1660) Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

2015-03-30 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387162#comment-14387162 ] Dmitriy Lyubimov commented on MAHOUT-1660: -- i have a fix for that. if you don't

[jira] [Assigned] (MAHOUT-1660) Hadoop1HDFSUtil.readDRMHEader should be taking Hadoop conf

2015-03-30 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov reassigned MAHOUT-1660: Assignee: Dmitriy Lyubimov (was: Suneel Marthi) Hadoop1HDFSUtil.readDRMHEader

Re: Anyone using eclipse?

2015-03-30 Thread Dmitriy Lyubimov
I switched to idea since i started doing mixed projects with scala. Standalone scala is bearable in eclipse but mixed projects simply don't work. (and Mahout likely one of them). On Mon, Mar 30, 2015 at 3:58 PM, Suneel Marthi suneel.mar...@gmail.com wrote: I believe it's only Shannon from

Re: iScala Notebook

2015-03-28 Thread Dmitriy Lyubimov
So I can plot with matplotlib from scala here? Any examples ? On Mar 28, 2015 7:13 AM, Suneel Marthi suneel.mar...@gmail.com wrote: Here's a gist of an iScala notebook, and has integration with matplotlib for visualization, could complement well with present scala-shell. Thoughts?

Re: [jira] [Created] (MAHOUT-1659) Remove deprecated Lanczos solver from spectral clustering in mr-legacy

2015-03-27 Thread Dmitriy Lyubimov
Shannon, How difficult would it be to port spectral clustering to our scala alg and math? We have ssvd there as well. On Mar 27, 2015 7:26 AM, Shannon Quinn (JIRA) j...@apache.org wrote: Shannon Quinn created MAHOUT-1659: - Summary: Remove

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
Note also that all these related beasts come in pairs (in-core input - distributed input): ssvd - dssvd spca - dspca On Fri, Mar 27, 2015 at 3:45 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: But MR version of SSVD is more stable because of the QR differences. On Fri, Mar 27, 2015 at 3:44

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
But MR version of SSVD is more stable because of the QR differences. On Fri, Mar 27, 2015 at 3:44 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Yes. Except it doesn't follow same parallel reordered Givens QR but uses Cholesky QR (which we call thin QR) as an easy-to-implement shortcut
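
For readers unfamiliar with the term, the textbook form of Cholesky QR for a tall, full-column-rank matrix A can be written as below. This is the standard formulation, offered as background; it is an assumption that the "thin QR" mentioned here follows it step for step.

```latex
% Textbook Cholesky QR for a tall matrix A (m x n, full column rank).
\begin{aligned}
S &= A^{\top} A = L L^{\top} && \text{(Cholesky factor of the small } n \times n \text{ Gram matrix)} \\
R &= L^{\top}, \qquad Q = A R^{-1} = A L^{-\top} \\
Q^{\top} Q &= L^{-1} (A^{\top} A) L^{-\top} = L^{-1} (L L^{\top}) L^{-\top} = I \\
\kappa_2(A^{\top} A) &= \kappa_2(A)^2
\end{aligned}
```

Only the n x n Gram matrix has to be factored, which is what makes the approach easy to implement over distributed input. The last line shows that forming the Gram matrix squares the condition number, the usual reason Cholesky QR is considered less numerically robust than Givens or Householder QR; that would be consistent with the stability remark above, though the message does not spell out the reason.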

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
Yes. Except it doesn't follow same parallel reordered Givens QR but uses Cholesky QR (which we call thin QR) as an easy-to-implement shortcut. But this page makes no mention of QR specifics i think On Fri, Mar 27, 2015 at 12:57 PM, Andrew Palumbo ap@outlook.com wrote: math-scala dssvd

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
The algorithm outline for in-core is exactly the same. Except in-core version is using Householder Reflections QR (I think). but logic is exactly the same. On Fri, Mar 27, 2015 at 3:58 PM, Andrew Palumbo ap@outlook.com wrote: On 03/27/2015 06:46 PM, Dmitriy Lyubimov wrote: Note also

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
in the implementation than there are steps in the algorithm. On 03/27/2015 06:58 PM, Andrew Palumbo wrote: On 03/27/2015 06:46 PM, Dmitriy Lyubimov wrote: Note also that all these related beasts come in pairs (in-core input - distributed input): ssvd - dssvd spca - dspca yeah I've been thinking

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
content in the same place. On 03/27/2015 06:58 PM, Andrew Palumbo wrote: On 03/27/2015 06:46 PM, Dmitriy Lyubimov wrote: Note also that all these related beasts come in pairs (in-core input - distributed input): ssvd - dssvd spca - dspca yeah I've been thinking that i'd give a less detailed

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
it is possible to say import o.a.m.math._ import decompositions._ then it will assume second line as o.a.m.math.decompositions automatically On Fri, Mar 27, 2015 at 4:09 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: i think there's a typo in package name under usage. It should

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
and R simulation sources perhaps ... On Fri, Mar 27, 2015 at 4:57 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Andrew, thanks a lot! I think acknowledgement and reference to N. Halko's dissertation from MR page is also worthy of mention on this page as well. On Fri, Mar 27, 2015 at 4:41

Re: math-scala dssvd docs

2015-03-27 Thread Dmitriy Lyubimov
know if you see any other changes that need to be made. On 03/27/2015 07:06 PM, Dmitriy Lyubimov wrote: In fact, algorithm just executes the outline formulas. Not always line for line, but step for step for sure. On Fri, Mar 27, 2015 at 4:05 PM, Andrew Palumbo ap@outlook.com wrote

Re: [jira] [Commented] (MAHOUT-1653) Spark 1.3

2015-03-25 Thread Dmitriy Lyubimov
I'd venture a heresy again. what if we put off 1.3 compatibility until better times and focus on 0.9...1.2.x compatibility we have now. Chances are by the time we are done with 0.10.x we'd need to consider 1.4 or 1.5 at the rate this project is bloating. Keep in mind that higher version != higher

Re: naming

2015-03-24 Thread Dmitriy Lyubimov
, at 11:02 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: I like math-*. And it is math only there. Or was last time i checked. it will be what R calls R-base, and I would welcome no other scope there. all environment things are math. all ML things are math. quasi-newton, bayesian optimizers, linear

Re: [jira] [Created] (MAHOUT-1654) Migrate from Maven to SBT

2015-03-24 Thread Dmitriy Lyubimov
ok... but this will require also reworking jenkins and CI builds... and build engineering always scared me :) On Tue, Mar 24, 2015 at 10:54 AM, Stevo Slavic (JIRA) j...@apache.org wrote: Stevo Slavic created MAHOUT-1654: Summary: Migrate from

Re: 0.10 release Hangout

2015-03-24 Thread Dmitriy Lyubimov
On Tue, Mar 24, 2015 at 11:21 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Summary of the call; please chime in with any corrections or clarifications: (1) Support Lucene 5, Hadoop 2, Java 7, Spark 1.1 1.3 (2) Ensure build is solid, add Scaladocs in poms and in Jenkins (3)

Re: [jira] [Commented] (MAHOUT-1655) Refactor module dependencies

2015-03-24 Thread Dmitriy Lyubimov
I had the same idea myself. i like it. On Tue, Mar 24, 2015 at 1:35 PM, Stevo Slavić ssla...@gmail.com wrote: If I understand correctly, mrlegacy should remain, just hdfs/non-mr stuff extracted into separate module, for reuse in math-scala and mahout-spark module, so they do not depend on

[jira] [Commented] (MAHOUT-1648) Update Mahout's CMS for 0.10.0

2015-03-24 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378571#comment-14378571 ] Dmitriy Lyubimov commented on MAHOUT-1648: -- Re: thin QR: this is also known

[jira] [Commented] (MAHOUT-1648) Update Mahout's CMS for 0.10.0

2015-03-24 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378587#comment-14378587 ] Dmitriy Lyubimov commented on MAHOUT-1648: -- why? they are 100% algebraic

Re: naming

2015-03-23 Thread Dmitriy Lyubimov
I like math-*. And it is math only there. Or was last time i checked. it will be what R calls R-base, and I would welcome no other scope there. all environment things are math. all ML things are math. quasi-newton, bayesian optimizers, linear search are all math. Stats are math. als, (d)ssvd,

Re: Spark 1.3.0

2015-03-23 Thread Dmitriy Lyubimov
lemme read this issue really quick. This looks like a redundant double-contract. Why require implicit conversions if they are already requiring explicit types? And vice versa. On Sun, Mar 22, 2015 at 10:17 AM, Pat Ferrel p...@occamsmachete.com wrote: Due to a bug in spark we have a nasty work

Re: Spark 1.3.0

2015-03-23 Thread Dmitriy Lyubimov
this a bit as well. 'cause spark api has the same problems. On Mon, Mar 23, 2015 at 11:06 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: lemme read this issue really quick. This looks like a redundant double-contract. Why require implicit conversions if they are already requiring explicit types

Re: Release

2015-03-18 Thread Dmitriy Lyubimov
We (well, I at least) are extremely appreciative of Anand's effort and commitment to integrate h2o engine as one of the algebraic backs. Even 10 times more so as it turns out it had nothing to do with 0xdata. Among other things, i think we can agree this renders the suggestion of h2o

[jira] [Commented] (MAHOUT-1648) Update Mahout's CMS for 0.10.0

2015-03-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367581#comment-14367581 ] Dmitriy Lyubimov commented on MAHOUT-1648: -- Maybe we should really call stuff

[jira] [Commented] (MAHOUT-1648) Update Mahout's CMS for 0.10.0

2015-03-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367801#comment-14367801 ] Dmitriy Lyubimov commented on MAHOUT-1648: -- looks like algebra and environment

[jira] [Commented] (MAHOUT-1648) Update Mahout's CMS for 0.10.0

2015-03-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367923#comment-14367923 ] Dmitriy Lyubimov commented on MAHOUT-1648: -- yes i'd go by splitting

Re: Release

2015-03-17 Thread Dmitriy Lyubimov
it. Is anyone signing up for that? On Mar 17, 2015, at 8:59 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: I don't like the term dsl. It is algebraic optimizer, folks. Calling it dsl brings in wrong and too trivial ideas about it. On Mar 17, 2015 8:27 AM, Andrew Palumbo ap@outlook.com

Re: Release

2015-03-17 Thread Dmitriy Lyubimov
i was thinking 0.10.0 mid-april, update 0.10.1 end of spring. i would suggest feature extraction topics for 0.11.x. Esp. w.r.t. SchemaRDD aka DataFrame -- vectorizing, hashing, ML schema support, imputation of missing data, outlier cleanups etc. There's a lot. Hardware backs integration -- i

Re: Release

2015-03-17 Thread Dmitriy Lyubimov
IMO deprecated is for something that got released at least once. If there's no intent to see it released, it just should be purged. On Tue, Mar 17, 2015 at 12:49 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Tue, Mar 17, 2015 at 10:14 AM, Pat Ferrel p...@occamsmachete.com wrote: I’m nervous

Re: Release

2015-03-17 Thread Dmitriy Lyubimov
On Tue, Mar 17, 2015 at 8:26 AM, Andrew Palumbo ap@outlook.com wrote: On 03/15/2015 01:42 PM, Pat Ferrel wrote: Lots of discussion off the record about doing a release but shouldn’t we plan this? What has to be in a release of Mahout 0.10? Seems like we could release as-is but it

Re: Release

2015-03-17 Thread Dmitriy Lyubimov
... but it would be nice to confirm with them directly here on @dev to avoid pitfalls of third party hearsay of course. But I assume if nobody comes forward and the tests are not working then the issue of releasing the contribution is moot. On Tue, Mar 17, 2015 at 12:55 PM, Dmitriy Lyubimov dlie

Re: Release

2015-03-17 Thread Dmitriy Lyubimov
I don't like the term dsl. It is algebraic optimizer, folks. Calling it dsl brings in wrong and too trivial ideas about it. On Mar 17, 2015 8:27 AM, Andrew Palumbo ap@outlook.com wrote: On 03/15/2015 01:42 PM, Pat Ferrel wrote: Lots of discussion off the record about doing a release but

Re: PR#73 faster vectors

2015-03-10 Thread Dmitriy Lyubimov
We already discussed this. honestly, i don't see it as a priority for a number of reasons. (1) it increases dependency footprint (2) it only increases speed (i think) of random access vectors (3) it still will not get us anywhere close in terms of matrix-matrix operations to where mkl, openblas

Re: JIRA- legacy scala labels

2015-03-06 Thread Dmitriy Lyubimov
my take is legacy is just a module (aka maven artifact). Just like it is now. we just need to re-route(cut) dependencies on it. On Fri, Mar 6, 2015 at 2:56 PM, Pat Ferrel p...@occamsmachete.com wrote: The simplest way to split the project is into engines—hadoop and spark. What is happening

Re: Mahout in ASF BigTop

2015-03-06 Thread Dmitriy Lyubimov
We are dropping support of ML on MR. Our backs now (as it stands) are spark and h2o; mostly spark. Maybe flink in the future. On Fri, Mar 6, 2015 at 11:31 AM, jay vyas jayunit100.apa...@gmail.com wrote: Hi Mahout. We are prepping for 0.9 Release of bigtop. Is anyone in the mahout

Re: [jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Dmitriy Lyubimov
nope afaik. MAHOUT_OPTS is the place to set that (if we are talking about shell). On Thu, Mar 5, 2015 at 3:50 PM, Andrew Palumbo (JIRA) j...@apache.org wrote: [

[jira] [Resolved] (MAHOUT-1603) Tweaks for Spark 1.0.x

2015-03-05 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov resolved MAHOUT-1603. -- Resolution: Fixed Tweaks for Spark 1.0.x

Re: [jira] [Comment Edited] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Dmitriy Lyubimov
note that with MAHOUT_OPTS you have a choice. You can either set up env or you can use inline syntax like MAHOUT_OPTS='-Dk=n' bin/mahout spark-shell On Thu, Mar 5, 2015 at 4:50 PM, Andrew Palumbo (JIRA) j...@apache.org wrote: [
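
The two choices described here (exporting the variable versus the inline prefix form) differ only in how long the setting lasts. A minimal sketch, assuming a stand-in child process in place of `bin/mahout spark-shell`; the `child` snippet and the result variable names are made up for illustration and are not part of Mahout.

```shell
#!/bin/sh
# Stand-in for `bin/mahout spark-shell` (an assumption for illustration): a child
# process that only reports the MAHOUT_OPTS value it inherited.
child='printf "%s" "${MAHOUT_OPTS:-<unset>}"'

unset MAHOUT_OPTS

# Inline syntax: the assignment applies to this one command only.
inline_seen=$(MAHOUT_OPTS='-Dk=n' sh -c "$child")
after_inline=$(sh -c "$child")

# Environment setup: every later child process inherits the value.
export MAHOUT_OPTS='-Dk=n'
exported_seen=$(sh -c "$child")

echo "inline: $inline_seen"
echo "after inline: $after_inline"
echo "exported: $exported_seen"
```

The inline form is convenient for a one-off run; exporting fits better when the option list is long, as the related messages on MAHOUT-1643 point out.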

Re: [jira] [Commented] (MAHOUT-1643) CLI arguments are not being processed in spark-shell

2015-03-05 Thread Dmitriy Lyubimov
the hack i have only takes MAHOUT_OPTS. it normally actually makes more sense to set it there since spark options are too numerous and too long to enter on command line. so i'd say we need to support MAHOUT_OPTS at minimum; or both. On Thu, Mar 5, 2015 at 4:04 PM, Andrew Palumbo (JIRA)

Re: PMML

2015-03-04 Thread Dmitriy Lyubimov
I am willing to +1 any contribution at this point. my previous company used pmml to serialize simple stuff, but i don't have first hand experience. Its flexibility is ultimately pretty limited, isn't it? And xml is ultimately a medium which is too ugly and too verbose at the same time to represent

Re: Spark 1.1.1 and 1.2.1

2015-03-04 Thread Dmitriy Lyubimov
is bug also present in 1.2.0? cdh 5.3 is 1.2.0 not 1.2.1 On Wed, Mar 4, 2015 at 1:58 PM, Pat Ferrel p...@occamsmachete.com wrote: Spark 1.2.1 has a bug that blocks any JavaSerializer without a work around. It requires the SparkConf to get a path to a jar that exists on all nodes. So I’ve

Re: Question with contributing first steps

2015-03-04 Thread Dmitriy Lyubimov
(1) no mentors this year. (2) what was the PR #? On Wed, Mar 4, 2015 at 2:35 PM, Олег Зотов olegzoto...@gmail.com wrote: Hi I want to contribute to the Mahout and I have two questions: 1) What about Mahout and Google Summer of Code this year? 2) To take the first step, I fixed one not so

Re: Refactor

2015-03-04 Thread Dmitriy Lyubimov
afaik spark is not built by sbt by default any longer, but rather, by maven. At least self-build instructions are for maven only (and sbt build changed enough that i can't effectively use it any longer). but when it was, it was always sbt-for-all. On Wed, Mar 4, 2015 at 3:45 PM, Pat Ferrel

Re: Question with contributing first steps

2015-03-04 Thread Dmitriy Lyubimov
think h2o has some java code mixed in). On Wed, Mar 4, 2015 at 2:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: (1) no mentors this year. (2) what was the PR #? On Wed, Mar 4, 2015 at 2:35 PM, Олег Зотов olegzoto...@gmail.com wrote: Hi I want to contribute to the Mahout and I have two

Broken non-zero iterator in VectorView?

2015-03-02 Thread Dmitriy Lyubimov
it looks like assigning 0s in a view of SequentialAccessSparseVector doesn't work, as it is internally using setQuick() which trims the length of non-zero elements (?) which causes invalidation of the iterator state. in particular, this simple test fails: val svec = new

Bug in non-zero iterators over Sequential Access sparse vectors

2015-03-02 Thread Dmitriy Lyubimov
it looks like assigning 0s in a view of SequentialAccessSparseVector doesn't work, as it is internally using setQuick() which trims the length of non-zero elements (?) which causes invalidation of the iterator state. in particular, this simple test fails: val svec = new

Re: Broken non-zero iterator in VectorView?

2015-03-02 Thread Dmitriy Lyubimov
...@occamsmachete.com wrote: I vaguely remember the NonZeroIterator being optimized just as we were switching to Scala. Something Sebastian was working on but no idea if it was related to this. On Mar 2, 2015, at 3:51 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: actually the test error is something

Re: Broken non-zero iterator in VectorView?

2015-03-02 Thread Dmitriy Lyubimov
it looks like an attempt to eliminate reusable elements in vector view's iterators but why? Vector contract already implies element reusability inside iterators, so why special treatment inside vector views? umph. On Mon, Mar 2, 2015 at 3:35 PM, Dmitriy Lyubimov dlie...@gmail.com wrote

Re: Broken non-zero iterator in VectorView?

2015-03-02 Thread Dmitriy Lyubimov
actually the test error is something else but i think vector view iterator implementation is still wrong. I will scan if that produces any more errors elsewhere. On Mon, Mar 2, 2015 at 3:43 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: this test is failing after i remove non-reusable elements

Re: Broken non-zero iterator in VectorView?

2015-03-02 Thread Dmitriy Lyubimov
org.apache.mahout.math.VectorBinaryAggregateCostTest T On Mon, Mar 2, 2015 at 3:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: it looks like an attempt to eliminate reusable elements in vector view's iterators but why? Vector contract already implies element reusability inside iterators, so why special treatment inside vector views

Re: Spark shell broken

2015-02-27 Thread Dmitriy Lyubimov
Following CDH releases perhaps helps a bit. they tend to skip buggy releases. On Fri, Feb 27, 2015 at 2:07 PM, Pat Ferrel p...@occamsmachete.com wrote: The deserialization thing is a Spark bug. The work around requires that you put a key/value in the SparkConf to point to a jar on _all_

Re: Question about Spark versions

2015-02-26 Thread Dmitriy Lyubimov
algebraic optimizer binary should be compatible with pretty wide range of spark. At very least, current head is backward compatible with 1.1.x. The only thing that locked it to that is using unpersist api. Before that it should've been compatible all the way to at least 0.9. spark 0.8.something

Re: What is Mahout?

2015-02-25 Thread Dmitriy Lyubimov
I think a release with some value in it and a talk clarifying status will suffice for starters. Name change IMO is immaterial if there's the value and talks clarify general philosophy sufficiently. Nobody else can tell people better what it is all about, it is lack of the release and information

Re: What is Mahout?

2015-02-25 Thread Dmitriy Lyubimov
-1 on incubation as well. The website and docs and user lists and this champion and mentor stuff, and logos and promotions for committers absolutely do not make any sense at this point. From what i hear, people are pretty busy without having that as it is. It would probably make more sense to

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
ASF also mirrors dropping branches, i remember doing that too. but it won't allow history rewrites. On Tue, Feb 24, 2015 at 11:35 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: what exactly did you try to do? just resetting HEAD will not work on remote branch -- you need force-sync

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
Branches: refs/heads/spark-1.2 [created] 901ef03b4 On Tue, Feb 24, 2015 at 11:47 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: yeah ok so you pushed 1.2 branch to asf but it is not yet in github. it should be there eventually, give it a bit of time. On Tue, Feb 24, 2015 at 11:35 AM

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
seems like different builds on client and backend. shell is using your local spark setup (pointed to with SPARK_HOME). make sure it points to identical binaries (not just spark version) to what is used in the backend. the reason is spark is not binary-canonical w.r.t. release version, it

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
is it local or standalone? local should not have these types of errors. for anything else it is likely what i said. On Tue, Feb 24, 2015 at 1:08 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: seems like different builds on client and backend. shell is using your local spark setup (pointed

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
Andrew, perhaps you could commit a patch on top of 1.2 branch? much appreciated. On Tue, Feb 24, 2015 at 1:25 PM, Andrew Palumbo ap@outlook.com wrote: sorry- I left out the scala-compiler artifact (at the top) it should read: <dependency> <groupId>org.scala-lang</groupId>

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
yeah ok so you pushed 1.2 branch to asf but it is not yet in github. it should be there eventually, give it a bit of time. On Tue, Feb 24, 2015 at 11:35 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: what exactly did you try to do? just resetting HEAD will not work on remote branch -- you

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
(eventually) get there. On Tue, Feb 24, 2015 at 11:18 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Does ASF git get mirrored to GitHub? I tried pushing a branch and don't see it there yet. On Tue, Feb 24, 2015 at 11:16 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Tue, Feb 24

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
issued a revert to head. On Tue, Feb 24, 2015 at 11:47 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: yeah ok so you pushed 1.2 branch to asf but it is not yet in github. it should be there eventually, give it a bit of time. On Tue, Feb 24, 2015 at 11:35 AM, Dmitriy Lyubimov dlie

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
:30 PM, Dmitriy Lyubimov wrote: Andrew, perhaps you could commit a patch on top of 1.2 branch? much appreciated. On Tue, Feb 24, 2015 at 1:25 PM, Andrew Palumbo ap@outlook.com wrote: sorry- I left out the scala-compiler artifact (at the top) it should read: <dependency>

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
PS normally i would just reset the head to ^1 but that would require forced rewrite and asf git doesn't allow this (and for a good reason, really). so revert on master would be necessary. On Tue, Feb 24, 2015 at 10:51 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: i mean roll back #74 and apply

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
, Dmitriy Lyubimov dlie...@gmail.com wrote: As a remedy, i'd suggest to branch out spark 1.2 work and rollback 1.2.1 commit on master until 1.2 branch is fixed. On Tue, Feb 24, 2015 at 10:19 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: oops. tests don't test shell startup

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
oops. tests don't test shell startup. apparently stuff got out of sync with 1.2 On Tue, Feb 24, 2015 at 10:02 AM, Pat Ferrel p...@occamsmachete.com wrote: Me too and I built with 1.2.1 On Feb 24, 2015, at 9:50 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I've just rebuilt mahout

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
As a remedy, i'd suggest to branch out spark 1.2 work and rollback 1.2.1 commit on master until 1.2 branch is fixed. On Tue, Feb 24, 2015 at 10:19 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: oops. tests don't test shell startup. apparently stuff got out of sync with 1.2 On Tue, Feb 24

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
On Tue, Feb 24, 2015 at 10:55 AM, Pat Ferrel p...@occamsmachete.com wrote: to be safe I’d “git reset --hard xyz” to the commit previous to the 1.2.1 As i just explained, resets are not possible with ASF git. Reverting is the only option. -d
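
The difference between the two approaches is that a revert adds a new commit undoing the change, so history only grows and no forced rewrite of the remote branch is needed. A minimal sketch in a throwaway repository; the file names and commit messages are hypothetical and have nothing to do with the actual Mahout commits discussed in this thread.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"

echo good > "$repo/good.txt"
git -C "$repo" add good.txt
git -C "$repo" commit -q -m "good commit"

echo bad > "$repo/bad.txt"
git -C "$repo" add bad.txt
git -C "$repo" commit -q -m "problem commit"

# Undo the last commit by adding a new commit on top of it. Existing history
# is untouched, so an ordinary (non-forced) push can publish the result.
git -C "$repo" revert --no-edit HEAD >/dev/null

history_len=$(git -C "$repo" rev-list --count HEAD)
if [ -e "$repo/bad.txt" ]; then bad_present=yes; else bad_present=no; fi
echo "commits: $history_len, bad.txt present: $bad_present"
```

A `git reset --hard` to the earlier commit would instead drop the problem commit from the branch, and publishing that requires a forced update of the remote, which the thread says ASF git does not allow.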

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) On Tue, Feb 24, 2015 at 1:08 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: seems like different builds on client and backend. shell

Re: Spark shell broken

2015-02-24 Thread Dmitriy Lyubimov
On 02/24/2015 05:15 PM, Andrew Musselman wrote: Makes sense; I'm still getting those errors after restarting my rebuilt spark.. On Tue, Feb 24, 2015 at 2:12 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: IIRC MAHOUT_LOCAL doesn't mean a thing with spark mode. It is purely MR thing

Re: intermitent unit test error

2015-02-20 Thread Dmitriy Lyubimov
We had various operational configuration problems with snappy as well so had to disable it for now completely until somebody has time to figure it out (which has been like forever) On Thu, Feb 19, 2015 at 4:26 PM, Pat Ferrel p...@occamsmachete.com wrote: It seems like after a clean install I

Re: Codebase refactoring proposal

2015-02-08 Thread Dmitriy Lyubimov
of MLlib Vector and back would solve my Kmeans use case. You know MLlib better than I so choose the best level to perform type conversions or inheritance splicing. The point is to make the two as seamless as possible. Doesn’t this seem a worthy goal? On Feb 8, 2015, at 4:59 PM, Dmitriy Lyubimov

Re: Codebase refactoring proposal

2015-02-08 Thread Dmitriy Lyubimov
either/or choices for devs. On Feb 5, 2015, at 1:32 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Thu, Feb 5, 2015 at 1:14 AM, Gokhan Capan gkhn...@gmail.com wrote: What I am saying is that for certain algorithms including both engine-specific (such as aggregation) and DSL stuff, what

Re: Faster collections for a faster Mahout

2015-02-05 Thread Dmitriy Lyubimov
thank you very much. Github pull request is what we use these days. Do you think you could put one up ? thanks. -d On Thu, Feb 5, 2015 at 1:17 AM, Sebastiano Vigna vi...@di.unimi.it wrote: On 19 Jan 2015, at 22:26, Robin Anil robin.a...@gmail.com wrote: @Sebastiano, sounds like an easy

Re: Codebase refactoring proposal

2015-02-05 Thread Dmitriy Lyubimov
On Thu, Feb 5, 2015 at 1:14 AM, Gokhan Capan gkhn...@gmail.com wrote: What I am saying is that for certain algorithms including both engine-specific (such as aggregation) and DSL stuff, what is the best way of handling them? i) should we add the distributed operations to Mahout codebase as

Re: Codebase refactoring proposal

2015-02-04 Thread Dmitriy Lyubimov
. Need to think through how to use a DataFrame in a streaming case, probably through some checkpointing of the window DStream—hmm. On Feb 4, 2015, at 7:37 AM, Andrew Palumbo ap@outlook.com wrote: On 02/03/2015 08:22 PM, Dmitriy Lyubimov wrote: I'd suggest to consider

Re: Codebase refactoring proposal

2015-02-04 Thread Dmitriy Lyubimov
btw a good seq2sparse and seqdirectory ports are the only thing that separates us from having bigram, trigram based LSA tutorial. On Wed, Feb 4, 2015 at 10:35 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: i think they are debating the details now, not the idea. Like how NA is different from

Re: Codebase refactoring proposal

2015-02-04 Thread Dmitriy Lyubimov
at 2:07 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Spark's DataFrame is obviously not agnostic. I don't believe there's a good way to abstract it. Unfortunately. I think getting too much into distributed operation abstraction is a bit dangerous. I think MLI was one project that attempted

Re: Codebase refactoring proposal

2015-02-04 Thread Dmitriy Lyubimov
On Wed, Feb 4, 2015 at 1:51 PM, Andrew Palumbo ap@outlook.com wrote: My thought was not to bring primitive engine specific aggregators, combiners, etc. into math-scala. Yeah. +1. I would like to support that as an experiment, see where it goes. Clearly some distributed use cases are

Re: Codebase refactoring proposal

2015-02-04 Thread Dmitriy Lyubimov
Re: Gokhan's PR post: here are my thoughts but i did not want to post it there since they are going beyond the scope of that PR's work to chase the root of the issue. on quasi-algebraic methods What is the dilemma here? don't see any. I already explained that no more

Re: TF-IDF, seq2sparse and DataFrame support

2015-02-04 Thread Dmitriy Lyubimov
On Feb 4, 2015, at 7:47 AM, Andrew Palumbo ap@outlook.com wrote: Just copied over the relevant last few messages to keep the other thread on topic... On 02/03/2015 08:22 PM, Dmitriy Lyubimov wrote: I'd suggest to consider this: remember all this talk about language-integrated spark

Re: Codebase refactoring proposal

2015-02-04 Thread Dmitriy Lyubimov
both expressed interest in the distributed aggregation stuff. It sounds like we are agreeing that non-algebra—computation method type things can be engine specific. So does anyone have an objection to Gokhan pushing his PR? On Feb 4, 2015, at 2:20 PM, Dmitriy Lyubimov dlie...@gmail.com wrote

Re: Codebase refactoring proposal

2015-02-03 Thread Dmitriy Lyubimov
But first I need to do massive fixes and improvements to the distributed optimizer itself. Still waiting on green light for that. On Feb 3, 2015 8:45 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Feb 3, 2015 7:20 AM, Pat Ferrel p...@occamsmachete.com wrote: BTW what level of difficulty

Re: Codebase refactoring proposal

2015-02-03 Thread Dmitriy Lyubimov
similarity. Attach Kafka and get evergreen models, if not incrementally updating models. On Feb 2, 2015, at 4:54 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: bottom line compile-time dependencies are satisfied with no extra stuff from mr-legacy or its transitives. This is proven by virtue

Re: Extending spark-itemsimilarity for calculation multiple cross-indicators

2015-02-03 Thread Dmitriy Lyubimov
On Tue, Feb 3, 2015 at 11:57 AM, Олег Зотов olegzoto...@gmail.com wrote: Hello. I develop recommendation system and use mahout on spark (1.0 snapshot). In the process I have found, that spark-itemsimilarity driver does not allow to process more than two action types. After reading the

Re: Extending spark-itemsimilarity for calculation multiple cross-indicators

2015-02-03 Thread Dmitriy Lyubimov
PS to run mahout shell, one can use MASTER=master mahout/bin/mahout spark-shell Syntax to load scripts is retained from Scala shell. ideally one also needs stuff like MAHOUT_OPTS=-Xmx5G but as i mentioned it is broken right now, you can do a quick hack On Tue, Feb 3, 2015 at 12:06 PM, Dmitriy

Re: Codebase refactoring proposal

2015-02-03 Thread Dmitriy Lyubimov
and IDF. I'll put them up soon. Hopefully they'll be of some use. On Feb 3, 2015, at 8:47 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: But first I need to do massive fixes and improvements to the distributed optimizer itself. Still waiting on green light for that. On Feb 3, 2015 8:45 AM
