Re: Friday hangout

2018-09-03 Thread Dmitriy Lyubimov
so does mine. 9-10 am PST? On Mon, Sep 3, 2018 at 12:10 PM Ivan Serdyuk wrote: > Google calendar reports "Could not find the requested". > > > > On Mon, Sep 3, 2018 at 8:46 PM Andrew Palumbo wrote: > > > Probably my calendar messed it up. > > Thx > > --andy > > > > On Sep 3, 2018 10:32 AM, Andr

Re: Error spark-mahout when spark-submit mode cluster

2018-08-08 Thread Dmitriy Lyubimov
it is user-unsubscribe@m.a.o On Wed, Aug 8, 2018 at 6:47 AM, Eric Link wrote: > unsubscribe > > On Wed, Aug 1, 2018 at 8:54 AM Jaume Galí wrote: > > > Hi everybody, I'm trying to build a basic recommender with Spark and > Mahout > > on Scala. I use the following mahout repo to compile mahout with s

Re: Error spark-mahout when spark-submit mode cluster

2018-08-08 Thread Dmitriy Lyubimov
My best guess is that it looks like a serialization problem at the cluster/master. This typically happens if class or java versions are different between driver/worker(s). Why that ended up being the case in your particular case is hard for me to tell. Bottom line, I do not believe this is a Maho

Re: Hangouts

2018-07-31 Thread Dmitriy Lyubimov
I am on vacation this week fyi On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Cool, I'll shoot for something on Friday early Pacific time and put an > invite in here; looking forward to it! > > On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn wrote: > > >

Re: Congrats Palumbo and Holden

2018-05-02 Thread Dmitriy Lyubimov
Congrats! On Wed, May 2, 2018 at 1:25 PM, Trevor Grant wrote: > Both were just elected new ASF members!! > > https://s.apache.org/D6iz >

Re: Users of Scala 2.11

2018-04-27 Thread Dmitriy Lyubimov
correct address is : user-unsubscr...@mahout.apache.org On Thu, Apr 26, 2018 at 10:08 PM, Paul Crochet wrote: > unsubscribe > > 2018-04-24 21:08 GMT+03:00 Pat Ferrel : > > > Hi all, > > > > Mahout has hit a bit of a bump in releasing a Scala 2.11 version. I was > > able to build 0.13.0 for Scala

Re: distributed cholesky on mahout

2018-04-19 Thread Dmitriy Lyubimov
no distributed Cholesky as far as i know. Thin QR or ssvd. On Wed, Apr 18, 2018 at 7:08 PM, QIFAN PU wrote: > Hi, > > I'm wondering if distributed cholesky decomposition on mahout is supported > now. > From this doc: > https://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf > It see

Re: Updating Wikipedia

2018-02-19 Thread Dmitriy Lyubimov
I think Suneel was modifying it... On Sun, Feb 18, 2018 at 7:02 AM, Trevor Grant wrote: > Is anyone good at Wikipedia? > > We're still listed as being primarily running on Hadoop there. > > https://en.wikipedia.org/wiki/Apache_Mahout > > If anyone has some skills/time- an update would be cool...

Re: Mahout and Spark 2.2 compatibility

2017-12-04 Thread Dmitriy Lyubimov
I can confirm i have not encountered fundamental issues with samsara (yet) while running with spark 2.2.0/scala 2.11.11. it is mostly just adjusting the build to use proper versions of artifacts. On Mon, Dec 4, 2017 at 9:25 AM, Trevor Grant wrote: > Hi Marc, > > Actually, it's not THAT hard to ge

Re: spark-itemsimilarity scalability / Spark parallelism issues (SimilarityAnalysis.cooccurrencesIDSs)

2017-08-23 Thread Dmitriy Lyubimov
there has been some work on optimizing in-memory assigns for vectors, but the matrix work for the in-memory java-backed assigns is admittedly more patchy at best, given the amount of variations. On Mon, Aug 21, 2017 at 12:05 PM, Pat Ferrel wrote: > Matt > > I’ll create a feature branch of Mahout

Re: [DISCUSS] Naming convention for multiple spark/scala combos

2017-07-07 Thread Dmitriy Lyubimov
it would seem 2nd option is preferable if doable. Any option that has most desirable combinations prebuilt, is preferable i guess. Spark itself also releases tons of hadoop profile binary variations. so i don't have to build one myself. On Fri, Jul 7, 2017 at 8:57 AM, Trevor Grant wrote: > Hey a

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
so people need to make sure their PR merges to develop instead of master? Do they need to PR against develop branch, and if not, who is responsible for the conflict resolution that will then arise from diffing and merging into different targets? On Tue, Jun 20, 2017 at 10:09 AM, Pat Ferrel wrote: >

Re: Welcome New Committer Nikolay Sakharnykh

2017-05-01 Thread Dmitriy Lyubimov
Welcome!! On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh wrote: > Hello everyone, > > I’m sorry for some delay with my introduction, have been swamped with > other projects recently ☺ > > Having worked at NVIDIA for around 8 years I have seen GPUs to evolve from > specialized graphics proce

Re: Samsara's learning curve

2017-03-29 Thread Dmitriy Lyubimov
matrix A inside optimization plan can actually be formed as A' if needed, as long as it doesn't meet the optimization barrier (i.e., collected or saved) On Wed, Mar 29, 2017 at 9:37 AM, Dmitriy Lyubimov wrote: > > > On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote: > >>

Re: Samsara's learning curve

2017-03-29 Thread Dmitriy Lyubimov
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote: > While I agree with D and T, I’ll add a few things to watch out for. > > One of the hardest things to learn is the new model of execution, it’s not > quite Spark or any other compute engine. You need to create contexts that > have virtualized th

Re: Samsara's learning curve

2017-03-29 Thread Dmitriy Lyubimov
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote: > > The other missing bit is dataframes. R and Spark have them in different > forms but Mahout largely ignores the issue of real world object ids. Mahout only supports matrices and vectors, not data frames. Data frames imply mix of various typ

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Dmitriy Lyubimov
On Wed, Mar 29, 2017 at 9:10 AM, Dmitriy Lyubimov wrote: > Sorry, i think more commonly if aggregating transpose is to be used, then > centroid assignments had better be the key of the matrix D (so D:= A) and > aggregating transpose is performed on a matrix (1 | D)' (i.e., 1 cb

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Dmitriy Lyubimov
and we can finish up cluster assignment via M = (1 | D)'; C = M(:,2:) with each row hadamard-divided by first row of counts M(:,1) (implying Golub-Van Loan notations for subblocking) On Wed, Mar 29, 2017 at 9:02 AM, Dmitriy Lyubimov wrote: > the simplest scheme is to initialize distributed

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Dmitriy Lyubimov
the simplest scheme is to initialize distributed matrix of the shape D := (0 | A) where A is your dataset and 0 is a single column indicating current centroid assignment and distribute current centroid matrix C via matrix broadcast (assuming there are few enough centers). Then alternatively run cl
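A minimal sketch of one iteration of the scheme described in this thread, written in plain Python rather than the Samsara DSL. It assumes in-memory lists instead of a distributed matrix, and the names `nearest` and `kmeans_step` are hypothetical, not Mahout APIs. Each data row is prefixed with a 1 and summed under its assigned centroid, which is what the aggregating transpose of (1 | D) keyed by assignment produces; dividing the sums by the counts gives the updated centroids.

```python
def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def kmeans_step(points, centroids):
    """One assignment + update pass; returns the new centroid list."""
    # agg[j] = [count, sum_x1, sum_x2, ...]: the row (1 | point) summed per key j
    agg = {}
    for p in points:
        j = nearest(p, centroids)
        row = agg.setdefault(j, [0.0] * (len(p) + 1))
        row[0] += 1.0
        for i, x in enumerate(p):
            row[i + 1] += x
    # Centroids that received no points keep their previous position.
    updated = [list(c) for c in centroids]
    for j, row in agg.items():
        updated[j] = [s / row[0] for s in row[1:]]
    return updated


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
print(kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)]))
# [[0.0, 1.0], [10.0, 11.0]]
```

In the distributed setting only the small per-centroid aggregate has to travel back to the driver, which is why the thread suggests broadcasting the centroid matrix and aggregating by key.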

Re: Samsara's learning curve

2017-03-27 Thread Dmitriy Lyubimov
I believe writing in the DSL is simple enough, especially if you have some familiarity with Scala on top of R (or, in my case, R on top of Scala perhaps:). I've implemented about a couple dozen customized algorithms that used distributed Samsara algebra at least to some degree, and I think I can rel

Re: Marketing

2017-03-24 Thread Dmitriy Lyubimov
On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote: > The multiple backend support is such a waste of time IMO. The DSL and GPU > support is super important and should be made even more distributed. The > current (as I understand it) single threaded GPU per VM is only the first > step in what will

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-01 Thread Dmitriy Lyubimov
Isabel, if i understand it correctly, you are asking whether it makes sense to add end2end scenarios based on Samsara to the current codebase? The answer is, absolutely. Yes it does for both rather isolated issues (like computing clusters) and end-2-end scenarios. The only problem with end 2 end scenari

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Dmitriy Lyubimov
On Tue, Jan 31, 2017 at 3:01 AM, Isabel Drost-Fromm wrote: > > Hi, > > > To give some advice to downstream users in the field - what would be your > advice > for people tasked with concrete use cases (stuff like fraud detection, > anomaly > detection, learning search ranking functions, building a

Re: Recommenders and MABs

2016-09-21 Thread Dmitriy Lyubimov
there's been a great blog on that somewhere on richrelevance blog... But i have a vague feeling based on what you are saying it may be all old news to you... [1] http://engineering.richrelevance.com/bandits-recommendation-systems/ and there's more in the series On Sat, Sep 17, 2016 at 3:10 PM, Pa

Re: Text clustering how to?

2016-07-27 Thread Dmitriy Lyubimov
I think you have got a reply via jira. On Wed, Jul 27, 2016 at 10:50 AM, Raviteja Lokineni < raviteja.lokin...@gmail.com> wrote: > Anybody? > > On Thu, Jul 21, 2016 at 10:42 AM, Raviteja Lokineni < > raviteja.lokin...@gmail.com> wrote: > > > Hi all, > > > > I am pretty new to Apache Mahout. I am

Re: mahout tf-idf vs lucene tf-idf

2016-06-06 Thread Dmitriy Lyubimov
to add to Ted's reply, mahout has traditionally offered a bigram/trigram analysis as a part of its tf-idf conversion (a step away from the bag of words model so that directional statistically stable combinations of 2 or 3 words are reduced to their own term). However, this has not been ported to sp

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
I am just going to give you some design intents in the existing code. as far as i can recollect, mahout context gives complete flexibility. You can control the behavior by various degrees of overriding the default behavior and doing more or less work on context setup on your own. (I assume we are

Re: Clustering options

2016-05-23 Thread Dmitriy Lyubimov
Xavier, there are no exact equivalents in public domain to the algorithms that existed for MR clustering as of yet. My understanding is that some of them are on the roadmap though. depending on the level of sophistication you require, some of them are very easy to build though. On Sat, May 21, 2016 at 8:46 PM, FR

Re: RowSimilakrity : NotSerializableException

2016-05-07 Thread Dmitriy Lyubimov
you can also wrap mahout context around existing spark session (aka context). On Sat, May 7, 2016 at 9:41 PM, Rohit Jain wrote: > Yes, we did figure out this problem. And realised that instead sparkcontext > I have to use mahoutsparkcontext, > > On Sun, May 8, 2016 at 4:26 AM, Pat Ferrel wrote:

Re: Matrix inversion

2016-05-05 Thread Dmitriy Lyubimov
at 1:50 PM, Dmitriy Lyubimov wrote: > The mantra i keep hearing is that if someone needs matrix inversion then > he/she must be doing something wrong. Not sure how true that is, but in all > cases i have encountered, people try to avoid matrix inversion one way or > another. > >

Re: Matrix inversion

2016-05-05 Thread Dmitriy Lyubimov
The mantra i keep hearing is that if someone needs matrix inversion then he/she must be doing something wrong. Not sure how true that is, but in all cases i have encountered, people try to avoid matrix inversion one way or another. Re: libraries: Mahout is more about apis now than any particular i

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, (1) to be clear, the ASF trademark and branding policy is not to endorse views of the 3rd party publications and to ask 3rd party writers to do a disclosure that their views are not endorsed by ASF project. To that end, ASF project can't really tell you that some publication is "(in)appro

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, if you are using any Mahout Mapreduce algorithm for research, please make sure to make this disclosure: all Mahout MapReduce algorithms are officially not supported and deprecated since February, 2014 (IIRC). I can dig up a specific issue regarding this. There also has been an announceme

Re: spark.shuffle.memoryFraction

2016-04-20 Thread Dmitriy Lyubimov
i think in spark 1.6 this really became more flexible in terms of only specifying max/min thresholds. Yes shuffle spills in spark during multiplication are humongous, i tried a few hacks but that's spark. that's one of known bottlenecks unfortunately. You are welcome to try and hack A'B too. My pe

Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Dmitriy Lyubimov
not disrupt the overall flow? > > > Have a great evening! > Mihai > > > On 21 Mar 2016, at 19:31, Dmitriy Lyubimov wrote: > > > > my 1 cents (since it is less than 2) is MAHOUT_LOCAL is part of MR legacy > > packaging. as long as MR is still here (and I would say

Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Dmitriy Lyubimov
my 1 cents (since it is less than 2) is MAHOUT_LOCAL is part of MR legacy packaging. as long as MR is still here (and I would say it needs to be still here, unless it falls in complete disrepair and totally out of sync with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon as MR goes,

Re: New Mahout "Samsara" Book

2016-02-25 Thread Dmitriy Lyubimov
For the purposes of this book (and otherwise too, as far as i know) "Samsara" is a release code name, defined as 0.10 and after. That includes all new code that happened after that, and the code that is still not deprecated (although most of MapReduce code is, by now, as evidenced by MAHOUT-1510)

Re: New Mahout "Samsara" Book

2016-02-25 Thread Dmitriy Lyubimov
ecked both links, they have only front and back cover of the book. No > > table of contents > > On Feb 25, 2016 9:57 AM, "Suneel Marthi" wrote: > > > >> You can see the TOC on Amazon > >> > >> > >> > http://www.amazon.com/Apache

Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space

2016-02-16 Thread Dmitriy Lyubimov
BTW, depending on the resource manager, 10G per executor may not necessarily be a sufficient number. I never plan less than 1.5G per core (after excluding block manager, or 3Gb per core including block manager). That means that 10G executor memory might be barely enough for 4-core worker nodes. So
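The per-core rule of thumb above can be turned into a quick back-of-the-envelope check. This is only an illustrative sketch: the two constants are the figures quoted in the message, and the helper names `executor_memory_gb` and `max_cores` are hypothetical, not Spark or Mahout settings.

```python
# Rule-of-thumb figures quoted in the message above (GB per core).
WORKING_GB_PER_CORE = 1.5  # excluding the block manager
TOTAL_GB_PER_CORE = 3.0    # including the block manager


def executor_memory_gb(cores):
    """Return (working, total) memory in GB suggested for `cores` cores."""
    return cores * WORKING_GB_PER_CORE, cores * TOTAL_GB_PER_CORE


def max_cores(executor_gb):
    """Largest whole number of cores whose total budget fits in `executor_gb`."""
    return int(executor_gb // TOTAL_GB_PER_CORE)


print(executor_memory_gb(4))  # (6.0, 12.0)
print(max_cores(10))          # 3
```

Under these figures a 4-core worker wants at least 6 GB of working memory and about 12 GB including the block manager, so a 10 GB executor sits between the two bounds, which is consistent with the "barely enough" remark.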

Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space

2016-02-16 Thread Dmitriy Lyubimov
bottom line: increase executor's non-mem-block memory and reduce individual starting task size until it all fits. On Tue, Feb 16, 2016 at 4:09 PM, Dmitriy Lyubimov wrote: > the original exception definitely happens in the task when mahout tries to > build an entire matrix blo

Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space

2016-02-16 Thread Dmitriy Lyubimov
the original exception definitely happens in the task when mahout tries to build an entire matrix block out of a partition. Use more tasks, smaller in size initially. using par(min=??) will help to repartition to at least ?? tasks. off-hdfs defaults are just too big for matrix processing. Not sure

Re: Mahout - problem importing to Eclipse

2016-02-08 Thread Dmitriy Lyubimov
ely OK, so we can just leave it at that. > > Best regards, > David > > On Mon, Feb 1, 2016 at 11:52 PM, Dmitriy Lyubimov > wrote: > > > the user list will not let attachments thru. > > > > On Sun, Jan 31, 2016 at 11:59 PM, David Starina > > > wrote

Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Dmitriy Lyubimov
ahout itself. make sure to observe transitive dependency rules for the front end. On Tue, Feb 2, 2016 at 12:53 PM, Dmitriy Lyubimov wrote: > this is strange. if you took over the context, added jars manually and it > still does not work, there's something wrong with spark i guess or

Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Dmitriy Lyubimov
Thank you very much again, > > Kind Regards, > Bahaa > > > On Tue, Feb 2, 2016 at 12:01 PM, Dmitriy Lyubimov > wrote: > > > Bahaa, first off, i don't think we have certified any of releases to run > > with spark 1.6 (yet). I think spark 1.5 is the l

Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Dmitriy Lyubimov
Bahaa, first off, i don't think we have certified any of releases to run with spark 1.6 (yet). I think spark 1.5 is the last known release to run with 0.11 series. Second, if you use mahoutSparkContext() method to create context, it would look for MAHOUT_HOME setup to add mahout binaries to the job

Re: Mahout - problem importing to Eclipse

2016-02-01 Thread Dmitriy Lyubimov
the user list will not let attachments thru. On Sun, Jan 31, 2016 at 11:59 PM, David Starina wrote: > Hi, > > I have problem importing the project to Eclipse - I get the error "Could > not update project mahout-mr configuration". Attaching the error as image. > Anyone seen this problem before? I

Re: Some test results

2015-12-30 Thread Dmitriy Lyubimov
Nice! On Dec 30, 2015 11:51 AM, "Pat Ferrel" wrote: > As many of you know Mahout-Samsara includes an interesting and important > extension to cooccurrence similarity, which supports cross-cooccurrence and > log-likelihood downsampling. This, when combined with a search engine, > gives us a multim

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Dmitriy Lyubimov
he branches (most likely > 0.10.x). No ? > > On Fri, Nov 6, 2015 at 7:05 PM, Dmitriy Lyubimov > wrote: > > > hm. I did not find the staging repo. is it gone already? > > > > One thing, if i may whine (I already asked for it last time): > > Can we please publish

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Dmitriy Lyubimov
argh bummer. On Fri, Nov 6, 2015 at 4:01 PM, Suneel Marthi wrote: > Thanks. We have 3 +1 votes and no -1s. > > This release has passed and the Voting is officially closed, will send an > announcement out when the release has been finalized. > > Thanks again. > > On Fri, Nov 6, 2015 at 5:57 PM, A

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Dmitriy Lyubimov
hm. I did not find the staging repo. is it gone already? One thing, if i may whine (I already asked for it last time): Can we please publish -tests artifacts, please pretty please? it is so much easier if derived applications could re-use mahout testing framework. On Fri, Nov 6, 2015 at 2:57 PM

Re: Is Mahout obsolete now?

2015-10-20 Thread Dmitriy Lyubimov
Pavan, I guess part of the documentation difficulty is in that Mahout Samsara environment is only used for "training" but external components are used for "scoring". So it is not 100% end-to-end Mahout solution to document. Pat, it would be nice though to put some of your docs on to Mahout site th

Re: Is Mahout obsolete now?

2015-10-19 Thread Dmitriy Lyubimov
matrix multiplication and > factorization. thanks, canal > > > On Tuesday, October 20, 2015 6:37 AM, Dmitriy Lyubimov < > dlie...@gmail.com> wrote: > > > On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel > wrote: > > > Even have code running using the Predici

Re: Is Mahout obsolete now?

2015-10-19 Thread Dmitriy Lyubimov
On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel wrote: > Even have code running using the PredictionIO framework. This includes an > SDK to event store to realtime query. Loosely speaking a lambda > architecture. Most of the whole enchilada running except the content part > of the equation, which only

Re: matrix inversion in plan ?

2015-10-08 Thread Dmitriy Lyubimov
or pseudoinverse really, i guess On Thu, Oct 8, 2015 at 3:58 PM, Dmitriy Lyubimov wrote: > Mahout translation (approximation, since ssvd is reduced-rank, not the > true thing): > > val (drmU, drmV, s) = dssvd(drmA, k = 100) > val drmInvA = drmV %*% diagv(1 /=: s) %*% d

Re: matrix inversion in plan ?

2015-10-08 Thread Dmitriy Lyubimov
Mahout translation (approximation, since ssvd is reduced-rank, not the true thing): val (drmU, drmV, s) = dssvd(drmA, k = 100) val drmInvA = drmV %*% diagv(1 /=: s) %*% drmU.t Still, technically, it is a right inverse as in reality m is rarely the same as n. Also, k must be k<= drmA.nrow min drmA
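The DSL expression quoted above corresponds to the standard SVD-based pseudoinverse. As a sketch, assuming A is m x n and dssvd returns the rank-k factors U_k (m x k), V_k (n x k) and the singular values s:

```latex
A \approx U_k \Sigma_k V_k^{\top}, \qquad \Sigma_k = \operatorname{diag}(s_1, \dots, s_k)

A^{+} \approx V_k \Sigma_k^{-1} U_k^{\top}

A A^{+} \approx U_k U_k^{\top}, \qquad A^{+} A \approx V_k V_k^{\top}
```

Since U_k and V_k have orthonormal columns, U_k U_k^T equals the identity only when k = m and V_k V_k^T only when k = n, so for a rectangular or rank-reduced problem the result is a one-sided (or approximate) inverse rather than a true inverse.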

Re: Exception in thread "main" java.lang.IllegalArgumentException: Unable to read output from "mahout -spark classpath"

2015-10-06 Thread Dmitriy Lyubimov
DRM format is compatible on persistence level with Mahout MapReduce algorithms. It is a Hadoop sequence file. The key is unique, can be one of -- unique ordinal IntWriteable, treated as a row number (i.e. nrow=max(int key)), or -- Text, LongWritable, BytesWritable, or .. forget what else. This t

Re: Exception in thread "main" java.lang.IllegalArgumentException: Unable to read output from "mahout -spark classpath"

2015-10-06 Thread Dmitriy Lyubimov
:) strictly speaking out of core is anything that is not in memory, e.g. sequential algorithms are generally also considered out-of-core. btw i thought 0.11.x was for 1.3? or that was re-certified for 1.4 too? On Tue, Oct 6, 2015 at 1:09 PM, Pat Ferrel wrote: > Linear algebra stuff is what Mahou

Re: sq_dist()

2015-09-10 Thread Dmitriy Lyubimov
eze, actually my current impl of sqDist uses it: > > https://github.com/danielkorzekwa/bayes-scala-gp/blob/master/src/main/scala/dk/gp/math/sqDist.scala > > still 3 times slower than sq_dist from gpml > > thanks for BID Data Project info > > On 9 September 2015 at 18:45, Dmitri

Re: sq_dist()

2015-09-09 Thread Dmitriy Lyubimov
Hi Daniel, you mean, for dense algebra single-threaded java vs. cache, multithreaded, SSE4-optimized Intel MKL? I am actually surprised it is not at least 10x. Mahout focuses on ease of distributed implementations (i.e. dsq_dist variant of the routine) but has been somewhat lazy on marrying mahou

Re: Apache Mahout build failure mahout-hdfs

2015-09-08 Thread Dmitriy Lyubimov
seems like a maven dependency problem (mahout-math does not publish its test artifacts?). i thought though that this was not a known issue... hm. On Tue, Sep 8, 2015 at 7:31 AM, Dulakshi Vihanga wrote: > When I tried to build mahout using "mvn clean install > -Dmaven.test.skip=true" I got the f

Re: Time Series Stuff

2015-08-14 Thread Dmitriy Lyubimov
Not that I know of. would be nice to have. On Fri, Aug 14, 2015 at 4:42 PM, Nick Kolegraff wrote: > Hey Mahouts, > Looking for some time series analysis stuff I can use in mahout. I don't > see much, other than this legacy HMM stuff. > > https://mahout.apache.org/users/classification/hidden-mar

Re: Matrix inverse

2015-08-09 Thread Dmitriy Lyubimov
Do you mean in-core matrix inversion? It is supported via solve. Actually it is supported both in Java and Scala. On Aug 5, 2015 9:11 PM, "go canal" wrote: > Hello, I am new to Mahout. Would appreciate if someone could tell me if > matrix inverse is still supported in the latest release (0.10)? I

Re: Setup questions for mahout with spark

2015-07-27 Thread Dmitriy Lyubimov
(1) all i ever used with spark is Oracle jvm. (2) take the head of either master or 0.10.x branch. the heads there are some ~30-odd bug fix issues apart from 0.10.1 release, we really should've released 0.10.2 and 0.11.0 by now but i guess end of summer is a slow season. (3) If you want to use spar

Re: Mahout on the cloud

2015-07-23 Thread Dmitriy Lyubimov
PPS. one of "better" backends, if any comparison really is appropriate, is expected to be Apache Flink. On Thu, Jul 23, 2015 at 2:51 PM, Dmitriy Lyubimov wrote: > i guess i was a bit vague. by quasi-agnostic i mean that some code, the > smaller part of it, may include

Re: Mahout on the cloud

2015-07-23 Thread Dmitriy Lyubimov
i guess i was a bit vague. by quasi-agnostic i mean that some code, the smaller part of it, may include specific backend engine dependencies unfortunately. it should be easily portable though. On Thu, Jul 23, 2015 at 2:50 PM, Dmitriy Lyubimov wrote: > Mahout is moving to be backend-agnos

Re: Mahout on the cloud

2015-07-23 Thread Dmitriy Lyubimov
understand > > the algo perfectly-- so this is a great heads up. Any advice or warnings > > on hadoop installations and versions?? > > > > On Thu, Jul 23, 2015 at 6:34 AM, Dmitriy Lyubimov > > wrote: > > > > > MapReduce things enter de-facto end

Re: Mahout on the cloud

2015-07-22 Thread Dmitriy Lyubimov
MapReduce things enter de-facto end-of-life. Not that we specifically don't want to support them, it is de-facto nobody bothers to support them -- especially risks are high with new versions of hadoop and EMR. That said, we'd be grateful for any guide about doing this in EMR. On Wed, Jul 22, 2015

Re: java.lang.OutOfMemoryError with Mahout 0.10 and Spark 1.1.1

2015-07-20 Thread Dmitriy Lyubimov
assuming task memory x number of cores does not exceed ~5g, and block cache manager ratio does not have some really weird setting, the next best thing to look at is initial task split size. I don't think in the release you are looking at the driver manages initial off-dfs splits satisfactorily (tha

Re: Problem Starting Spark Shell

2015-07-09 Thread Dmitriy Lyubimov
I don't know. seems like somebody is sitting on the port. `lsof` utility may help to figure what it is. On Wed, Jul 8, 2015 at 8:18 AM, Parimi Rohit wrote: > Hi Dimitry, > > Please find my answers inline. > > > On Tue, Jul 7, 2015 at 7:48 PM, Dmitriy Lyubimov > wrot

Re: RowSimilarity API -- illegal argument exception from org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio()

2015-07-09 Thread Dmitriy Lyubimov
Travis, 0.10.x branch is for spark 1.2.x and master (0.11.0-snapshot) is for spark 1.3.x. my understanding is that 0.11.0 should mostly work with the exception of Spark shell, which is disabled on the HEAD. we are still working on PR https://github.com/apache/mahout/pull/146 to re-enable it. numNonZeroE

Re: Problem Starting Spark Shell

2015-07-07 Thread Dmitriy Lyubimov
these settings are for spark. spark shell only needs master (which is by default local), `MASTER` variable. Although, your error indicates that it does try to go somewhere. are you able to run regular spark shell? in the head of 0.10.x branch you can specify additional spark properties in MAHOUT_O

Re: Mahout 0.10 with Spark 1.1.1

2015-07-07 Thread Dmitriy Lyubimov
attachments are not showing up on apache lists. On Tue, Jul 7, 2015 at 10:30 AM, Rodolfo Viana wrote: > Hi, > > I’m trying to run Mahout 0.10 using Spark 1.1.1 and so far I didn’t have > any success passing a file on hdfs. My actual problem is when I try to run > the example: > > bin/mahout spa

Re: Streaming K-means

2015-06-16 Thread Dmitriy Lyubimov
"streaming k-means" is something else afaik. Streaming k-means is reserved for a particular k-means method (in Mahout, at least, [1]). Whereas as far as i understand what mllib calls "streaming k-means" is name given by mllib contributor which really means "online k-means", i.e. radar tracking of

Re: populating and serializing large sparse matrices

2015-06-11 Thread Dmitriy Lyubimov
I guess you are talking DRM format (sequence file). current recommended way is to use mahout-samsara with e.g. Spark (no mapreduce support there). Translation of in-core matrix (sparse, for example) would take converting it to distributed matrix (DRM) first by means of drmParallelize [1] and then

Re: populating and serializing large sparse matrices

2015-06-11 Thread Dmitriy Lyubimov
correction: dfsWrite (typo) On Thu, Jun 11, 2015 at 3:53 PM, Dmitriy Lyubimov wrote: > I guess you are talking DRM format (sequence file). > > current recommended way is to use mahout-samsara with e.g. Spark (no > mapreduce support there). Translation of in-core matrix (sparse, f

Re: Runtime Interner Exception

2015-06-11 Thread Dmitriy Lyubimov
> > Also, are there some specific dependencies of versions? Should I wait for > the next release? > > > Thanks a lot and have a great day! > Mihai > > > On Jun 10, 2015, at 23:57, Dmitriy Lyubimov wrote: > > > > Hadoop has its own guava. This is some dependency clas

Re: Building Mahout Source

2015-06-11 Thread Dmitriy Lyubimov
I am not sure how maven repo is managed for released apache projects. Binary artifacts are available for downloads. Also if you are building from source, they would be found on standard places for a maven multimodule project, i.e. module-name/target/artifact-jar. On Jun 11, 2015 3:28 AM, "Raghuveer

Re: Runtime Interner Exception

2015-06-10 Thread Dmitriy Lyubimov
Hadoop has its own guava. This is some dependency clash at runtime, for sure. Other than that no idea. MR is being phased out. Why don't u try spark version in upcoming .10.2? On Jun 10, 2015 12:58 PM, "Mihai Dascalu" wrote: > Hi! > > After upgrading to Mahout 0.10.1, I have a runtime exception i

Re: word2vec in mahout.

2015-05-13 Thread Dmitriy Lyubimov
Spark's word2vec is pretty agile. On Wed, May 13, 2015 at 12:13 PM, David Starina wrote: > You can also check out the implementation in MLlib: > https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec > > > > On Wed, May 13, 2015 at 9:11 PM, Dan Dong wrote: > > > Thanks Andr

Re:Re: Re: Hadoop SSVD OutOfMemory Problem

2015-04-28 Thread Dmitriy Lyubimov
86 > null > > > Now, my question is, how can I run a specified test with maven? For "mvn > test" is so slow, then if I can do like "mvn test LocalSSVDPCASparseTest", > my efficiency will be improved. > > At 2015-04-29 01:25:34, "Dmitriy Lyubimov

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
On Tue, Apr 28, 2015 at 1:14 PM, Mihai Dascalu wrote: > Indeed, it’s in local mode - but to setup hadoop on my Mac for the task at > hand did not seem necessary (the SVD uses a sparse matrix of 11MB). > oh. Then it is a wrong tool. try BIDMat, I promise you won't be disappointed. https://github

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
ecurity.UserGroupInformation - PrivilegedAction > as:mihaidascalu (auth:SIMPLE) > from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) > 1415709 [SwingWorker-pool-1-thread-1] ERROR > view.widgets.semanticModels.SemanticModelsTraining - Error procesing > config/LDA dir

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
9 [SwingWorker-pool-1-thread-1] DEBUG > org.apache.hadoop.security.UserGroupInformation - PrivilegedAction > as:mihaidascalu (auth:SIMPLE) > from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) > 1415709 [SwingWorker-pool-1-thread-1] ERROR > view.widgets.semanticModels.SemanticModelsTraining - Error procesing

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
if your run time gets too high, try to start with low -k (like 10 or something) and -q=0, that will significantly reduce complexity of the problem. if this works, you need to find optimal levers that suit your hardware/input size/ runtime requirements. ( I can tell you right away that (k+p) value

Re: Re: Hadoop SSVD OutOfMemory Problem

2015-04-28 Thread Dmitriy Lyubimov
, clone (fork) apache/mahout in your account, (optionally) create a patch branch, commit your modifications there, and then use github UI to create a pull request against apache/mahout. thanks. -d On Mon, Apr 27, 2015 at 8:39 PM, lastarsenal wrote: > Hi, Dmitriy Lyubimov > > > OK, I ha

Re: Hadoop SSVD OutOfMemory Problem

2015-04-27 Thread Dmitriy Lyubimov
Thank you for this analysis. I can't immediately confirm this since it's been a while but this sounds credible. Do you mind to file a jira with all this information, and even perhaps do a PR on github? thank you. On Mon, Apr 27, 2015 at 4:32 AM, lastarsenal wrote: > Hi, All, > > > Recentl

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
> > On Apr 3, 2015, at 12:22 PM, Dmitriy Lyubimov wrote: > > Although... i am not aware of one in A'A > > could be faulty vector length in a matrix if matrix was created by drmWrap > with explicit specification of ncol > > On Fri, Apr 3, 2015 at 12:20 PM, Dmitriy

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
Ah, yes, I believe it is a bug in non-slim A'A similar to one I fixed for AB' some time ago. It makes an error in computing parallelism and split ranges of the final product. On Fri, Apr 3, 2015 at 12:22 PM, Dmitriy Lyubimov wrote: > Although... i am not aware of one in A'A

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
Although... i am not aware of one in A'A could be faulty vector length in a matrix if matrix was created by drmWrap with explicit specification of ncol On Fri, Apr 3, 2015 at 12:20 PM, Dmitriy Lyubimov wrote: > it's a bug. There's a number of similar ones in operator A'

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
it's a bug. There's a number of similar ones in operator A'B. On Fri, Apr 3, 2015 at 6:23 AM, Michael Kelly wrote: > Hi Pat, > > I've done some further digging and it looks like the problem is > occurring when the input files are split up to into parts. The input > to the item-similarity matrix

Re: Text clustering with SVD

2015-03-30 Thread Dmitriy Lyubimov
Note that these instructions actually mean running PCA, not SVD but that's probably the intention here. I don't think just running SVD helps. On Mon, Mar 30, 2015 at 1:04 AM, Suneel Marthi wrote: > Here are the steps if u r using Mahout-mrlegacy in the present Mahout > trunk: > > 1. Generate tfi
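
The PCA-versus-SVD distinction made in the message above can be illustrated with a small sketch. It is not Mahout code: it assumes only the textbook definition that PCA is the SVD of mean-centered data, and the helper names (center_columns, top_right_singular_vector) and the toy data are hypothetical, chosen for illustration. On data that sits far from the origin, the leading direction of a plain SVD points toward the mean, while the leading direction after centering follows the spread of the points.

```python
import math

def center_columns(rows):
    """Subtract each column's mean; this is the step that turns SVD into PCA."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_right_singular_vector(rows, iters=200):
    """Leading right singular vector via power iteration on A^T A.

    Illustration only: the fixed start vector (1, 0, ..., 0) must not be
    orthogonal to the leading direction, which holds for the toy data below.
    """
    d = len(rows[0])
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        av = [sum(a * b for a, b in zip(row, v)) for row in rows]                  # A v
        w = [sum(row[j] * av[i] for i, row in enumerate(rows)) for j in range(d)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data: points spread along (1, -1) but centered far from the origin, at (10, 10).
data = [[10 + t, 10 - t] for t in (-2, -1, 0, 1, 2)]

svd_dir = top_right_singular_vector(data)                  # plain SVD: follows the mean, ~(1, 1)/sqrt(2)
pca_dir = top_right_singular_vector(center_columns(data))  # PCA: follows the spread, ~(1, -1)/sqrt(2)
print("SVD direction:", svd_dir)
print("PCA direction:", pca_dir)
```

For tf-idf style document vectors, the uncentered leading direction mostly encodes the "average document", which is one way to read the remark that just running SVD may not help clustering.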

Re: Text clustering with SVD

2015-03-30 Thread Dmitriy Lyubimov
I am not aware of _any_ scenario under which lanczos would be faster (see N. Halko's dissertation for comparisons), although admittedly i did not study all possible cases. having -k=100 is probably enough for anything. I would not recommend running -q>0 for k>100 as it would become quite slow in

Re: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path

2015-01-27 Thread Dmitriy Lyubimov
This looks like a Hadoop- or Spark-specific thing (the snappy codec is used by Spark by default). There should be a way to switch this to a more palatable library, but you will need to investigate it a little since I don't think anybody here knows the Mac specifics. Better yet is to figure how to instal

Re: spark-itemsimilarity: Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.event-handlers'

2015-01-27 Thread Dmitriy Lyubimov
spark 1.2 not supported (yet). current head runs on 1.1.0 (but i guess you can take a pull request #71 and compile it for 1.1.1 too, and perhaps even 1.2) On Tue, Jan 27, 2015 at 12:04 PM, Kevin Zhang < zhangyongji...@yahoo.com.invalid> wrote: > Hi, > > I'm new to Spark, Mahout. Just tried to run

RE: mahout 1.0 on EMR with spark item-similarity

2015-01-22 Thread Dmitriy Lyubimov
Oh, specifically to item similarity. Not sure. On Jan 22, 2015 8:42 AM, "Dmitriy Lyubimov" wrote: > There are some computations that are done in core in front end. This is > always method specific. Outside the method itself, there are no additional > requirements on top of

RE: mahout 1.0 on EMR with spark item-similarity

2015-01-22 Thread Dmitriy Lyubimov
Some computations are done in core in the front end. This is always method-specific. Outside the method itself, there are no additional requirements on top of the Spark requirements. However, since many ML methods tend to be more iterative than your regular ETL stuff, expect also higher deman

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-09 Thread Dmitriy Lyubimov
strange. legacy still depends on m-math and should include it into job jar. or did it get that much out of hand after MR deprecation? On Fri, Jan 9, 2015 at 8:51 AM, mw wrote: > I found a solution! > I had to upload the missing jars onto yarn hdfs and add the following to > the hadoop Configurat

Re: Topological data analysis

2014-12-05 Thread Dmitriy Lyubimov
+1. I think contributions like this would count. On Thu, Dec 4, 2014 at 3:14 PM, Brian Dolan wrote: > Though I don't have an immediate use case, I'd +1 the idea! > > On Dec 4, 2014, at 3:11 PM, Andrew Musselman > wrote: > > > Any interest in a topological data analysis package in Mahout? > > >

Re: DBSCAN implementation in Mahout

2014-12-02 Thread Dmitriy Lyubimov
Correction. MR.SCAN is Univ. of Wisconsin's paper. Google Beijing was another paper on the subject but i found mr.scan having a bit more elegant simplicity in it. On Mon, Dec 1, 2014 at 12:41 PM, Dmitriy Lyubimov wrote: > if memory serves me, DeLiClu (density-link) is current best densi

Re: DBSCAN implementation in Mahout

2014-12-01 Thread Dmitriy Lyubimov
write MapReduce code for DBSCAN and OPTICS for > GSoC '15. > > I would like to take your input as to how much of significance would this > be of to the community in general? > > Thanks, > > Chirag Nagpal > University of Pune, India > www.chiragnagpal.com > __
