Re: Error spark-mahout when spark-submit mode cluster

2018-08-08 Thread Dmitriy Lyubimov
it is user-unsubscribe@m.a.o On Wed, Aug 8, 2018 at 6:47 AM, Eric Link wrote: > unsubscribe > > On Wed, Aug 1, 2018 at 8:54 AM Jaume Galí wrote: > > > Hi everybody, I'm trying to build a basic recommender with Spark and > Mahout > > on Scala. I use the following mahout repo to compile mahout with

Re: Error spark-mahout when spark-submit mode cluster

2018-08-08 Thread Dmitriy Lyubimov
My best guess is that it looks like a serialization problem at the cluster/master. This typically happens if class or java versions are different between driver/worker(s). Why that ended up being the case in your particular setup is hard for me to tell. Bottom line, I do not believe this is a
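One way to check the version-mismatch hypothesis is to print the runtime versions on the driver and on a worker and compare them. A minimal sketch in plain Scala follows; the object name `RuntimeVersions` is made up for illustration and is not part of Spark or Mahout.

```scala
object RuntimeVersions {
  /** One-line summary of the JVM and Scala library versions of this process. */
  def describe(): String = {
    val javaVersion  = System.getProperty("java.version")
    val scalaVersion = scala.util.Properties.versionNumberString
    s"java=$javaVersion scala=$scalaVersion"
  }

  def main(args: Array[String]): Unit = println(describe())
}
```

If the line printed on the driver differs from the one printed inside a task on the cluster, that would point to the kind of mismatch described in the message. It does not cover differing versions of application classes, which would need a separate check of the deployed jars.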

Re: Hangouts

2018-07-31 Thread Dmitriy Lyubimov
I am on vacation this week fyi On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Cool, I'll shoot for something on Friday early Pacific time and put an > invite in here; looking forward to it! > > On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn wrote: > > >

Re: Congrats Palumbo and Holden

2018-05-02 Thread Dmitriy Lyubimov
Congrats! On Wed, May 2, 2018 at 1:25 PM, Trevor Grant wrote: > Both were just elected new ASF members!! > > https://s.apache.org/D6iz >

Re: Users of Scala 2.11

2018-04-27 Thread Dmitriy Lyubimov
correct address is : user-unsubscr...@mahout.apache.org On Thu, Apr 26, 2018 at 10:08 PM, Paul Crochet wrote: > unsubscribe > > 2018-04-24 21:08 GMT+03:00 Pat Ferrel : > > > Hi all, > > > > Mahout has hit a bit of a bump in releasing a Scala 2.11

Re: distributed cholesky on mahout

2018-04-19 Thread Dmitriy Lyubimov
no distributed Cholesky as far as i know. Thin QR or ssvd. On Wed, Apr 18, 2018 at 7:08 PM, QIFAN PU wrote: > Hi, > > I'm wondering if distributed cholesky decomposition on mahout is supported > now. > From this doc: >

Re: Updating Wikipedia

2018-02-19 Thread Dmitriy Lyubimov
I think Suneel was modifying it... On Sun, Feb 18, 2018 at 7:02 AM, Trevor Grant wrote: > Is anyone good at Wikipedia? > > We're still listed as being primarily running on Hadoop there. > > https://en.wikipedia.org/wiki/Apache_Mahout > > If anyone has some skills/time-

Re: Mahout and Spark 2.2 compatibility

2017-12-04 Thread Dmitriy Lyubimov
I can confirm i have not encountered fundamental issues with samsara (yet) while running with spark 2.2.0/scala 2.11.11. it is mostly just adjusting the build to use proper versions of artifacts. On Mon, Dec 4, 2017 at 9:25 AM, Trevor Grant wrote: > Hi Marc, > >

Re: spark-itemsimilarity scalability / Spark parallelism issues (SimilarityAnalysis.cooccurrencesIDSs)

2017-08-23 Thread Dmitriy Lyubimov
there has been some work on optimizing in-memory assigns for vectors, but the matrix work for the in-memory java-backed assigns is admittedly more patchy at best, given the amount of variations. On Mon, Aug 21, 2017 at 12:05 PM, Pat Ferrel wrote: > Matt > > I’ll create a

Re: [DISCUSS] Naming convention for multiple spark/scala combos

2017-07-07 Thread Dmitriy Lyubimov
it would seem 2nd option is preferable if doable. Any option that has most desirable combinations prebuilt, is preferable i guess. Spark itself also releases tons of hadoop profile binary variations. so i don't have to build one myself. On Fri, Jul 7, 2017 at 8:57 AM, Trevor Grant

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
so people need to make sure their PR merges to develop instead of master? Do they need to PR against develop branch, and if not, who is then responsible for the conflict resolution that will arise from diffing and merging into different targets? On Tue, Jun 20, 2017 at 10:09 AM, Pat Ferrel

Re: Welcome New Committer Nikolay Sakharnykh

2017-05-01 Thread Dmitriy Lyubimov
Welcome!! On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh wrote: > Hello everyone, > > I’m sorry for some delay with my introduction, have been swamped with > other projects recently ☺ > > Having worked at NVIDIA for around 8 years I have seen GPUs to evolve from >

Re: Samsara's learning curve

2017-03-29 Thread Dmitriy Lyubimov
optimization plan can actually be formed as A' if needed, as long as it doesn't meet the optimization barrier (i.e., collected or saved) On Wed, Mar 29, 2017 at 9:37 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > > On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel <p...@occams

Re: Samsara's learning curve

2017-03-29 Thread Dmitriy Lyubimov
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote: > While I agree with D and T, I’ll add a few things to watch out for. > > One of the hardest things to learn is the new model of execution, it’s not > quite Spark or any other compute engine. You need to create contexts

Re: Samsara's learning curve

2017-03-29 Thread Dmitriy Lyubimov
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote: > > The other missing bit is dataframes. R and Spark have them in different > forms but Mahout largely ignores the issue of real world object ids. Mahout only supports matrices and vectors, not data frames. Data frames

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Dmitriy Lyubimov
On Wed, Mar 29, 2017 at 9:10 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > Sorry, i think more commonly if aggregating transpose is to be used, then > centroid assignments are better kept as the key of the matrix D (so D:= A) and > aggregating transpose is performed on a matrix (1

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Dmitriy Lyubimov
can finish up cluster assignment via M = (1 | D)' C = M(:,2:) with each row hadamard-divided by first row of counts M(:,1) (implying Golub-Van Loan notations for subblocking) On Wed, Mar 29, 2017 at 9:02 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > the simplest scheme is to i

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-29 Thread Dmitriy Lyubimov
the simplest scheme is to initialize distributed matrix of the shape D := (0 | A) where A is your dataset and 0 is a single column indicating current centroid assignment and distribute current centroid matrix C via matrix broadcast (assuming there are few enough centers). Then alternatively run
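The centroid update described in this thread (prepend a column of ones, aggregate rows by centroid assignment, then divide the coordinate sums by the counts) can be sketched with plain Scala collections. This is only an illustration of the arithmetic: ordinary sequences stand in for distributed matrices, the object and method names are hypothetical, and nothing here is the Samsara API.

```scala
object KMeansCentroidUpdate {
  /** Recompute centroids from (assignment, point) pairs.
    * Each point a becomes the row (1 | a); rows sharing an assignment key are
    * summed, so column 0 holds the count and the rest hold coordinate sums. */
  def updateCentroids(assigned: Seq[(Int, Array[Double])]): Map[Int, Array[Double]] = {
    // Prepend a column of ones, i.e. form (1 | a) for every row a.
    val withOnes = assigned.map { case (k, a) => k -> (1.0 +: a) }
    // Sum rows that share the same key (the "aggregating" step).
    val sums = withOnes.groupBy(_._1).map { case (k, rows) =>
      k -> rows.map(_._2).reduce((x, y) => x.zip(y).map { case (p, q) => p + q })
    }
    // Divide the coordinate sums by the count stored in the first column.
    sums.map { case (k, m) => k -> m.tail.map(_ / m.head) }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(0 -> Array(1.0, 2.0), 0 -> Array(3.0, 4.0), 1 -> Array(10.0, 10.0))
    updateCentroids(data).toSeq.sortBy(_._1).foreach { case (k, c) =>
      println(s"centroid $k: ${c.mkString(", ")}")
    }
  }
}
```

Carrying the count in the same row as the sums is what lets a single aggregation pass produce everything needed for the division.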

Re: Samsara's learning curve

2017-03-27 Thread Dmitriy Lyubimov
I believe writing in the DSL is simple enough, especially if you have some familiarity with Scala on top of R (or, in my case, R on top of Scala perhaps:). I've implemented about a couple dozen customized algorithms that used distributed Samsara algebra at least to some degree, and I think I can

Re: Marketing

2017-03-24 Thread Dmitriy Lyubimov
On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote: > The multiple backend support is such a waste of time IMO. The DSL and GPU > support is super important and should be made even more distributed. The > current (as I understand it) single threaded GPU per VM is only the

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-02-01 Thread Dmitriy Lyubimov
Isabel, if i understand it correctly, you are asking whether it makes sense to add end2end scenarios based on Samsara to the current codebase? The answer is, absolutely. Yes it does for both rather isolated issues (like computing clusters) and end-2-end scenarios. The only problem with end 2 end

Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation

2017-01-31 Thread Dmitriy Lyubimov
On Tue, Jan 31, 2017 at 3:01 AM, Isabel Drost-Fromm wrote: > > Hi, > > > To give some advice to downstream users in the field - what would be your > advice > for people tasked with concrete use cases (stuff like fraud detection, > anomaly > detection, learning search ranking

Re: Recommenders and MABs

2016-09-21 Thread Dmitriy Lyubimov
there's been a great blog on that somewhere on richrelevance blog... But i have a vague feeling based on what you are saying it may be all old news to you... [1] http://engineering.richrelevance.com/bandits-recommendation-systems/ and there's more in the series On Sat, Sep 17, 2016 at 3:10 PM,

Re: Text clustering how to?

2016-07-27 Thread Dmitriy Lyubimov
I think you have got a reply via jira. On Wed, Jul 27, 2016 at 10:50 AM, Raviteja Lokineni < raviteja.lokin...@gmail.com> wrote: > Anybody? > > On Thu, Jul 21, 2016 at 10:42 AM, Raviteja Lokineni < > raviteja.lokin...@gmail.com> wrote: > > > Hi all, > > > > I am pretty new to Apache Mahout. I am

Re: mahout tf-idf vs lucene tf-idf

2016-06-06 Thread Dmitriy Lyubimov
to add to Ted's reply, mahout has traditionally offered a bigram/trigram analysis as a part of its tf-idf conversion (a step away from the bag of words model so that directional statistically stable combinations of 2 or 3 words are reduced to their own term). However, this has not been ported to
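As a small illustration of the bigram/trigram idea, the sketch below generates candidate n-grams from a token sequence in plain Scala. It covers only the candidate generation; the statistical step that keeps just the stable combinations is not shown, and the object name `NGrams` is made up for this example rather than taken from Mahout.

```scala
object NGrams {
  /** All contiguous n-word sequences in `tokens`, each joined into a single term. */
  def ngrams(tokens: Seq[String], n: Int): Seq[String] =
    if (n <= 0 || tokens.size < n) Seq.empty
    else tokens.sliding(n).map(_.mkString(" ")).toList

  def main(args: Array[String]): Unit = {
    val tokens = "new york stock exchange".split(" ").toSeq
    println(ngrams(tokens, 2)) // List(new york, york stock, stock exchange)
    println(ngrams(tokens, 3)) // List(new york stock, york stock exchange)
  }
}
```

Each surviving n-gram would then be treated as a term of its own in the tf-idf vector, alongside the single words.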

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
I am just going to give you some design intents in the existing code. as far as i can recollect, mahout context gives complete flexibility. You can control the behavior by various degrees of overriding the default behavior and doing more or less work on context setup on your own. (I assume we

Re: Clustering options

2016-05-23 Thread Dmitriy Lyubimov
Xavier, there are no exact equivalents in the public domain to the algorithms that existed for MR clustering as of yet. My understanding is that some of them are on the roadmap though. depending on the level of sophistication you require, some of them are very easy to build though. On Sat, May 21, 2016 at 8:46 PM,

Re: RowSimilakrity : NotSerializableException

2016-05-07 Thread Dmitriy Lyubimov
you can also wrap mahout context around an existing spark session (aka context). On Sat, May 7, 2016 at 9:41 PM, Rohit Jain wrote: > Yes, we did figure out this problem. And realised that instead of sparkcontext > I have to use mahoutsparkcontext, > > On Sun, May 8, 2016 at

Re: Matrix inversion

2016-05-05 Thread Dmitriy Lyubimov
:50 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > The mantra i keep hearing is that if someone needs matrix inversion then > he/she must be doing something wrong. Not sure how true that is, but in all > cases i have encountered, people try to avoid matrix inversion one way or

Re: Matrix inversion

2016-05-05 Thread Dmitriy Lyubimov
The mantra i keep hearing is that if someone needs matrix inversion then he/she must be doing something wrong. Not sure how true that is, but in all cases i have encountered, people try to avoid matrix inversion one way or another. Re: libraries: Mahout is more about apis now than any particular

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, (1) to be clear, the ASF trademark and branding policy is not to endorse views of the 3rd party publications and to ask 3rd party writers to do a disclosure that their views are not endorsed by ASF project. To that end, ASF project can't really tell you that some publication is

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, if you are using any Mahout Mapreduce algorithm for research, please make sure to make this disclosure: all Mahout MapReduce algorithms are officially not supported and deprecated since February, 2014 (IIRC). I can dig up a specific issue regarding this. There also has been an

Re: spark.shuffle.memoryFraction

2016-04-20 Thread Dmitriy Lyubimov
i think in spark 1.6 this really became more flexible in terms of only specifying max/min thresholds. Yes shuffle spills in spark during multiplication are humongous, i tried a few hacks but that's spark. that's one of known bottlenecks unfortunately. You are welcome to try and hack A'B too. My

Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Dmitriy Lyubimov
park example into the Java source code so that we > do not disrupt the overall flow? > > > Have a great evening! > Mihai > > > On 21 Mar 2016, at 19:31, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > > > my 1 cents (since it is less than 2) is MAHOUT_L

Re: Removing MAHOUT_LOCAL option

2016-03-21 Thread Dmitriy Lyubimov
my 1 cents (since it is less than 2) is MAHOUT_LOCAL is part of MR legacy packaging. as long as MR is still here (and I would say it needs to be still here, unless it falls in complete disrepair and totally out of sync with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon as MR

Re: New Mahout "Samsara" Book

2016-02-25 Thread Dmitriy Lyubimov
For the purposes of this book (and otherwise too, as far as i know) "Samsara" is a release code name, defined as 0.10 and after. That includes all new code that happened after that, and the code that is still not deprecated (although most of MapReduce code is, by now, as evidenced by

Re: New Mahout "Samsara" Book

2016-02-25 Thread Dmitriy Lyubimov
> > > I checked both links, they have only front and back cover of the book. No > > table of contents > > On Feb 25, 2016 9:57 AM, "Suneel Marthi" <smar...@apache.org> wrote: > > > >> You can see the TOC on Amazon > >> > >> > >

Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space

2016-02-16 Thread Dmitriy Lyubimov
BTW, depending on the resource manager, 10G per executor may not necessarily be a sufficient number. I never plan less than 1.5G per core (after excluding block manager, or 3Gb per core including block manager). That means that 10G executor memory might be barely enough for 4-core worker nodes. So
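The per-core rule of thumb in this message can be turned into a quick calculation. The sketch below only restates the two quoted figures (1.5 GB per core excluding the block manager, 3 GB per core including it); the object and method names are hypothetical, and the numbers are the poster's planning heuristic rather than a Spark requirement.

```scala
object ExecutorMemoryRule {
  // Rule-of-thumb figures quoted in the message above.
  val GbPerCoreWithoutBlockManager = 1.5
  val GbPerCoreWithBlockManager    = 3.0

  /** Minimum executor memory (GB) suggested by the rule for a given core count. */
  def minExecutorGb(cores: Int, includeBlockManager: Boolean): Double = {
    val perCore =
      if (includeBlockManager) GbPerCoreWithBlockManager
      else GbPerCoreWithoutBlockManager
    cores * perCore
  }

  def main(args: Array[String]): Unit = {
    val cores = 4
    println(s"$cores cores, excluding block manager: ${minExecutorGb(cores, includeBlockManager = false)} GB")
    println(s"$cores cores, including block manager: ${minExecutorGb(cores, includeBlockManager = true)} GB")
  }
}
```

For 4 cores this gives 6 GB excluding and 12 GB including the block manager, so a 10G executor sits between the two figures, which seems consistent with the "barely enough" remark.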

Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space

2016-02-16 Thread Dmitriy Lyubimov
bottom line: increase executor's non-mem-block memory and reduce individual starting task size until it all fits. On Tue, Feb 16, 2016 at 4:09 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > the original exception definitely happens in the task when mahout tries to > build an e

Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space

2016-02-16 Thread Dmitriy Lyubimov
the original exception definitely happens in the task when mahout tries to build an entire matrix block out of a partition. Use more tasks, smaller in size initially. using par(min=??) will help to repartition to at least ?? tasks. off-hdfs defaults are just too big for matrix processing. Not sure

Re: Mahout - problem importing to Eclipse

2016-02-08 Thread Dmitriy Lyubimov
ing to be completely OK, so we can just leave it at that. > > Best regards, > David > > On Mon, Feb 1, 2016 at 11:52 PM, Dmitriy Lyubimov <dlie...@gmail.com> > wrote: > > > the user list will not let attachments thru. > > > > On Sun, Jan 31, 2016 at 1

Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Dmitriy Lyubimov
5 and report. > > Thank you very much again, > > Kind Regards, > Bahaa > > > On Tue, Feb 2, 2016 at 12:01 PM, Dmitriy Lyubimov <dlie...@gmail.com> > wrote: > > > Bahaa, first off, i don't think we have certified any of releases to run > > with spar

Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Dmitriy Lyubimov
itself. make sure to observe transitive dependency rules for the front end. On Tue, Feb 2, 2016 at 12:53 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > this is strange. if you took over the context, added jars manually and it > still does not work, there's something wrong with s

Re: Confusion regarding Samsara's configuration

2016-02-02 Thread Dmitriy Lyubimov
Bahaa, first off, i don't think we have certified any of the releases to run with spark 1.6 (yet). I think spark 1.5 is the last known release to run with 0.11 series. Second, if you use mahoutSparkContext() method to create context, it would look for MAHOUT_HOME setup to add mahout binaries to the

Re: Mahout - problem importing to Eclipse

2016-02-01 Thread Dmitriy Lyubimov
the user list will not let attachments thru. On Sun, Jan 31, 2016 at 11:59 PM, David Starina wrote: > Hi, > > I have problem importing the project to Eclipse - I get the error "Could > not update project mahout-mr configuration". Attaching the error as image. > Anyone

Re: Some test results

2015-12-30 Thread Dmitriy Lyubimov
Nice! On Dec 30, 2015 11:51 AM, "Pat Ferrel" wrote: > As many of you know Mahout-Samsara includes an interesting and important > extension to cooccurrence similarity, which supports cross-cooccurrence and > log-likelihood downsampling. This, when combined with a search

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Dmitriy Lyubimov
argh bummer. On Fri, Nov 6, 2015 at 4:01 PM, Suneel Marthi wrote: > Thanks. We have 3 +1 votes and no -1s. > > This release has passed and the Voting is officially closed, will send an > announcement out when the release has been finalized. > > Thanks again. > > On Fri, Nov

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Dmitriy Lyubimov
hm. I did not find the staging repo. is it gone already? One thing, if i may whine (I already asked for it last time): Can we please publish -tests artifacts, please pretty please? it is so much easier if derived applications could re-use mahout testing framework. On Fri, Nov 6, 2015 at 2:57

Re: [VOTE] Apache Mahout 0.11.1 Release Candidate

2015-11-06 Thread Dmitriy Lyubimov
uring summer on one of the branches (most likely > 0.10.x). No ? > > On Fri, Nov 6, 2015 at 7:05 PM, Dmitriy Lyubimov <dlie...@gmail.com> > wrote: > > > hm. I did not find the staging repo. is it gone already? > > > > One thing, if i may whine (I already asked

Re: Is Mahout obsolete now?

2015-10-20 Thread Dmitriy Lyubimov
Pavan, I guess part of the documentation difficulty is in that Mahout Samsara environment is only used for "training" but external components are used for "scoring". So it is not 100% end-to-end Mahout solution to document. Pat, it would be nice though to put some of your docs on to Mahout site

Re: Is Mahout obsolete now?

2015-10-19 Thread Dmitriy Lyubimov
On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel wrote: > Even have code running using the PredictionIO framework. This includes a > SDK to event store to realtime query. Loosely speaking a lambda > architecture. Most of the whole enchilada running except the content part > of

Re: Is Mahout obsolete now?

2015-10-19 Thread Dmitriy Lyubimov
> solution ? Specifically, for matrix multiplication and > factorization. thanks, canal > > > On Tuesday, October 20, 2015 6:37 AM, Dmitriy Lyubimov < > dlie...@gmail.com> wrote: > > > On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel <p...@occamsmachete.com

Re: matrix inversion in plan ?

2015-10-08 Thread Dmitriy Lyubimov
or pseudoinverse really, i guess On Thu, Oct 8, 2015 at 3:58 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > Mahout translation (approximation, since ssvd is reduced-rank, not the > true thing): > > val (drmU, drmV, s) = dssvd(drmA, k = 100) > val drmInvA = drmV %*% diag

Re: matrix inversion in plan ?

2015-10-08 Thread Dmitriy Lyubimov
Mahout translation (approximation, since ssvd is reduced-rank, not the true thing):
    val (drmU, drmV, s) = dssvd(drmA, k = 100)
    val drmInvA = drmV %*% diagv(1 /=: s) %*% drmU.t
Still, technically, it is a right inverse as in reality m is rarely the same as n. Also, k must be k<= drmA.nrow min
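The algebra behind this snippet can be written out as follows. This is a sketch of the standard SVD-based pseudoinverse under the assumption that the rank-k factors returned by the decomposition have orthonormal columns; the symbols U_k, Sigma_k, V_k are notation introduced here, not names from the message.

```latex
% Rank-k approximation of the m x n matrix A from the stochastic SVD:
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k = \operatorname{diag}(s_1, \dots, s_k),\;
V_k \in \mathbb{R}^{n \times k}.

% Pseudoinverse of the approximation (what drmV %*% diagv(1/s) %*% drmU.t computes):
A_k^{+} = V_k \Sigma_k^{-1} U_k^{\top}.

% Because U_k^T U_k = V_k^T V_k = I_k, the products are projections:
A_k A_k^{+} = U_k U_k^{\top},
\qquad
A_k^{+} A_k = V_k V_k^{\top}.
```

The product on the left side equals the identity only when k = m and all k singular values are nonzero, which is why the result is an inverse on one side at best and is only approximate when k is smaller than the rank of A.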

Re: Exception in thread "main" java.lang.IllegalArgumentException: Unable to read output from "mahout -spark classpath"

2015-10-06 Thread Dmitriy Lyubimov
DRM format is compatible on persistence level with Mahout MapReduce algorithms. It is a Hadoop sequence file. The key is unique, can be one of -- unique ordinal IntWritable, treated as a row number (i.e. nrow=max(int key)), or -- Text, LongWritable, BytesWritable, or .. forget what else. This

Re: Exception in thread "main" java.lang.IllegalArgumentException: Unable to read output from "mahout -spark classpath"

2015-10-06 Thread Dmitriy Lyubimov
:) strictly speaking out of core is anything that is not in memory, e.g. sequential algorithms are generally also considered out-of-core. btw i thought 0.11.x was for 1.3? or that was re-certified for 1.4 too? On Tue, Oct 6, 2015 at 1:09 PM, Pat Ferrel wrote: > Linear

Re: sq_dist()

2015-09-10 Thread Dmitriy Lyubimov
: > I already use breeze, actually my current impl of sqDist uses it: > > https://github.com/danielkorzekwa/bayes-scala-gp/blob/master/src/main/scala/dk/gp/math/sqDist.scala > > still 3 times slower than sq_dist from gpml > > thanks for BID Data Project info > > On 9

Re: Time Series Stuff

2015-08-14 Thread Dmitriy Lyubimov
Not that I know of. would be nice to have. On Fri, Aug 14, 2015 at 4:42 PM, Nick Kolegraff nickkolegr...@gmail.com wrote: Hey Mahouts, Looking for some time series analysis stuff I can use in mahout. I don't see much, other than this legacy HMM stuff.

Re: Matrix inverse

2015-08-09 Thread Dmitriy Lyubimov
Do you mean in core matrix inversion? It is supported via solve. Actually it is supported both in Java and Scala. On Aug 5, 2015 9:11 PM, go canal goca...@yahoo.com.invalid wrote: Hello,I am new to Mahout. Would appreciate if someone could tell me if matrix inverse is still supported in the

Re: Setup questions for mahout with spark

2015-07-27 Thread Dmitriy Lyubimov
(1) all i ever used with spark is Oracle jvm. (2) take the head of either master or 0.10.x branch. the heads there are some ~30-odd bug fix issues apart from 0.10.1 release, we really should've released 0.10.2 and 0.11.0 by now but i guess end of summer is a slow season. (3) If you want to use

Re: Mahout on the cloud

2015-07-23 Thread Dmitriy Lyubimov
?? On Thu, Jul 23, 2015 at 6:34 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: MapReduce things enter de-facto end-of-life. Not that we specifically don't want to support them, it is de-facto nobody bothers to support them -- especially risks are high with new versions of hadoop

Re: Mahout on the cloud

2015-07-23 Thread Dmitriy Lyubimov
PPS. one of better backends, if there any comparison really is appropriate, is expected to be Apache Flink. On Thu, Jul 23, 2015 at 2:51 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: i guess i was a bit vague. by quasi-agnostic i mean that some code, the smaller part of it, may include

Re: Mahout on the cloud

2015-07-22 Thread Dmitriy Lyubimov
MapReduce things enter de-facto end-of-life. Not that we specifically don't want to support them, it is de-facto nobody bothers to support them -- especially risks are high with new versions of hadoop and EMR. That said, we'd be grateful for any guide about doing this in EMR. On Wed, Jul 22,

Re: java.lang.OutOfMemoryError with Mahout 0.10 and Spark 1.1.1

2015-07-20 Thread Dmitriy Lyubimov
assuming task memory x number of cores does not exceed ~5g, and block cache manager ratio does not have some really weird setting, the next best thing to look at is initial task split size. I don't think in the release you are looking at the driver manages initial off-dfs splits satisfactorily

Re: RowSimilarity API -- illegal argument exception from org.apache.mahout.math.stats.LogLikelihood.logLikelihoodRatio()

2015-07-09 Thread Dmitriy Lyubimov
Travis, 0.10.x branch is for spark 1.2.x and master (0.11.0-snapshot) is for spark 1.3.x. my understanding is that 0.11.0 should mostly work with the exception of Spark shell, which is disabled on the HEAD. we are still working on PR https://github.com/apache/mahout/pull/146 to re-enable it.

Re: Mahout 0.10 with Spark 1.1.1

2015-07-07 Thread Dmitriy Lyubimov
attachments are not showing up on apache lists. On Tue, Jul 7, 2015 at 10:30 AM, Rodolfo Viana rodolfodelimavi...@gmail.com wrote: Hi, I’m trying to run Mahout 0.10 using Spark 1.1.1 and so far I didn’t have any success passing a file on hdfs. My actual problem is when I try to run the

Re: Problem Starting Spark Shell

2015-07-07 Thread Dmitriy Lyubimov
these settings are for spark. spark shell only needs master (which is by default local), i.e. the `MASTER` variable. Although, your error indicates that it does try to go somewhere. are you able to run regular spark shell? in the head of 0.10.x branch you can specify additional spark properties in

Re: Streaming K-means

2015-06-16 Thread Dmitriy Lyubimov
streaming k-means is something else afaik. Streaming k-means is reserved for a particular k-means method (in Mahout, at least, [1]). Whereas as far as i understand what mllib calls streaming k-means is name given by mllib contributor which really means online k-means, i.e. radar tracking of

Re: Building Mahout Source

2015-06-11 Thread Dmitriy Lyubimov
I am not sure how maven repo is managed for released apache projects. Binary artifacts are available for downloads. Also if you are building from source, they would be found on standard places for a maven multimodule project, i.e. module-name/target/artifact-jar. On Jun 11, 2015 3:28 AM, Raghuveer

Re: Runtime Interner Exception

2015-06-11 Thread Dmitriy Lyubimov
specific dependencies of versions? Should I wait for the next release? Thanks a lot and have a great day! Mihai On Jun 10, 2015, at 23:57, Dmitriy Lyubimov dlie...@gmail.com wrote: Hadoop has its own guava. This is some dependency clash at runtime, for sure. Other than that no idea. MR

Re: populating and serializing large sparse matrices

2015-06-11 Thread Dmitriy Lyubimov
correction: dfsWrite (typo) On Thu, Jun 11, 2015 at 3:53 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: I guess you are talking DRM format (sequence file). current recommended way is to use mahout-samsara with e.g. Spark (no mapreduce support there). Translation of in-core matrix (sparse

Re: populating and serializing large sparse matrices

2015-06-11 Thread Dmitriy Lyubimov
I guess you are talking DRM format (sequence file). current recommended way is to use mahout-samsara with e.g. Spark (no mapreduce support there). Translation of in-core matrix (sparse, for example) would take converting it to distributed matrix (DRM) first by means of drmParallelize [1] and then
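The route described in this thread (build an in-core sparse matrix, turn it into a DRM with drmParallelize, then persist it with dfsWrite) might look roughly like the sketch below. It assumes Mahout's Spark bindings and Spark are on the classpath and that MAHOUT_HOME is set; the master URL, application name, matrix contents and output path are placeholders chosen for illustration, and the exact imports may differ between Mahout versions.

```scala
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.sparkbindings._

object WriteDrmExample {
  def main(args: Array[String]): Unit = {
    // Placeholder local master; assumes MAHOUT_HOME is set so Mahout jars can be found.
    implicit val ctx: DistributedContext =
      mahoutSparkContext(masterUrl = "local[2]", appName = "write-drm-example")

    // Small in-core sparse matrix with two non-zero entries (illustrative data).
    val inCore = new SparseRowMatrix(3, 4)
    inCore(0, 1) = 1.0
    inCore(2, 3) = 2.5

    // Distribute the in-core matrix, then persist it in DRM (sequence file) format.
    val drmA = drmParallelize(inCore, numPartitions = 2)
    drmA.dfsWrite("/tmp/drm-example") // hypothetical output path

    ctx.close()
  }
}
```

For a matrix too large to build on the driver this approach would not be a good fit, since drmParallelize starts from an in-core matrix.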

Re: Runtime Interner Exception

2015-06-10 Thread Dmitriy Lyubimov
Hadoop has its own guava. This is some dependency clash at runtime, for sure. Other than that no idea. MR is being phased out. Why don't you try the spark version in upcoming 0.10.2? On Jun 10, 2015 12:58 PM, Mihai Dascalu mihai.dasc...@cs.pub.ro wrote: Hi! After upgrading to Mahout 0.10.1, I have a

Re: word2vec in mahout.

2015-05-13 Thread Dmitriy Lyubimov
Spark's word2vec is pretty agile. On Wed, May 13, 2015 at 12:13 PM, David Starina david.star...@gmail.com wrote: You can also check out the implementation in MLlib: https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec On Wed, May 13, 2015 at 9:11 PM, Dan Dong

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
, at 10:32, Dmitriy Lyubimov dlie...@gmail.com mailto: dlie...@gmail.com wrote: if your run time gets too high, try to start with low -k (like 10 or something) and -q=0, that will significantly reduce complexity of the problem. if this works, you need to find optimal levers that suit your

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
On Tue, Apr 28, 2015 at 1:14 PM, Mihai Dascalu mihai.dasc...@cs.pub.ro wrote: Indeed, it’s in local mode - but to setup hadoop on my Mac for the task at hand did not seem necessary (the SVD uses a sparse matrix of 11MB). oh. Then it is a wrong tool. try bidMat, I promise you won't be

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
if your run time gets too high, try to start with low -k (like 10 or something) and -q=0, that will significantly reduce complexity of the problem. if this works, you need to find optimal levers that suit your hardware/input size/ runtime requirements. ( I can tell you right away that (k+p) value

Re: Re: Hadoop SSVD OutOfMemory Problem

2015-04-28 Thread Dmitriy Lyubimov
, clone (fork) apache/mahout in your account, (optionally) create a patch branch, commit your modifications there, and then use github UI to create a pull request against apache/mahout. thanks. -d On Mon, Apr 27, 2015 at 8:39 PM, lastarsenal lastarse...@163.com wrote: Hi, Dmitriy Lyubimov OK, I

Re: Problems running SSVD directly from Java

2015-04-28 Thread Dmitriy Lyubimov
unsuccessful. On 28 Apr 2015, at 10:32, Dmitriy Lyubimov dlie...@gmail.com mailto: dlie...@gmail.com wrote: if your run time gets too high, try to start with low -k (like 10 or something) and -q=0, that will significantly reduce complexity of the problem. if this works, you need

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
it's a bug. There's a number of similar ones in operator A'B. On Fri, Apr 3, 2015 at 6:23 AM, Michael Kelly mich...@onespot.com wrote: Hi Pat, I've done some further digging and it looks like the problem is occurring when the input files are split up to into parts. The input to the

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
Although... i am not aware of one in A'A. Could be faulty vector length in a matrix if the matrix was created by drmWrap with explicit specification of ncol On Fri, Apr 3, 2015 at 12:20 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: it's a bug. There's a number of similar ones in operator A'B

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
Ah. yes i believe it is a bug in non-slim A'A similar to one I fixed for AB' some time ago. It makes error in computing parallelism and split ranges of the final product. On Fri, Apr 3, 2015 at 12:22 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Although... i am not aware of one in A'A could

Re: spark-itemsimilarity IndexException - outside allowable range

2015-04-03 Thread Dmitriy Lyubimov
? On Apr 3, 2015, at 12:22 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Although... i am not aware of one in A'A could be faulty vector length in a matrix if matrix was created by drmWrap with explicit specification of ncol On Fri, Apr 3, 2015 at 12:20 PM, Dmitriy Lyubimov dlie

Re: Text clustering with SVD

2015-03-30 Thread Dmitriy Lyubimov
I am not aware of _any_ scenario under which lanczos would be faster (see N. Halko's dissertation for comparisons), although admittedly i did not study all possible cases. having -k=100 is probably enough for anything. I would not recommend running -q0 for k100 as it would become quite slow in

Re: Text clustering with SVD

2015-03-30 Thread Dmitriy Lyubimov
Note that these instructions actually mean running PCA, not SVD but that's probably the intention here. I don't think just running SVD helps. On Mon, Mar 30, 2015 at 1:04 AM, Suneel Marthi suneel.mar...@gmail.com wrote: Here are the steps if u r using Mahout-mrlegacy in the present Mahout

Re: spark-itemsimilarity: Exception in thread main com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.event-handlers'

2015-01-27 Thread Dmitriy Lyubimov
spark 1.2 not supported (yet). current head runs on 1.1.0 (but i guess you can take a pull request #71 and compile it for 1.1.1 too, and perhaps even 1.2) On Tue, Jan 27, 2015 at 12:04 PM, Kevin Zhang zhangyongji...@yahoo.com.invalid wrote: Hi, I'm new to Spark, Mahout. Just tried to run the

Re: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path

2015-01-27 Thread Dmitriy Lyubimov
This looks like hadoop or spark -specific thing (snappy codec is used by spark by default). There should be a way to disable this to a more palatable library but you will need to investigate it a little bit since i don't think anybody here knows mac specifics. Better yet is to figure how to

RE: mahout 1.0 on EMR with spark item-similarity

2015-01-22 Thread Dmitriy Lyubimov
Oh, specifically to item similarity. Not sure. On Jan 22, 2015 8:42 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: There are some computations that are done in core in front end. This is always method specific. Outside the method itself, there are no additional requirements on top of spark

RE: mahout 1.0 on EMR with spark item-similarity

2015-01-22 Thread Dmitriy Lyubimov
There are some computations that are done in core in front end. This is always method specific. Outside the method itself, there are no additional requirements on top of spark requirements. However, since many ml methods tend to be more iterable than your regular etl stuff, expect also higher

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

2015-01-09 Thread Dmitriy Lyubimov
strange. legacy still depends on m-math and should include it into job jar. or did it get that much out of hand after MR deprecation? On Fri, Jan 9, 2015 at 8:51 AM, mw m...@plista.com wrote: I found a solution! I had to upload the missing jars onto yarn hdfs and add the following to the

Re: Topological data analysis

2014-12-05 Thread Dmitriy Lyubimov
+1. I think contributions like this would count. On Thu, Dec 4, 2014 at 3:14 PM, Brian Dolan buddha...@gmail.com wrote: Though I don't have an immediate use case, I'd +1 the idea! On Dec 4, 2014, at 3:11 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Any interest in a topological

Re: DBSCAN implementation in Mahout

2014-12-02 Thread Dmitriy Lyubimov
Correction. MR.SCAN is Univ. of Wisconsin's paper. Google Beijing was another paper on the subject but i found mr.scan having a bit more elegant simplicity in it. On Mon, Dec 1, 2014 at 12:41 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: if memory serves me, DeLiClu (density-link) is current

Re: DBSCAN implementation in Mahout

2014-12-01 Thread Dmitriy Lyubimov
'15. I would like to take your input as to how much of significance would this be of to the community in general? Thanks, Chirag Nagpal University of Pune, India www.chiragnagpal.com From: Dmitriy Lyubimov dlie...@gmail.com Sent: Saturday, November

Re: DBSCAN implementation in Mahout

2014-11-29 Thread Dmitriy Lyubimov
No there is no dbscan, optics or any other density flavor afaik Sent from my phone. On Nov 28, 2014 11:41 AM, 3316 Chirag Nagpal chiragnagpal_12...@aitpune.edu.in wrote: Hello I am Chirag Nagpal, a third year student of Computer Engineering at the University of Pune, India and currently

Re: NaN produced by SSVD ?

2014-11-03 Thread Dmitriy Lyubimov
be much smaller than N, that could be the reason. but it is a bit difficult to figure out that R beforehand. thanks Yang On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: is the matrix by any chance constructed so that it may have rank k? I think MR code

Re: NaN produced by SSVD ?

2014-10-31 Thread Dmitriy Lyubimov
...@gmail.com wrote: i am talking about the MR one. thanks yang On Oct 30, 2014 8:16 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: This is not a known problem... there are few ssvd here, sequential, MR and spark one. for the record, which one are you running? On Thu, Oct 30

Re: NaN produced by SSVD ?

2014-10-30 Thread Dmitriy Lyubimov
This is not a known problem... there are few ssvd here, sequential, MR and spark one. for the record, which one are you running? On Thu, Oct 30, 2014 at 4:37 PM, Yang tedd...@gmail.com wrote: we are running ssvd on a dataset (this one is relatively small, with 8000 rows, number of

Re: Mahout Vs Spark

2014-10-22 Thread Dmitriy Lyubimov
For the record, this is all a false dilemma (at least w.r.t. spark vs mahout spark bindings). The spark bindings have never been conceived as one vs another. Mahout scala bindings are an on-top add-on to spark that just happens to rely on some of the things in mahout-math. With spark one gets some major

Re: Upgrade to Spark 1.1.0?

2014-10-21 Thread Dmitriy Lyubimov
? That would be an easy way to test this theory. either of these could cause missing classes. On Oct 21, 2014, at 9:52 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: no i haven't used it with anything but 1.0.1 and 0.9.x . on a side note, I have just changed my employer. It is one of these big guys

Re: Upgrade to Spark 1.1.0?

2014-10-21 Thread Dmitriy Lyubimov
think you need to delete it anyway. On Oct 21, 2014, at 12:27 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: fwiw i never built spark using maven. Always use sbt assembly. On Tue, Oct 21, 2014 at 11:55 AM, Pat Ferrel p...@occamsmachete.com wrote: Ok, the mystery is solved. The safe
