so does mine. 9-10 am PST?
On Mon, Sep 3, 2018 at 12:10 PM Ivan Serdyuk
wrote:
> Google calendar reports "Could not find the requested".
>
>
>
> On Mon, Sep 3, 2018 at 8:46 PM Andrew Palumbo wrote:
>
> > Probably my calendar messed it up.
> > Thx
> > --andy
> >
> > On Sep 3, 2018 10:32 AM, Andr
it is user-unsubscribe@m.a.o
On Wed, Aug 8, 2018 at 6:47 AM, Eric Link wrote:
> unsubscribe
>
> On Wed, Aug 1, 2018 at 8:54 AM Jaume Galí wrote:
>
> > Hi everybody, I'm trying to build a basic recommender with Spark and
> Mahout
> > on Scala. I use the following mahout repo to compile mahout with s
My best guess is that it looks like a serialization problem at the
cluster/master. This typically happens if class or java versions are
different between driver and worker(s). Why that ended up happening in
your particular case is hard for me to tell. Bottom line, I do not
believe this is a Maho
I am on vacation this week fyi
On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:
> Cool, I'll shoot for something on Friday early Pacific time and put an
> invite in here; looking forward to it!
>
> On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn wrote:
>
> >
Congrats!
On Wed, May 2, 2018 at 1:25 PM, Trevor Grant
wrote:
> Both were just elected new ASF members!!
>
> https://s.apache.org/D6iz
>
the correct address is: user-unsubscr...@mahout.apache.org
On Thu, Apr 26, 2018 at 10:08 PM, Paul Crochet
wrote:
> unsubscribe
>
> 2018-04-24 21:08 GMT+03:00 Pat Ferrel :
>
> > Hi all,
> >
> > Mahout has hit a bit of a bump in releasing a Scala 2.11 version. I was
> > able to build 0.13.0 for Scala
no distributed Cholesky as far as i know.
Thin QR or ssvd.
On Wed, Apr 18, 2018 at 7:08 PM, QIFAN PU wrote:
> Hi,
>
> I'm wondering if distributed cholesky decomposition on mahout is supported
> now.
> From this doc:
> https://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
> It see
I think Suneel was modifying it...
On Sun, Feb 18, 2018 at 7:02 AM, Trevor Grant
wrote:
> Is anyone good at Wikipedia?
>
> We're still listed as being primarily running on Hadoop there.
>
> https://en.wikipedia.org/wiki/Apache_Mahout
>
> If anyone has some skills/time- an update would be cool...
I can confirm i have not encountered fundamental issues with samsara (yet)
while running with spark 2.2.0/scala 2.11.11. it is mostly just adjusting
the build to use proper versions of artifacts.
On Mon, Dec 4, 2017 at 9:25 AM, Trevor Grant
wrote:
> Hi Marc,
>
> Actually, it's not THAT hard to ge
there has been some work on optimizing in-memory assigns for vectors, but
the matrix work for the in-memory java-backed assigns is admittedly patchy
at best, given the amount of variations.
On Mon, Aug 21, 2017 at 12:05 PM, Pat Ferrel wrote:
> Matt
>
> I’ll create a feature branch of Mahout
it would seem the 2nd option is preferable, if doable. Any option that has
the most desirable combinations prebuilt is preferable, i guess. Spark itself
also releases tons of hadoop-profile binary variations, so i don't have to
build one myself.
On Fri, Jul 7, 2017 at 8:57 AM, Trevor Grant
wrote:
> Hey a
so people need to make sure their PR merges to develop instead of master?
Do they need to PR against the develop branch, and if not, who is responsible
for the conflict resolution that is bound to arise from diffing and merging
into different targets?
On Tue, Jun 20, 2017 at 10:09 AM, Pat Ferrel wrote:
>
Welcome!!
On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh
wrote:
> Hello everyone,
>
> I’m sorry for some delay with my introduction, have been swamped with
> other projects recently ☺
>
> Having worked at NVIDIA for around 8 years I have seen GPUs evolve from
> specialized graphics proce
matrix A inside the
optimization plan can actually be formed as A' if needed, as long as it
doesn't hit an optimization barrier (i.e., is not collected or saved)
On Wed, Mar 29, 2017 at 9:37 AM, Dmitriy Lyubimov wrote:
>
>
> On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote:
>
>>
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote:
> While I agree with D and T, I’ll add a few things to watch out for.
>
> One of the hardest things to learn is the new model of execution, it’s not
> quite Spark or any other compute engine. You need to create contexts that
> have virtualized th
On Wed, Mar 29, 2017 at 9:26 AM, Pat Ferrel wrote:
>
> The other missing bit is dataframes. R and Spark have them in different
> forms but Mahout largely ignores the issue of real world object ids.
Mahout only supports matrices and vectors, not data frames.
Data frames imply a mix of various typ
On Wed, Mar 29, 2017 at 9:10 AM, Dmitriy Lyubimov wrote:
> Sorry, i think more commonly if the aggregating transpose is to be used, then
> centroid assignments had better be the key of the matrix D (so D := A) and
> the aggregating transpose is performed on a matrix (1 | D)' (i.e., 1 cb
nd
we can finish up cluster assignment via
M = (1 | D)'
C = M(:, 2:) with each row hadamard-divided by the first column of counts
M(:, 1)
(implying Golub-Van Loan notation for subblocking)
On Wed, Mar 29, 2017 at 9:02 AM, Dmitriy Lyubimov wrote:
> the simplest scheme is to initialize distributed
the simplest scheme is to initialize a distributed matrix of the shape D :=
(0 | A), where A is your dataset and 0 is a single column indicating the
current centroid assignment, and distribute the current centroid matrix C via
matrix broadcast (assuming there are few enough centers).
Then alternately run cl
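For illustration, the scheme described above can be sketched in plain Python (toy data and names are made up; real Samsara code would operate on a DRM with a drmBroadcast centroid matrix rather than Python lists):

```python
# Plain-Python sketch: D := (0 | A) with a broadcast centroid matrix C.
def kmeans_step(A, C):
    """One pass: assign rows to the nearest centroid, then recompute centroids."""
    def sqdist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

    # assignment pass: the "0" column of D gets the nearest centroid index
    assign = [min(range(len(C)), key=lambda j: sqdist(row, C[j])) for row in A]

    # centroid update, in the spirit of the aggregating transpose of (1 | D):
    # per-cluster counts and coordinate sums, then an elementwise divide
    counts = [0] * len(C)
    sums = [[0.0] * len(A[0]) for _ in C]
    for j, row in zip(assign, A):
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], row)]
    new_C = [[s / counts[j] for s in sums[j]] if counts[j] else C[j]
             for j in range(len(C))]
    return assign, new_C

A = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assign, C = kmeans_step(A, [[0.0, 0.0], [10.0, 10.0]])
# assign == [0, 0, 1, 1]; C == [[0.0, 0.5], [10.0, 10.5]]
```

Iterating `kmeans_step` until the assignments stop changing corresponds to the alternating loop the message describes.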
I believe writing in the DSL is simple enough, especially if you have some
familiarity with Scala on top of R (or, in my case, R on top of Scala
perhaps:). I've implemented about a couple dozen customized algorithms that
used distributed Samsara algebra at least to some degree, and I think I can
rel
On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote:
> The multiple backend support is such a waste of time IMO. The DSL and GPU
> support is super important and should be made even more distributed. The
> current (as I understand it) single threaded GPU per VM is only the first
> step in what will
Isabel, if i understand it correctly, you are asking whether it makes sense
to add end2end scenarios based on Samsara to the current codebase?
The answer is, absolutely. Yes it does, for both rather isolated issues
(like computing clusters) and end-2-end scenarios.
The only problem with end 2 end scenari
On Tue, Jan 31, 2017 at 3:01 AM, Isabel Drost-Fromm
wrote:
>
> Hi,
>
>
> To give some advice to downstream users in the field - what would be your
> advice
> for people tasked with concrete use cases (stuff like fraud detection,
> anomaly
> detection, learning search ranking functions, building a
there's been a great blog post on that somewhere on the richrelevance blog...
But i have a vague feeling, based on what you are saying, that it may be all
old news to you...
[1] http://engineering.richrelevance.com/bandits-recommendation-systems/
and there's more in the series
On Sat, Sep 17, 2016 at 3:10 PM, Pa
I think you have got a reply via jira.
On Wed, Jul 27, 2016 at 10:50 AM, Raviteja Lokineni <
raviteja.lokin...@gmail.com> wrote:
> Anybody?
>
> On Thu, Jul 21, 2016 at 10:42 AM, Raviteja Lokineni <
> raviteja.lokin...@gmail.com> wrote:
>
> > Hi all,
> >
> > I am pretty new to Apache Mahout. I am
to add to Ted's reply, mahout has traditionally offered a bigram/trigram
analysis as a part of its tf-idf conversion (a step away from the bag-of-words
model, so that directional, statistically stable combinations of 2 or 3
words are reduced to their own term). However, this has not been ported to
sp
I am just going to give you some design intents in the existing code.
as far as i can recollect, mahout context gives complete flexibility. You
can control the behavior by various degrees of overriding the default
behavior and doing more or less work on context setup on your own. (I
assume we are
Xavier,
there are no exact equivalents in the public domain to the algorithms that
existed for MR clustering as of yet. My understanding is that some of them
are on the roadmap though.
depending on the level of sophistication you require, some of them are very
easy to build though.
On Sat, May 21, 2016 at 8:46 PM, FR
you can also wrap mahout context around existing spark session (aka
context).
On Sat, May 7, 2016 at 9:41 PM, Rohit Jain wrote:
> Yes, we did figure out this problem. And realised that instead of
> sparkcontext I have to use mahoutsparkcontext,
>
> On Sun, May 8, 2016 at 4:26 AM, Pat Ferrel wrote:
at 1:50 PM, Dmitriy Lyubimov wrote:
> The mantra i keep hearing is that if someone needs matrix inversion then
> he/she must be doing something wrong. Not sure how true that is, but in all
> cases i have encountered, people try to avoid matrix inversion one way or
> another.
>
>
The mantra i keep hearing is that if someone needs matrix inversion then
he/she must be doing something wrong. Not sure how true that is, but in all
cases i have encountered, people try to avoid matrix inversion one way or
another.
Re: libraries: Mahout is more about apis now than any particular i
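The standard alternative the message alludes to is solving the linear system directly instead of ever forming an inverse. A plain-Python sketch (toy numbers; any real workload would use a library solver):

```python
# To get x = inv(A) @ b, solve A x = b directly via Gaussian elimination
# with partial pivoting, without forming inv(A).
def solve(A, b):
    n = len(A)
    # augmented matrix [A | b], copied so the inputs are untouched
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        # partial pivoting: swap in the row with the largest pivot
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    # back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
x = solve([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0])
```

Solving costs one elimination pass per right-hand side, versus n of them to build the full inverse, and is numerically better behaved.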
Prakash,
(1) to be clear, the ASF trademark and branding policy is not to endorse
views of 3rd party publications and to ask 3rd party writers to disclose
that their views are not endorsed by the ASF project. To that end, an
ASF project can't really tell you that some publication is
"(in)appro
Prakash,
if you are using any Mahout MapReduce algorithm for research, please make
sure to make this disclosure:
all Mahout MapReduce algorithms have been officially unsupported and
deprecated since February 2014 (IIRC). I can dig up a specific issue
regarding this. There also has been an announceme
i think in spark 1.6 this really became more flexible in terms of only
specifying max/min thresholds.
Yes, shuffle spills in spark during multiplication are humongous; i tried a
few hacks, but that's spark. that's one of the known bottlenecks,
unfortunately. You are welcome to try and hack A'B too. My pe
not disrupt the overall flow?
>
>
> Have a great evening!
> Mihai
>
> > On 21 Mar 2016, at 19:31, Dmitriy Lyubimov wrote:
> >
> > my 1 cents (since it is less than 2) is MAHOUT_LOCAL is part of MR legacy
> > packaging. as long as MR is still here (and I would say
my 1 cent (since it is less than 2) is that MAHOUT_LOCAL is part of MR legacy
packaging. as long as MR is still here (and I would say it needs to be
still here, unless it falls into complete disrepair and totally out of sync
with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon as MR
goes,
For the purposes of this book (and otherwise too, as far as i know)
"Samsara" is a release code name, defined as 0.10 and after. That includes
all new code that happened after that, and the code that is still not
deprecated (although most of the MapReduce code is, by now, as evidenced by
MAHOUT-1510)
ecked both links, they have only front and back cover of the book. No
> > table of contents
> > On Feb 25, 2016 9:57 AM, "Suneel Marthi" wrote:
> >
> >> You can see the TOC on Amazon
> >>
> >>
> >>
> http://www.amazon.com/Apache
BTW, depending on the resource manager, 10G per executor may not
necessarily be a sufficient number. I never plan less than 1.5G per core
(after excluding block manager, or 3Gb per core including block manager).
That means that 10G executor memory might be barely enough for 4-core
worker nodes. So
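A back-of-envelope sketch of that rule of thumb (the per-core figures, ~1.5G excluding the block manager and ~3G including it, are quoted from the message; the function name is made up):

```python
# Sizing heuristic from the message: plan at least gb_per_core per core.
def min_executor_gb(cores, gb_per_core=3.0):
    return cores * gb_per_core

assert min_executor_gb(4, gb_per_core=1.5) == 6.0   # excluding block manager
assert min_executor_gb(4) == 12.0  # including it, a 10G executor is already tight
```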
bottom line: increase the executor's non-block-manager memory and reduce
individual starting task size until it all fits.
On Tue, Feb 16, 2016 at 4:09 PM, Dmitriy Lyubimov wrote:
> the original exception definitely happens in the task when mahout tries to
> build an entire matrix blo
the original exception definitely happens in the task when mahout tries to
build an entire matrix block out of a partition. Use more tasks, smaller in
size initially. using par(min=??) will help to repartition to at least ??
tasks. off-hdfs defaults are just too big for matrix processing. Not sure
ely OK, so we can just leave it at that.
>
> Best regards,
> David
>
> On Mon, Feb 1, 2016 at 11:52 PM, Dmitriy Lyubimov
> wrote:
>
> > the user list will not let attachments thru.
> >
> > On Sun, Jan 31, 2016 at 11:59 PM, David Starina >
> > wrote
ahout itself. make sure to
observe transitive dependency rules for the front end.
On Tue, Feb 2, 2016 at 12:53 PM, Dmitriy Lyubimov wrote:
> this is strange. if you took over the context, added jars manually and it
> still does not work, there's something wrong with spark i guess or
nk you very much again,
>
> Kind Regards,
> Bahaa
>
>
> On Tue, Feb 2, 2016 at 12:01 PM, Dmitriy Lyubimov
> wrote:
>
> > Bahaa, first off, i don't think we have certified any of our releases to
> > run with spark 1.6 (yet). I think spark 1.5 is the l
Bahaa, first off, i don't think we have certified any of our releases to run
with spark 1.6 (yet). I think spark 1.5 is the last known release to run
with the 0.11 series.
Second, if you use the mahoutSparkContext() method to create a context, it
will look for the MAHOUT_HOME setup to add mahout binaries to the job
the user list will not let attachments thru.
On Sun, Jan 31, 2016 at 11:59 PM, David Starina
wrote:
> Hi,
>
> I have problem importing the project to Eclipse - I get the error "Could
> not update project mahout-mr configuration". Attaching the error as image.
> Anyone seen this problem before? I
Nice!
On Dec 30, 2015 11:51 AM, "Pat Ferrel" wrote:
> As many of you know Mahout-Samsara includes an interesting and important
> extension to cooccurrence similarity, which supports cross-cooccurrence and
> log-likelihood downsampling. This, when combined with a search engine,
> gives us a multim
he branches (most likely
> 0.10.x). No ?
>
> On Fri, Nov 6, 2015 at 7:05 PM, Dmitriy Lyubimov
> wrote:
>
> > hm. I did not find the staging repo. is it gone already?
> >
> > One thing, if i may whine (I already asked for it last time):
> > Can we please publish
argh bummer.
On Fri, Nov 6, 2015 at 4:01 PM, Suneel Marthi wrote:
> Thanks. We have 3 +1 votes and no -1s.
>
> This release has passed and the Voting is officially closed, will send an
> announcement out when the release has been finalized.
>
> Thanks again.
>
> On Fri, Nov 6, 2015 at 5:57 PM, A
hm. I did not find the staging repo. is it gone already?
One thing, if i may whine (I already asked for it last time):
Can we please publish -tests artifacts, please pretty please?
it would be so much easier if derived applications could re-use the mahout
testing framework.
On Fri, Nov 6, 2015 at 2:57 PM
Pavan, I guess part of the documentation difficulty is that the Mahout
Samsara environment is only used for "training", while external components
are used for "scoring". So it is not a 100% end-to-end Mahout solution to
document.
Pat, it would be nice though to put some of your docs on to the Mahout site
th
matrix multiplication and
> factorization. thanks, canal
>
>
> On Tuesday, October 20, 2015 6:37 AM, Dmitriy Lyubimov <
> dlie...@gmail.com> wrote:
>
>
> On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel
> wrote:
>
> > Even have code running using the Predici
On Mon, Oct 19, 2015 at 3:29 PM, Pat Ferrel wrote:
> Even have code running using the PredictionIO framework. This includes a
> SDK to event store to realtime query. Loosely speaking a lambda
> architecture. Most of the whole enchilada running except the content part
> of the equation, which only
or pseudoinverse really, i guess
On Thu, Oct 8, 2015 at 3:58 PM, Dmitriy Lyubimov wrote:
> Mahout translation (approximation, since ssvd is reduced-rank, not the
> true thing):
>
> val (drmU, drmV, s) = dssvd(drmA, k = 100)
> val drmInvA = drmV %*% diagv(1 /=: s) %*% d
Mahout translation (approximation, since ssvd is reduced-rank, not the true
thing):
val (drmU, drmV, s) = dssvd(drmA, k = 100)
val drmInvA = drmV %*% diagv(1 /=: s) %*% drmU.t
Still, technically, it is a right inverse, as in reality m is rarely the
same as n. Also, k must be k <= drmA.nrow min drmA
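As a plain-Python sanity check of the V · diag(1/s) · U' construction above, here is a toy matrix whose SVD is known by construction rather than computed by dssvd (for A = diag(s) with positive s, U = V = I):

```python
# Check the pseudoinverse formula pinv(A) = V diag(1/s) U' on a toy case.
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

s = [4.0, 2.0]
A = [[4.0, 0.0], [0.0, 2.0]]                    # = U diag(s) V' with U = V = I
pinvA = [[1.0 / s[0], 0.0], [0.0, 1.0 / s[1]]]  # = V diag(1/s) U'

# Moore-Penrose property: A pinv(A) A == A
assert matmul(matmul(A, pinvA), A) == A
```

With a reduced rank k, the same identity only holds approximately on the subspace spanned by the retained singular vectors.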
DRM format is compatible on the persistence level with Mahout MapReduce
algorithms.
It is a Hadoop sequence file. The key is unique and can be one of
-- a unique ordinal IntWritable, treated as a row number (i.e. nrow=max(int
key)), or
-- Text, LongWritable, BytesWritable, or .. forget what else. This
t
:) strictly speaking, out of core is anything that is not in memory, e.g.
sequential algorithms are generally also considered out-of-core
btw i thought 0.11.x was for 1.3? or was that re-certified for 1.4 too?
On Tue, Oct 6, 2015 at 1:09 PM, Pat Ferrel wrote:
> Linear algebra stuff is what Mahou
eze, actually my current impl of sqDist uses it:
>
> https://github.com/danielkorzekwa/bayes-scala-gp/blob/master/src/main/scala/dk/gp/math/sqDist.scala
>
> still 3 times slower than sq_dist from gpml
>
> thanks for BID Data Project info
>
> On 9 September 2015 at 18:45, Dmitri
Hi Daniel,
you mean, for dense algebra, single-threaded java vs. cache-aware,
multithreaded, SSE4-optimized Intel MKL? I am actually surprised it is not at
least 10x. Mahout focuses on ease of distributed implementations (i.e. the
dsq_dist variant of the routine) but has been somewhat lazy on marrying mahou
seems like a maven dependency problem (mahout-math does not publish its
test artifacts?).
i thought though that this was not a known issue... hm.
On Tue, Sep 8, 2015 at 7:31 AM, Dulakshi Vihanga
wrote:
> When I tried to build mahout using "mvn clean install
> -Dmaven.test.skip=true" I got the f
Not that I know of. would be nice to have.
On Fri, Aug 14, 2015 at 4:42 PM, Nick Kolegraff
wrote:
> Hey Mahouts,
> Looking for some time series analysis stuff I can use in mahout. I don't
> see much, other than this legacy HMM stuff.
>
> https://mahout.apache.org/users/classification/hidden-mar
Do you mean in-core matrix inversion? It is supported via solve. Actually
it is supported both in Java and Scala.
On Aug 5, 2015 9:11 PM, "go canal" wrote:
> Hello, I am new to Mahout. Would appreciate it if someone could tell me if
> matrix inverse is still supported in the latest release (0.10)? I
(1) all i ever used with spark is the Oracle jvm.
(2) take the head of either the master or the 0.10.x branch. the heads there
are some ~30-odd bug-fix issues ahead of the 0.10.1 release; we really
should've released 0.10.2 and 0.11.0 by now, but i guess the end of summer is
a slow season.
(3) If you want to use spar
PPS. one of the "better" backends, if any comparison really is
appropriate, is expected to be Apache Flink.
On Thu, Jul 23, 2015 at 2:51 PM, Dmitriy Lyubimov wrote:
> i guess i was a bit vague. by quasi-agnostic i mean that some code, the
> smaller part of it, may include
i guess i was a bit vague. by quasi-agnostic i mean that some code, the
smaller part of it, may include specific backend engine dependencies
unfortunately. it should be easily portable though.
On Thu, Jul 23, 2015 at 2:50 PM, Dmitriy Lyubimov wrote:
> Mahout is moving to be backend-agnos
derstand
> > the algo perfectly-- so this is a great heads up. Any advice or warnings
> > on hadoop installations and versions??
> >
> > On Thu, Jul 23, 2015 at 6:34 AM, Dmitriy Lyubimov
> > wrote:
> >
> > > MapReduce things enter de-facto end
MapReduce things have entered de-facto end-of-life. Not that we specifically
don't want to support them; it is de facto that nobody bothers to support
them -- especially since risks are high with new versions of hadoop and EMR.
That said, we'd be grateful for any guide about doing this in EMR.
On Wed, Jul 22, 2015
assuming task memory x number of cores does not exceed ~5g, and the block
manager cache ratio does not have some really weird setting, the next best
thing to look at is the initial task split size. I don't think, in the
release you are looking at, the driver manages initial off-dfs splits
satisfactorily (tha
I don't know. seems like somebody is sitting on the port. The `lsof` utility
may help to figure out what it is.
On Wed, Jul 8, 2015 at 8:18 AM, Parimi Rohit wrote:
> Hi Dimitry,
>
> Please find my answers inline.
>
>
> On Tue, Jul 7, 2015 at 7:48 PM, Dmitriy Lyubimov
> wrot
Travis,
the 0.10.x branch is for spark 1.2.x and master (0.11.0-snapshot) is for
spark 1.3.x.
my understanding is that 0.11.0 should mostly work, with the exception of the
Spark shell, which is disabled on the HEAD. we are still working on PR
https://github.com/apache/mahout/pull/146 to re-enable it again.
numNonZeroE
these settings are for spark. the spark shell only needs a master (which is
by default local), via the `MASTER` variable.
Although, your error indicates that it does try to go somewhere. are you
able to run a regular spark shell?
in the head of the 0.10.x branch you can specify additional spark properties
in MAHOUT_O
attachments are not showing up on apache lists.
On Tue, Jul 7, 2015 at 10:30 AM, Rodolfo Viana wrote:
> Hi,
>
> I’m trying to run Mahout 0.10 using Spark 1.1.1 and so far I didn’t have
> any success passing a file on hdfs. My actual problem is when I try to run
> the example:
>
> bin/mahout spa
"streaming k-means" is something else afaik. Streaming k-means is reserved
for a particular k-means method (in Mahout, at least, [1]).
Whereas as far as i understand what mllib calls "streaming k-means" is name
given by mllib contributor which really means "online k-means", i.e. radar
tracking of
I guess you are talking about the DRM format (sequence file).
the current recommended way is to use mahout-samsara with e.g. Spark (no
mapreduce support there). Translation of an in-core matrix (sparse, for
example) would take converting it to a distributed matrix (DRM) first by
means of drmParallelize [1] and then
correction: dfsWrite (typo)
On Thu, Jun 11, 2015 at 3:53 PM, Dmitriy Lyubimov wrote:
> I guess you are talking DRM format (sequence file).
>
> current recommended way is to use mahout-samsara with e.g. Spark (no
> mapreduce support there). Translation of in-core matrix (sparse, f
>
> Also, are there some specific dependencies of versions? Should I wait for
> the next release?
>
>
> Thanks a lot and have a great day!
> Mihai
>
> > On Jun 10, 2015, at 23:57, Dmitriy Lyubimov wrote:
> >
> > Hadoop has its own guava. This is some dependency clas
I am not sure how the maven repo is managed for released apache projects.
Binary artifacts are available for download. Also, if you are building from
source, they would be found in the standard places for a maven multimodule
project, i.e. module-name/target/artifact-jar.
On Jun 11, 2015 3:28 AM, "Raghuveer
Hadoop has its own guava. This is some dependency clash at runtime, for
sure. Other than that, no idea. MR is being phased out. Why don't u try the
spark version in the upcoming 0.10.2?
On Jun 10, 2015 12:58 PM, "Mihai Dascalu" wrote:
> Hi!
>
> After upgrading to Mahout 0.10.1, I have a runtime exception i
Spark's word2vec is pretty agile.
On Wed, May 13, 2015 at 12:13 PM, David Starina
wrote:
> You can also check out the implementation in MLlib:
> https://spark.apache.org/docs/latest/mllib-feature-extraction.html#word2vec
>
>
>
> On Wed, May 13, 2015 at 9:11 PM, Dan Dong wrote:
>
> > Thanks Andr
86
> null
>
>
> Now, my question is, how can I run a specified test with maven? For "mvn
> test" is so slow, then if I can do like "mvn test LocalSSVDPCASparseTest",
> my efficiency will be improved.
>
> At 2015-04-29 01:25:34, "Dmitriy Lyubimov&
On Tue, Apr 28, 2015 at 1:14 PM, Mihai Dascalu
wrote:
> Indeed, it’s in local mode - but to setup hadoop on my Mac for the task at
> hand did not seem necessary (the SVD uses a sparse matrix of 11MB).
>
oh. Then it is the wrong tool. try BIDMat, I promise you won't be
disappointed. https://github
ecurity.UserGroupInformation - PrivilegedAction
> as:mihaidascalu (auth:SIMPLE)
> from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
> 1415709 [SwingWorker-pool-1-thread-1] ERROR
> view.widgets.semanticModels.SemanticModelsTraining - Error procesing
> config/LDA dir
if your runtime gets too high, try to start with a low -k (like 10 or
something) and -q=0; that will significantly reduce the complexity of the
problem.
if this works, you need to find the optimal levers that suit your
hardware/input size/runtime requirements. (I can tell you right away that
the (k+p) value
, clone (fork) apache/mahout in
your account, (optionally) create a patch branch, commit your modifications
there, and then use github UI to create a pull request against
apache/mahout.
thanks.
-d
On Mon, Apr 27, 2015 at 8:39 PM, lastarsenal wrote:
> Hi, Dmitriy Lyubimov
>
>
> OK, I ha
Thank you for this analysis. I can't immediately confirm this since it's
been a while, but this sounds credible.
Do you mind filing a jira with all this information, and perhaps even doing
a PR on github?
thank you.
On Mon, Apr 27, 2015 at 4:32 AM, lastarsenal wrote:
> Hi, All,
>
>
> Recentl
t;
> On Apr 3, 2015, at 12:22 PM, Dmitriy Lyubimov wrote:
>
> Although... i am not aware of one in A'A
>
> could be faulty vector length in a matrix if matrix was created by drmWrap
> with explicit specification of ncol
>
> On Fri, Apr 3, 2015 at 12:20 PM, Dmitriy
Ah, yes, i believe it is a bug in non-slim A'A, similar to one I fixed for
AB' some time ago. It makes an error in computing parallelism and split
ranges of the final product.
On Fri, Apr 3, 2015 at 12:22 PM, Dmitriy Lyubimov wrote:
> Although... i am not aware of one in A'A
&
Although... i am not aware of one in A'A
could be faulty vector length in a matrix if matrix was created by drmWrap
with explicit specification of ncol
On Fri, Apr 3, 2015 at 12:20 PM, Dmitriy Lyubimov wrote:
> it's a bug. There's a number of similar ones in operator A
it's a bug. There's a number of similar ones in operator A'B.
On Fri, Apr 3, 2015 at 6:23 AM, Michael Kelly wrote:
> Hi Pat,
>
> I've done some further digging and it looks like the problem is
> occurring when the input files are split up to into parts. The input
> to the item-similarity matrix
Note that these instructions actually mean running PCA, not SVD, but that's
probably the intention here. I don't think just running SVD helps.
On Mon, Mar 30, 2015 at 1:04 AM, Suneel Marthi
wrote:
> Here are the steps if u r using Mahout-mrlegacy in the present Mahout
> trunk:
>
> 1. Generate tfi
I am not aware of _any_ scenario under which lanczos would be faster (see
N. Halko's dissertation for comparisons), although admittedly i did not
study all possible cases.
having -k=100 is probably enough for anything. I would not recommend
running -q>0 for k>100 as it would become quite slow in
This looks like a hadoop- or spark-specific thing (the snappy codec is used
by spark by default). There should be a way to switch this to a more
palatable library, but you will need to investigate it a little bit since i
don't think anybody here knows mac specifics.
Better yet is to figure out how to instal
spark 1.2 is not supported (yet). the current head runs on 1.1.0 (but i
guess you can take pull request #71 and compile it for 1.1.1 too, and
perhaps even 1.2)
On Tue, Jan 27, 2015 at 12:04 PM, Kevin Zhang <
zhangyongji...@yahoo.com.invalid> wrote:
> Hi,
>
> I'm new to Spark, Mahout. Just tried to run
Oh, specifically to item similarity. Not sure.
On Jan 22, 2015 8:42 AM, "Dmitriy Lyubimov" wrote:
> There are some computations that are done in core in front end. This is
> always method specific. Outside the method itself, there are no additional
> requirements on top of
There are some computations that are done in-core in the front end. This is
always method specific. Outside the method itself, there are no additional
requirements on top of spark requirements. However, since many ml methods
tend to be more iterative than your regular etl stuff, expect also higher
deman
strange. legacy still depends on m-math and should include it in the job jar.
or did it get that much out of hand after the MR deprecation?
On Fri, Jan 9, 2015 at 8:51 AM, mw wrote:
> I found a solution!
> I had to upload the missing jars onto yarn hdfs and add the following to
> the hadoop Configurat
+1. I think contributions like this would count.
On Thu, Dec 4, 2014 at 3:14 PM, Brian Dolan wrote:
> Though I don't have an immediate use case, I'd +1 the idea!
>
> On Dec 4, 2014, at 3:11 PM, Andrew Musselman
> wrote:
>
> > Any interest in a topological data analysis package in Mahout?
> >
>
Correction: MR.SCAN is Univ. of Wisconsin's paper. Google Beijing was
another paper on the subject, but i found mr.scan to have a bit more elegant
simplicity to it.
On Mon, Dec 1, 2014 at 12:41 PM, Dmitriy Lyubimov wrote:
> if memory serves me, DeLiClu (density-link) is current best densi
write MapReduce code for DBSCAN and OPTICS for
> GSoC '15.
>
> I would like to take your input as to how much of significance would this
> be of to the community in general?
>
> Thanks,
>
> Chirag Nagpal
> University of Pune, India
> www.chiragnagpal.com
> __