The list is alive! (a play on the old IMAX film *The Dream is Alive*). But does
it breathe? Not sure - I have not done Solr in over a decade.
On Fri, 23 Feb 2024 at 10:05, Beale, Jim (US-KOP)
wrote:
> I have a SolrCloud installation of three servers on three r5.xlarge EC2
> with a shared disk drive u
Out of curiosity: are there functional limitations in Spark Standalone
that are of concern? YARN is more configurable for running non-Spark
workloads and for running multiple Spark jobs in parallel. But for a single
Spark job it seems Standalone launches more quickly and does not miss any
features.
Please do not send advertisements on this channel.
On Thu, 10 Nov 2022 at 13:40, sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:
> Hi All,
>
> Is anyone looking for a Spark Scala contract role inside the USA? A
> company called Maxonic has an open Spark Scala contract position (100%
Automated translations are rough at best.
On Tue, 10 Aug 2021 at 06:16, Dave wrote:
> Can't Chrome translate the page or is there too much JavaScript?
>
> > On Aug 10, 2021, at 8:44 AM, Eric Pugh
> wrote:
> >
> > Nothing built in today. The Solr GUI is written in AngularJS, and
> while it refe
While the core of Spark is and has been quite solid and a go-to
infrastructure, the *streaming* part of the story was still quite weak at
least through the middle of last year. I went into depth on both Structured
Streaming and the older DStream API. Structured Streaming in particular was
difficult to use: both in terms o
I agree with Wim's assessment of data engineering / ETL vs data science.
I wrote pipelines/frameworks for large companies and Scala was a much
better choice. But for ad-hoc work interfacing directly with data science
experiments PySpark presents less friction.
On Sat, 10 Oct 2020 at 13:03, Mich Ta
Why would it be this way instead of the other way around?
On Mon, 27 Jul 2020 at 12:27, David wrote:
> Hello Hive Users.
>
> I am interested in gathering some feedback on the adoption of
> Hive-on-Spark.
>
> Does anyone care to volunteer their usage information and would you be
> open to removin
{ println(it) }
}
}
So that shows some of the niceness of Kotlin: intuitive type conversion
via `to` and `dsOf(list)` - and also the inlining of the side
effects. Overall concise and pleasant to read.
On Tue, 14 Jul 2020 at 12:18, Stephen Boesch wrote:
> I started with scala/spark in
I started with scala/spark in 2012 and Scala has been my go-to language for
six years. But I heartily applaud this direction. Kotlin is more like a
simplified Scala - with the benefits that brings - than a simplified Java.
I particularly like the simplified / streamlined collections classes.
Reall
Spark in local mode (which is different from standalone) is a solution for
many use cases. I use it in conjunction with (and sometimes instead of)
pandas/pandasql due to its much wider ETL-related capabilities. On the JVM
side it is an even more obvious choice - given there is no equivalent to
pand
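For what it's worth, a minimal sketch of that local-mode setup (Spark 2.x;
the app name and input path are illustrative, not from the thread):

import org.apache.spark.sql.SparkSession

// Local mode: driver and executors share one JVM - no cluster manager involved.
val spark = SparkSession.builder()
  .appName("adhoc-etl")
  .master("local[*]") // use all local cores
  .getOrCreate()

// Typical ad-hoc ETL: read, reshape, inspect - much as one would with pandasql.
val df = spark.read.option("header", "true").csv("/tmp/input.csv")
df.groupBy("someColumn").count().show()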
+1 Thx for seeing this through
On Wed, 1 Jul 2020 at 20:03, Imran Rashid
wrote:
> +1
>
> I think this is going to be a really important feature for Spark and I'm
> glad to see Holden focusing on it.
>
> On Wed, Jul 1, 2020 at 8:38 PM Mridul Muralidharan
> wrote:
>
>> +1
>>
>> Thanks,
>> Mridul
>> draft for comment by the end of Spark summit. I'll be using the same design
>> document for the design component, so if anyone has input on the design
>> document feel free to start leaving comments there now.
>>
>> On Sat, Jun 20, 2020 at 4:23 PM Stephen Boesch wro
Hi - given there is a design doc (contrary to that comment), is this going
to move forward?
On Thu, 18 Jun 2020 at 18:05, Hyukjin Kwon wrote:
> Looks like it had to be an SPIP with a proper design doc to discuss.
>
> On Sun, 9 Feb 2020 at 01:23, Erik Erlandson wrote:
>
>> I'd be willing to pull this in, u
AFAIK it has been there since Spark 2.0 in 2016. Not certain about Spark
1.5/1.6.
On Thu, 18 Jun 2020 at 23:56, Anwar AliKhan
wrote:
> I first ran the command
> df.show()
>
> For sanity check of my dataFrame.
>
> I wasn't impressed with the display.
>
> I then ran
> df.toPandas() in Jupyter N
Second paragraph of the PR lists the design doc.
> There is a design document at
https://docs.google.com/document/d/1xVO1b6KAwdUhjEJBolVPl9C6sLj7oOveErwDSYdT-pE/edit?usp=sharing
On Thu, 18 Jun 2020 at 18:05, Hyukjin Kwon wrote:
> Looks like it had to be an SPIP with a proper design doc to discuss.
I am reading between the lines that Ambari is no longer a strategic
platform. Would someone please provide a link/reference to a Cloudera press
release or blog describing this and maybe related decisions/roadmaps? Thanks!
On Mon, 11 May 2020 at 10:05, Aaron Bossert <
aa...@punchcyber.com
The predicates are typically SQL expressions.
On Sat, 2 May 2020 at 06:13, Stephen Boesch wrote:
> Hi Mich!
>I think you can combine the good/rejected into one method that
> internally:
>
>- Create good/rejected df's given an input df and input
>rules/predicates to appl
Hi Mich!
I think you can combine the good/rejected logic into one method that
internally:
- Creates good/rejected DataFrames given an input df and input
rules/predicates to apply to the df.
- Creates a third df containing the good rows and the rejected rows with
the bad columns nulled out
- Ap
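A minimal sketch of that combined method, assuming the rules arrive as SQL
predicate strings (all names here are illustrative):

import org.apache.spark.sql.{DataFrame, functions => F}

// A row is "good" only when it satisfies every predicate; everything else
// is rejected.
def splitByRules(df: DataFrame, predicates: Seq[String]): (DataFrame, DataFrame) = {
  val allRules = predicates.map(F.expr).reduce(_ && _)
  val good     = df.filter(allRules)
  val rejected = df.filter(!allRules)
  (good, rejected)
}

// e.g. splitByRules(input, Seq("amount > 0", "id IS NOT NULL"))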
The warning signs were there from the first email sent by that person. I
wonder if there is any way to deal with this more proactively.
On Thu, 16 Apr 2020 at 10:54, Mich Talebzadeh <
mich.talebza...@gmail.com>:
> good for you. right move
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> h
I have been using IntelliJ IDEA for both scala/spark and pyspark projects
since 2013. It required a fair amount of fiddling that first year but has
been stable since early 2015. For PySpark-only projects PyCharm naturally
also works very well.
On Tue, 7 Apr 2020 at 09:10, yeikel valdes wrote:
>
> Ze
're saying but I'm not sure how to go about it.
> Any tips or good tutorials on it to point me in the right direction.
> Thanks for the response.
>
> On Fri, Oct 4, 2019 at 5:58 PM Stephen Boesch wrote:
>
>> You'll need to start a listener/server on the scala end and
You'll need to start a listener/server on the Scala end and communicate
via a websocket connection from Angular.
On Fri, 4 Oct 2019 at 13:00, Joshua Ochsankehl <
joshua.ochsank...@gmail.com>:
> Is it possible to pass a value to a spark/scala function from
> an angular submit button?
exact same code. why running them two different ways vary so much in the
> execution time.
>
>
>
>
> *Regards,Dhrubajyoti Hati.Mob No: 9886428028/9652029028*
>
>
> On Wed, Sep 11, 2019 at 8:42 AM Stephen Boesch wrote:
>
>> Sounds like you have done your homewo
Sounds like you have done your homework to properly compare. I'm
guessing the answer to the following is yes, but in any case: are they
both running against the same Spark cluster with the same configuration
parameters, especially executor memory and number of workers?
On Tue, 10 Sept 2019
There are several high bars to getting a new algorithm adopted.
* It needs to be deemed by the MLlib committers/shepherds as widely useful
to the community. Algorithms offered by larger companies after having
demonstrated usefulness at scale for use cases likely to be encountered
by many othe
Consider the following *intended* SQL:
select row_number()
  over (partition by Origin order by OnTimeDepPct desc) OnTimeDepRank, *
from flights
This will *not* work in *structured streaming*: the culprit is:
partition by Origin
The requirement is to use a timestamp-typed field such as
par
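A hedged sketch of the shape structured streaming *does* accept - an
aggregation keyed on an event-time window rather than a plain column (the
stream and column names here are illustrative):

import org.apache.spark.sql.functions.{col, window}

// Structured streaming aggregations key on a timestamp-typed column via
// window(), optionally bounded by a watermark; an arbitrary row_number()
// over a non-time partition is not supported.
val byWindow = flightsStream
  .withWatermark("depTime", "1 hour")
  .groupBy(window(col("depTime"), "15 minutes"), col("Origin"))
  .count()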
Please refrain from using this list as a job board. thank you.
On Mon, 15 Apr 2019 at 07:00, Manoj Murumkar <
manoj.murum...@gmail.com>:
> Damian,
>
> Let me know when we can talk. I have done extensive work on Kafka and run
> a boutique consulting firm that specializes in this work. Let
There are several suggestions on this Stack Overflow question:
https://stackoverflow.com/questions/38984775/spark-errorexpected-zero-arguments-for-construction-of-classdict-for-numpy-cor
You need to convert the final value to a Python list. You implement the
function as follows:
def uniq_array(col_array):
    x = np.unique(col_array).tolist()  # convert the numpy result to a plain Python list
    return x
You might have better luck downloading the 2.4.X branch
On Tue, 12 Mar 2019 at 16:39, swastik mittal wrote:
> Then is the MLlib of Spark compatible with Scala 2.12? Or can I change the
> spark version from spark3.0 to 2.3 or 2.4 in local spark/master?
I think Scala 2.11 support was removed with the spark3.0/master branch.
On Tue, 12 Mar 2019 at 16:26, swastik mittal wrote:
> I am trying to build my spark using build/sbt package, after changing the
> scala versions to 2.11 in pom.xml because my application's jar files use
> Scala 2.11. But build
Erik - is there a current location for approved/recommended third-party
additions? The spark-packages site has been stale for years, it seems.
On Fri, 19 Oct 2018 at 07:06, Erik Erlandson <
eerla...@redhat.com>:
> Hi Matt!
>
> There are a couple ways to do this. If you want to submit it for
So the LogisticRegression with regParam and elasticNetParam set to 0 is not
what you are looking for?
https://spark.apache.org/docs/2.3.0/ml-classification-regression.html#logistic-regression
.setRegParam(0.0)
.setElasticNetParam(0.0)
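Spelled out as a minimal sketch (assuming a DataFrame `training` with the
usual label/features columns; the variable names are illustrative):

import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setRegParam(0.0)        // regularization strength off
  .setElasticNetParam(0.0) // L1/L2 mix - moot once regParam is 0
val model = lr.fit(training)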
On Thu, 11 Oct 2018 at 15:46, pikufolgado wrote:
Assuming that the Spark 2.X kernel (e.g. Toree) were chosen for a given
Jupyter notebook and there is a Cell 3 that contains some Spark DataFrame
operations. Then:
- what is the relationship between the %%spark magic and the Toree kernel?
- how does the %%spark magic get applied to that
(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
2018-05-07 10:30 GMT-07:00 Stephen Boesch :
> I am intermittently running into guava dependency issues across multiple
> spark projects. I have tried maven shade / relocate but it do
I am intermittently running into guava dependency issues across multiple
spark projects. I have tried maven shade / relocate but it does not
resolve the issues.
The current project is extremely simple: *no* additional dependencies
beyond scala, spark, and scalatest - yet the issues remain (and yes
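For reference, a sketch of the kind of maven-shade-plugin relocation that
was attempted (the shaded prefix is illustrative, not the actual pom in
question):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- move guava under a private prefix so it cannot clash with Spark's -->
        <pattern>com.google.common</pattern>
        <shadedPattern>myshaded.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>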
[
https://issues.apache.org/jira/browse/SPARK-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462797#comment-16462797
]
Stephen Boesch commented on SPARK-10943:
Given the comment by Daniel Davis, this remains to be verified.
And maybe *all* collects do require sufficient memory - would you like to
check the source code to see if there were disk-backed collects actually
happening in some cases?
2018-04-28 9:48 GMT-07:00 Deepak Goel :
> There is something as *virtual memory*
>
> On Sat, 28
Do you have a machine with terabytes of RAM? AFAIK collect() requires the
entire result to fit in driver RAM - so that would be your limiting factor.
2018-04-28 8:41 GMT-07:00 klrmowse :
> i am currently trying to find a workaround for the Spark application i am
> working on so that it does not have to use .collect()
>
> but, fo
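For what it's worth, a hedged sketch of one common workaround:
toLocalIterator() streams one partition at a time to the driver, so only
the largest single partition must fit in driver memory (the RDD here is
illustrative):

val rdd = spark.sparkContext.parallelize(1 to 1000000)

// Unlike collect(), this never holds the whole dataset on the driver at once.
rdd.toLocalIterator.foreach { v =>
  // process each element as it arrives
}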
, make sure to check what
> you're currently listed as shepherding!) The links for searching can be
> useful too.
>
> On Thu, Dec 7, 2017 at 3:55 PM, Stephen Boesch wrote:
>
>> Thanks Joseph. We can wait for post 2.3.0.
>>
>> 2017-12-07 15:36 GMT-08:00 Joseph
While MLlib performed favorably vs Flink it *also* performed favorably vs
spark.ml - and by an *order of magnitude*. The following is one of the
tables - it is for Logistic Regression. At that time spark.ml did not yet
support SVM.
From: https://bdataanalytics.biomedcentral.com/articles/10.118
Hi Richard, this is not a jobs board: please only discuss Spark application
development issues.
2017-12-21 8:34 GMT-08:00 Richard L. Burton III :
> I'm trying to locate four independent contractors who have experience with
> Spark. I'm not sure where I can go to find experienced Spark consultants
A relevant observation: there was a JIRA closed/executed last year to
remove the option to disable the codegen flag (and the unsafe flag as well):
https://issues.apache.org/jira/browse/SPARK-11644
2017-12-10 13:16 GMT-08:00 Jacek Laskowski :
> Hi,
>
> I'm wondering why a physical operator like Gener
JIRA as well as the few mailing list threads about directions.
>
> For myself, I'm mainly focusing on fixing some issues with persistence for
> custom algorithms in PySpark (done), adding the image schema (done), and
> using ML Pipelines in Structured Streaming (WIP).
>
> Josep
I have been testing on the 20 NewsGroups dataset - which the Spark docs
themselves reference. I can confirm that perplexity increases and
likelihood decreases as topics increase - and am similarly confused by
these results.
2017-09-28 10:50 GMT-07:00 Cody Buntain :
> Hi, all!
>
> Is there an exa
e spark.ml were headed?
2017-11-29 6:39 GMT-08:00 Stephen Boesch :
> Any further information/ thoughts?
>
>
>
> 2017-11-22 15:07 GMT-08:00 Stephen Boesch :
>
>> The roadmaps for prior releases e.g. 1.6 2.0 2.1 2.2 were available:
>>
>> 2.2.0 https://issues.apache
Any further information/ thoughts?
2017-11-22 15:07 GMT-08:00 Stephen Boesch :
> The roadmaps for prior releases e.g. 1.6 2.0 2.1 2.2 were available:
>
> 2.2.0 https://issues.apache.org/jira/browse/SPARK-18813
>
> 2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
> ..
The roadmaps for prior releases e.g. 1.6 2.0 2.1 2.2 were available:
2.2.0 https://issues.apache.org/jira/browse/SPARK-18813
2.1.0 https://issues.apache.org/jira/browse/SPARK-15581
..
It seems those roadmaps were not available per se for 2.3.0 and later? Is
there a different mechanism for that
In BinaryLogisticRegressionSummary there are @Since("1.5.0") tags on a
number of comments identical to the following:
* @note This ignores instance weights (setting all to 1.0) from
`LogisticRegression.weightCol`.
* This will change in later Spark versions.
Are there any plans to address this? O
Hi Mich, the github link has a brief intro - including a link to the formal
docs http://logisland.readthedocs.io/en/latest/index.html . They have an
architectural overview, developer guide, tutorial, and pretty comprehensive
api docs.
2017-10-24 13:31 GMT-07:00 Mich Talebzadeh :
> thanks Thomas
A couple of less obvious facets of getting over the (significant!) hurdle
to have an algorithm accepted into mllib (/spark.ml):
- the review time can be *very* long - a few to many months is a
typical case even for relatively fast-tracked algorithms
- you will likely be asked to provide
@Vadim Would it be true to say the `.rdd` *may* be creating a new job -
depending on whether the DataFrame/DataSet had already been materialized
via an action or checkpoint? If the only prior operations on the
DataFrame had been transformations then the DataFrame would still not have
been calcu
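A small illustration of that point (the names are mine, not from the
thread):

// Transformations alone are lazy - nothing has been computed yet.
val df = spark.range(1000000L).selectExpr("id * 2 AS doubled")

df.cache()
df.count() // an action: materializes df into the cache

// Now .rdd can be served from the cached data rather than triggering a
// fresh computation of the whole plan.
val rdd = df.rdd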
maven repo
- The local maven repo is included by default - so there should be no need
to do anything special there
The same errors from the original post continue to occur.
2017-10-11 20:05 GMT-07:00 Stephen Boesch :
> A clarification here: the example is being run *from the Spark codebase*.
>
maven repo in SBT?
>
> -Paul
>
> Sent from my iPhone
>
> On Oct 11, 2017, at 5:48 PM, Stephen Boesch wrote:
>
> When attempting to run any example program w/ Intellij I am running into
> guava versioning issues:
>
> Exception in thread "main" java.lan
When attempting to run any example program w/ IntelliJ I am running into
guava versioning issues:
Exception in thread "main" java.lang.NoClassDefFoundError:
com/google/common/cache/CacheLoader
at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:73)
at org.apache.spark.SparkConf.
I printed out the "Welcome" page on my HP LaserJet printer and scanned it
in as .png. The quality is quite good. So I had been anticipating
maybe 85%+ accuracy from the Tesseract OCR. I did not even bother to tally
carefully - but by eyeballing it seems about 50%. I had used all
defaul
Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> > On Aug 10, 2017, at 1:46 PM, Stephen Boesch wrote:
> >
> >
> > While the DataFrame/DataSets are useful in many circumstances they are
> cumbersome for many types of complex sql queries.
> >
> >
While the DataFrame/DataSets are useful in many circumstances they are
cumbersome for many types of complex SQL queries.
Is there an up-to-date *SQL* reference - i.e. not DataFrame DSL operations
- for version 2.2?
An example of what is not clear: what constructs are supported within
select
Spark SQL did not support explicit partitioners even before Tungsten: and
often enough this did hurt performance. Even now Tungsten will not do the
best job every time: so the question from the OP is still germane.
2017-06-25 19:18 GMT-07:00 Ryan :
> Why would you like to do so? I think there's
You would need to use *native* Cassandra APIs in each Executor -
not org.apache.spark.sql.cassandra.CassandraSQLContext
- including creating a separate Cassandra connection on each Executor.
2017-05-28 15:47 GMT-07:00 Abdulfattah Safa :
> So I can't run SQL queries in Executors ?
>
> On Sun, M
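A hedged sketch of that pattern with the native Java driver (3.x-era API;
the host, keyspace, and table names are illustrative):

import com.datastax.driver.core.Cluster

rdd.foreachPartition { rows =>
  // Build the connection *inside* the partition so it lives on the executor
  // and is never serialized from the driver.
  val cluster = Cluster.builder().addContactPoint("cassandra-host").build()
  val session = cluster.connect("my_keyspace")
  try {
    rows.foreach(r => session.execute(s"INSERT INTO my_table (id) VALUES ('$r')"))
  } finally {
    session.close()
    cluster.close()
  }
}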
Jupyter with Toree works well for my team. Jupyter is considerably more
refined than Zeppelin as far as notebook features and usability: shortcuts,
editing, etc. The caveat is that it is better to run a separate server
instance for python/pyspark vs scala/spark.
2017-05-17 19:27 GMT-07:00 Richard Moorhead :
>
Anyone have this working - either in 1.X or 2.X?
thanks
For now I have added to the log4j.properties:
log4j.logger.org.apache.parquet=ERROR
2017-02-18 11:50 GMT-08:00 Stephen Boesch :
> The following JIRA mentions that a fix made to read parquet 1.6.2 into 2.X
> STILL leaves an "avalanche" of warnings:
>
>
> https://issu
The following JIRA mentions that a fix made to read parquet 1.6.2 into
2.X STILL leaves an "avalanche" of warnings:
https://issues.apache.org/jira/browse/SPARK-17993
Here is the text inside one of the last comments before it was merged:
I have built the code from the PR and it indeed succeed
Please take job inquiries/offers off of the main channel. thanks.
2017-02-04 12:19 GMT-08:00 Vaibhav Khanduja :
> Thanks Brock.
>
> Since I am based in Santa Clara, CA, I was wondering if anything is
> located locally. The skills you need though definitely match with me – Spark, HPC
> etc.
>
> From: Br
re: spark-packages.org and "Would these really be better in the core
project?" That was not at all the intent of my input: instead it was to ask
"how and where do we structure/place deployment-quality code that is yet
*not* part of the distribution?" The spark-packages site has no curation
whatsoever: no
Along the lines of #1: the spark packages seemed to have had a good start
about two years ago: but now no more than a handful are in general
use - e.g. the Databricks CSV package.
When the available packages are browsed the majority are incomplete, empty,
unmaintained, or unclear.
Any ideas on how to
bump. Without keyboard shortcuts a notebook is nearly unusable: certainly
they must exist - but where is a document for them?
thanks.
2017-01-17 9:58 GMT-08:00 Stephen Boesch :
> There was an old jira for keyboard shortcuts. But there did not appear to
> be an associated document
>
There was an old JIRA for keyboard shortcuts. But there did not appear to
be an associated document:
https://issues.apache.org/jira/browse/ZEPPELIN-391
Is there a comprehensive cheat-sheet for the shortcuts? Especially to
compare to the excellent Jupyter keyboard shortcuts; e.g. dd to delete a
ce
Would it be possible to share that communication? I am interested in this
thread.
2016-12-30 11:02 GMT-08:00 Ji Yan :
> Thanks Michael, Tim and I have touched base and thankfully the issue has
> already been resolved
>
> On Fri, Dec 30, 2016 at 9:20 AM, Michael Gummelt
> wrote:
>
>> I've cc'd T
This problem appears to be a regression on HEAD/master: when running
against 2.0.2 the pyspark job completes successfully including running
predictions.
2016-11-23 19:36 GMT-08:00 Stephen Boesch :
>
> For a pyspark job with 54 executors all of the task outputs have a single
> line in
For a pyspark job with 54 executors all of the task outputs have a single
line in both the stderr and stdout similar to:
Error: invalid log directory /shared/sparkmaven/work/app-20161119222540-/0/
Note: the directory /shared/sparkmaven/work exists and is owned by the same
user running the jo
path
> offheap in future 1.x release? Let me create one JIRA about this and let's
> discuss in the JIRA system. And to be very clear, it's a big YES to share
> our patches with all rather than only numbers, just which way is better
> (smile).
>
> And answers for @Stephe
that hedge read is very useful for reducing
> latency). So I think the peak throughput is true.
>
> There are more than 600 million people in China that use the internet. So if
> they decide to do something to your system at the same time, it looks like
> a DDoS to your system...
>
> T
Repeating my earlier question: 20 *million* queries per second?? Just
checked and *Google* does 40*K* queries per second. Now maybe the "queries"
are a decomposition of far fewer end-user queries that cause a fanout of
backend queries. *But still...*
So maybe please check your numbers again.
2016-1
While "apparently" saturating the N available workers using your proposed N
partitions - the "actual" distribution of workers to tasks is controlled by
the scheduler. If my past experience were of service - you can *not *trust
the default Fair Scheduler to ensure the round-robin scheduling of the
Am I correct in deducing there were on the order of 1.5-2.0 *trillion*
queries in a 24-hour span?
2016-11-18 23:35 GMT-08:00 Anoop John :
> Because of some compatibility issues, we decide that this will be done
> in 2.0 only.. Ya as Andy said, it would be great to share the 1.x
> backported patc
What is the state of the spark-packages project(s)? When running a query
for machine learning algorithms the results are not encouraging.
https://spark-packages.org/?q=tags%3A%22Machine%20Learning%22
There are 62 packages. Only a few have actual releases - and even fewer
with dates in the past
It is private[spark]. You will need to put your code in that same package
or create an accessor to it living within that package.
2016-11-03 16:04 GMT-07:00 Yanwei Zhang :
> I would like to use some matrix operations in the BLAS object defined in
> ml.linalg. But for some reason, spark s
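A minimal sketch of the accessor approach (the object name is mine, and the
gemv signature is assumed from the Spark 2.x source; compile this in your
own project under the matching package so it can see the private[spark]
BLAS object):

package org.apache.spark.ml.linalg

// Re-export just the operations needed, e.g. y := alpha * A * x + beta * y
object BLASAccessor {
  def gemv(alpha: Double, A: Matrix, x: Vector,
           beta: Double, y: DenseVector): Unit =
    BLAS.gemv(alpha, A, x, beta, y)
}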
You would likely want to create inline views that perform the filtering
*before* performing the cubes/rollups; in this way the cubes/rollups only
operate on the pruned rows/columns.
2016-11-03 11:29 GMT-07:00 Andrés Ivaldi :
> Hello, I need to perform some aggregations and a kind of Cube/RollUp
>
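A sketch of the same idea in DataFrame form (table and column names are
illustrative): filter and project in a first step so the cube only ever
sees the pruned rows and columns.

import org.apache.spark.sql.functions.col

// Inline-view equivalent: prune first...
val pruned = sales
  .filter(col("sale_date") >= "2016-01-01")
  .select("dim1", "dim2", "amount")

// ...then cube over the much smaller intermediate result.
val cubed = pruned.cube("dim1", "dim2").sum("amount")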
Yes: will you have cycles to do it?
2016-09-12 9:09 GMT-07:00 Nick Pentreath :
> Never actually got around to doing this - do folks still think it
> worthwhile?
>
> On Thu, 21 Apr 2016 at 00:10 Joseph Bradley wrote:
>
>> Sounds good to me. I'd request we be strict during this process about
>> r
[
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431738#comment-15431738
]
Stephen Boesch commented on YARN-3249:
--
I have the same question as Xiaoyong Zhu:
[
https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420467#comment-15420467
]
Stephen Boesch commented on SPARK-2243:
---
Given this were not going to be f
@Jared / Yu Wei: Mesos is essentially a Spanish word: so MAY-sos would
travel well.
2016-08-09 11:35 GMT-07:00 Ken Sipe :
> Apparently it depends on if you are British or not :)
> http://dictionary.cambridge.org/us/pronunciation/english/the-mesosphere
>
> apparently the absence of "phere" chang
I also did not understand why the Logging class was made private in Spark
2.0. In a couple of projects including CaffeOnSpark the Logging class was
simply copied to the new project to allow for backwards compatibility.
2016-06-28 18:10 GMT-07:00 Michael Armbrust :
> I'd suggest using the slf4j A
My team has a custom optimization routine that we would have wanted to plug
in as a replacement for the default LBFGS / OWLQN for use by some of the
ml/mllib algorithms.
However it seems the choice of optimizer is hard-coded in every algorithm
except LDA: and even in that one it is only a choice
out.write(Opcodes.REDUCE)
^
2016-06-22 23:49 GMT-07:00 Stephen Boesch :
> Thanks Jeff - I remember that now from long time ago. After making that
> change the next errors are:
>
> Error:scalac: missing or invalid dependency detected while loading class
> file 'RDD
2016-06-22 23:39 GMT-07:00 Jeff Zhang :
> You need to add
> spark/external/flume-sink/target/scala-2.11/src_managed/main/compiled_avro
> under build path, this is the only thing you need to do manually if I
> remember correctly.
>
>
>
> On Thu, Jun 23, 2016 at 2:30 PM, St
> It works well for me. You can try reimporting it into IntelliJ.
>
> On Thu, Jun 23, 2016 at 10:25 AM, Stephen Boesch
> wrote:
>
>>
>> Building inside intellij is an ever moving target. Anyone have the
>> magical procedures to get it going for 2.X?
>>
>>
Building inside IntelliJ is an ever-moving target. Anyone have the magical
procedures to get it going for 2.X?
There are numerous library references that - although included in the
pom.xml build - are for some reason not found when processed within
IntelliJ.
Having looked closely at Jupyter, Zeppelin, and Spark-Notebook: only the
latter seems to be close to having support for Spark 2.X.
While I am interested in using Spark Notebook as soon as that support is
available, are there alternatives that work *now*? For example some
unmerged - yet working
There are around twenty data generators in mllib - none of which are
presently migrated to ml.
Here is an example
/**
* :: DeveloperApi ::
* Generate sample data used for SVM. This class generates uniform random values
* for the features and adds Gaussian noise with weight 0.1 to generate label
What are you expecting us to do? Yash provided a reasonable approach -
based on the info you had provided in prior emails. Otherwise you can
convert it from Python to Spark - or find someone else who feels
comfortable doing it. That kind of inquiry would likely be appropriate on a
job board.
2
@Jeff Klukas What is the concern about Scala 2.11 vs 2.12? 2.11 runs on
both Java 7 and Java 8.
2016-06-16 14:12 GMT-07:00 Jeff Klukas :
> Would the move to Java 8 be for all modules? I'd have some concern about
> removing Java 7 compatibility for kafka-clients and for kafka streams
> (though less
How many workers (/CPU cores) are assigned to this job?
2016-06-09 13:01 GMT-07:00 SRK :
> Hi,
>
> How to insert data into 2000 partitions(directories) of ORC/parquet at a
> time using Spark SQL? It seems to be not performant when I try to insert
> 2000 directories of Parquet/ORC using Spark SQL
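For reference, a hedged sketch of the usual pattern for that kind of write
(the column and path names are illustrative): repartitioning by the
partition column first means each task writes to few directories instead
of all 2000.

import org.apache.spark.sql.functions.col

df.repartition(col("partCol"))
  .write
  .partitionBy("partCol")
  .parquet("/path/to/output")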
OOC (out of curiosity): are the tables partitioned on a.pk and b.fk? Hive
might be using co-partitioning in that case: it is one of Hive's strengths.
2016-06-09 7:28 GMT-07:00 Gourav Sengupta :
> Hi Mich,
>
> does not Hive use map-reduce? I thought it to be so. And since I am
> running the queries in EMR 4.6 therefo