Actually, disregard! I forgot that
spark.dynamicAllocation.cachedExecutorIdleTimeout defaults to infinity,
so lowering it should solve the problem :)
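For reference, a minimal sketch of the relevant setting in spark-defaults.conf (the 600s value is just an illustrative choice, not one from this thread):

```
# spark-defaults.conf
# Remove executors holding cached blocks once they have been idle this long.
# The default is "infinity", i.e. cached executors are never removed.
spark.dynamicAllocation.cachedExecutorIdleTimeout   600s
```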
Mark.
--
View this message in context:
Calling unpersist on an RDD in a Spark Streaming application does not
actually unpersist the blocks from memory and/or disk. After the RDD has
been processed in a foreachRDD call, I attempt to unpersist the RDD, since
it is no longer useful to keep in memory/disk. This mainly causes a problem
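A minimal sketch of the pattern being described (hedged: `dstream` and the processing body are hypothetical stand-ins, and this assumes a pyspark StreamingContext is already running on a cluster):

```python
# Hypothetical sketch -- `dstream` is assumed to be an existing input DStream.
def process(rdd):
    rdd.persist()           # cache while the batch is being worked on
    print(rdd.count())      # stand-in for the real per-batch processing
    rdd.unpersist()         # request removal of the cached blocks afterwards

dstream.foreachRDD(process)
```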
I reported this in the 1.6 preview thread, but wouldn't mind if someone could
confirm that Ctrl-C is no longer keyboard-interrupting / clearing the current
line of input in the pyspark shell. I saw the change that would kill the
currently running job when using Ctrl-C, but now the only way to
Nice! Built and testing on CentOS 7 on a Hadoop 2.7.1 cluster.
One thing I've noticed is that KeyboardInterrupts are now ignored? Is that
intended? I started typing a line out and then changed my mind and wanted
to issue the good old Ctrl-C to interrupt, but that didn't work.
Otherwise haven't
Regarding the 'spark.executor.cores' config option in a Standalone Spark
environment, I'm curious whether there's a way to enforce the
following logic:
- Max cores per executor = 4
- Max executors PER application PER worker = 1
In order to force better balance across all workers, I
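One rough approximation of that layout under standalone scheduling (a sketch only: standalone has no hard per-worker executor cap, but with the default spreadOut allocation, capping the application's total cores tends to yield one executor per worker; the 4-worker cluster is an assumption):

```
# spark-defaults.conf (illustrative values, assuming a 4-worker cluster)
spark.executor.cores   4     # max cores per executor
spark.cores.max        16    # 4 workers x 4 cores => roughly one executor per worker
```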
My apologies for mixing up what was being referred to in that case! :)
Mark.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/If-you-use-Spark-1-5-and-disabled-Tungsten-mode-tp14604p14629.html
Sent from the Apache Spark Developers List mailing list
Are you referring to spark.shuffle.manager=tungsten-sort? If so, we saw the
default value as still being the regular sort, and since it was only
first introduced in 1.5, we were actually waiting a bit to see if anyone
ENABLED it as opposed to DISABLING it, since it's disabled by default! :)
I
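For context, the opt-in being discussed looks like this (a Spark 1.5-era setting; "sort" remained the default):

```
# spark-defaults.conf -- opt in to the new shuffle manager in Spark 1.5
spark.shuffle.manager   tungsten-sort
```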
Built and tested on CentOS 7, Hadoop 2.7.1 (built for the 2.6 profile),
Standalone, without any problems. Re-tested dynamic allocation specifically.
"Lost executor" messages are still an annoyance, since they're expected to
occur with dynamic allocation and shouldn't WARN/ERROR as they do now,
Just a heads up that this RC1 release is still appearing as 1.5.0-SNAPSHOT.
(Not just me, right..?)
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-0-RC1-tp13780p13792.html
Sent from the Apache Spark Developers List mailing
Turns out it was a mix of user error as well as a bug in the sbt/sbt build
that has since been fixed in the current 1.5 branch (I built from this
commit: b4f4e91c395cb69ced61d9ff1492d1b814f96828).
I've been testing out dynamic allocation specifically and it's looking
pretty solid! Haven't come
Has anyone had success using this preview? We were able to build the preview
and to start the spark-master; however, we were unable to connect any spark
workers to it.
We kept receiving an AkkaRpcEnv address-in-use error while attempting to
connect the spark-worker to the master. Also confirmed that the
We tested this out on our dev cluster (Hadoop 2.7.1 + Spark 1.4.0), and it
looks great! I might also be interested in contributing to it when I get a
chance! Keep up the awesome work! :)
Mark.
Hello,
I was interested in creating a StreamingContext textFileStream-based job,
which runs for long durations and can also recover from prolonged driver
failure... It seems like StreamingContext checkpointing is mainly used for
the case when the driver dies during the processing of an RDD, and
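A hedged pyspark sketch of the checkpoint-recovery pattern in question (the paths, batch interval, and processing body are all hypothetical, and this needs a live cluster to actually run):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

CHECKPOINT_DIR = "hdfs:///tmp/stream-ckpt"   # hypothetical path

def create_context():
    sc = SparkContext(appName="TextFileStreamJob")
    ssc = StreamingContext(sc, 60)                       # 60s batches (illustrative)
    ssc.checkpoint(CHECKPOINT_DIR)
    lines = ssc.textFileStream("hdfs:///tmp/incoming")   # hypothetical input dir
    lines.count().pprint()                               # stand-in processing
    return ssc

# On a restart after driver failure, this rebuilds the context (and the
# DStream lineage) from the checkpoint instead of calling create_context().
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()
```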
Hi Jerry,
Thanks for the quick response! Looks like I'll need to come up with an
alternative solution in the meantime, since I'd like to avoid the other
input streams + WAL approach. :)
Thanks again,
Mark.
I've noticed a couple of oddities with the pyspark daemons which are causing
us a bit of memory trouble within some of our heavy Spark jobs, especially
when they run at the same time...
It seems that there is typically a 1-to-1 ratio of pyspark.daemon processes
to cores per executor during aggregations. By
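Two settings that bound the memory footprint of those Python workers (a sketch; 512m is simply the Spark 1.x documented default for that setting, not a recommendation from this thread):

```
# spark-defaults.conf
spark.python.worker.memory   512m   # per-worker aggregation memory before spilling to disk
spark.python.worker.reuse    true   # reuse daemons across tasks instead of forking per task
```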