As part of PR https://github.com/apache/spark/pull/11723, I have added a
killAllTasks function that can be used to kill (or rather, interrupt)
individual tasks before an executor exits. If this PR is accepted, we can
add a call to this function before the executor exits to do task-level
cleanups. The
On Wed, Apr 6, 2016 at 4:39 PM, Sung Hwan Chung
wrote:
> My option so far seems to be using JVM's shutdown hook, but I was
> wondering if Spark itself had an API for tasks.
>
Spark would be using that under the hood anyway, so you might as well just
use the JVM shutdown hook directly.
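For illustration, here is a minimal sketch of registering the JVM hook directly from inside a task. The mapPartitions pipeline and the cleanup body are made up for the example; the only real API involved is the JVM/Scala shutdown-hook call.

import org.apache.spark.{SparkConf, SparkContext}

object ShutdownHookSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shutdown-hook-sketch").setMaster("local[2]"))

    sc.parallelize(1 to 100, 4).mapPartitions { iter =>
      // scala.sys.addShutdownHook wraps Runtime.getRuntime.addShutdownHook.
      // In a real job you would guard against registering one hook per partition.
      sys.addShutdownHook {
        // hypothetical cleanup: flush buffers, close connections, delete temp files
        println("executor JVM exiting, running cleanup")
      }
      iter.map(_ * 2)
    }.count()

    sc.stop()
  }
}

Keep in mind the hook only fires when the executor JVM exits, not when an individual task is cancelled, so it only partially answers the original question.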
What I meant is 'application', i.e., when we manually terminate an
application that was submitted via spark-submit.
When we manually kill an application, it seems that individual tasks do not
receive the InterruptedException.
That InterruptedException seems to work iff we cancel the job through
sc.can
+1 for renaming the jar file.
Sincerely,
DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Tue, Apr 5, 2016 at 8:02 PM, Chris Fregly wrote:
> perhaps renaming to Spark ML would actually clear up code and documentation
> con
Sure, I'll take a look. Planning to do full verification in a bit.
On Wed, Apr 6, 2016 at 12:54 PM Ted Yu wrote:
> Josh:
> Can you check spark-1.6.1-bin-hadoop2.4.tgz ?
>
> $ tar zxf spark-1.6.1-bin-hadoop2.4.tgz
>
> gzip: stdin: not in gzip format
> tar: Child returned status 1
> tar: Error is
Josh:
Can you check spark-1.6.1-bin-hadoop2.4.tgz ?
$ tar zxf spark-1.6.1-bin-hadoop2.4.tgz
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
$ ls -l !$
ls -l spark-1.6.1-bin-hadoop2.4.tgz
-rw-r--r--. 1 hbase hadoop 323614720 Apr 5 19:25
spa
Why would the Executors shut down when the Job is terminated? Executors are
bound to Applications, not Jobs. Furthermore,
unless spark.job.interruptOnCancel is set to true, canceling the Job at the
Application and DAGScheduler level won't actually interrupt the Tasks
running on the Executors. If
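For reference, a minimal sketch of that mechanism; the job group id, the workload, and the timings are invented for the example, but setJobGroup(..., interruptOnCancel = true) and cancelJobGroup are the relevant SparkContext calls.

import org.apache.spark.{SparkConf, SparkContext}

object InterruptOnCancelSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("interrupt-on-cancel-sketch").setMaster("local[4]"))

    val worker = new Thread(new Runnable {
      def run(): Unit = {
        // interruptOnCancel = true asks Spark to interrupt the task threads
        // when this group is cancelled (the spark.job.interruptOnCancel property).
        sc.setJobGroup("long-job", "cancellable work", interruptOnCancel = true)
        try {
          sc.parallelize(1 to 1000, 8).map { i =>
            Thread.sleep(100) // an interrupted task surfaces this as InterruptedException
            i
          }.count()
        } catch {
          case e: Exception => println(s"job ended early: ${e.getMessage}")
        }
      }
    })
    worker.start()

    Thread.sleep(2000)
    sc.cancelJobGroup("long-job") // running tasks in the group are now interrupted
    worker.join()
    sc.stop()
  }
}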
Thank you Josh! I confirmed that the Spark 1.6.1 / Hadoop 2.6 package on S3
is now working, and the SHA512 checks out.
On Wed, Apr 6, 2016 at 3:19 PM Josh Rosen wrote:
> I downloaded the Spark 1.6.1 artifacts from the Apache mirror network and
> re-uploaded them to the spark-related-packages S3
Hi,
I'm looking for ways to add shutdown hooks to executors: i.e., when a Job
is forcefully terminated before it finishes.
The scenario goes like this: executors are running a long-running job
within a 'map' function. The user decides to terminate the job, then the
mappers should perform some
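One way the cleanup part could look, sketched under the assumption that the tasks actually receive an interrupt (see the interruptOnCancel discussion elsewhere in this thread); the resource and the cleanup body are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object TaskCleanupSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("task-cleanup-sketch").setMaster("local[4]"))

    sc.parallelize(1 to 100, 4).mapPartitions { iter =>
      // Placeholder for whatever per-partition resource the mappers hold open.
      val resource = new java.io.ByteArrayOutputStream()
      try {
        // Materialise the partition inside the try block so the work (and any
        // InterruptedException from a cancelled task) happens before the finally.
        iter.map { i =>
          Thread.sleep(10) // stand-in for the long-running work
          resource.write(i)
          i * 2
        }.toList.iterator
      } finally {
        // Runs whether the partition completes, fails, or is interrupted.
        resource.close()
        println("partition cleanup done")
      }
    }.count()

    sc.stop()
  }
}

If I remember correctly, TaskContext also exposes addTaskCompletionListener, which may be a cleaner per-task hook than try/finally, but the pattern above makes the fewest assumptions.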
I downloaded the Spark 1.6.1 artifacts from the Apache mirror network and
re-uploaded them to the spark-related-packages S3 bucket, so hopefully
these packages should be fixed now.
On Mon, Apr 4, 2016 at 3:37 PM Nicholas Chammas
wrote:
> Thanks, that was the command. :thumbsup:
>
> On Mon, Apr 4
I agree with your general logic and understanding of semver. That is why
if we are going to violate the strictures of semver, I'd only be happy
doing so if support for Java 7 and/or Scala 2.10 were clearly understood to
be deprecated already in the 2.0.0 release -- i.e. from the outset not to
be u
>
> are you sure that library is being properly maintained?
>
Almost by definition the answer to that is "No; a library that hasn't been
upgraded to Scala 2.11 is not being properly maintained." That means that
a user of such a library is already facing the choice of whether to take on
the mainte
In general, I agree - it is preferable to break backward compatibility
(where unavoidable) only at major versions.
Unfortunately, a change like this is usually planned better - with earlier
versions announcing the intent of the change, deprecation across multiple
releases, defaults changed, etc.
From the thread,
Answering for myself: I assume everyone is following
http://semver.org/ semantic versioning. If not, would be good to hear
an alternative theory.
For semver, strictly speaking, minor releases should be
backwards-compatible for callers. Are things like stopping support for
Java 7 or Scala 2.10 back
A few other reasons to drop 2.10 support sooner rather than later.
- We at Lightbend are evaluating some fundamental changes to the REPL to
make it work better for large heaps, especially for Spark. There are other
recent and planned enhancements. This work will benefit notebook users,
Spark Summit 2016 (www.spark-summit.org/2016) will be held from June 6-8 at
the Union Square Hilton in San Francisco, and the recently released agenda
features a stellar lineup of community talks led by top engineers,
architects, data scientists, researchers, entrepreneurs and analysts from
UC Berk
Ah I got it - Seq[(Int, Float)] is actually represented as Seq[Row] (seq of
struct type) internally.
So a further extraction is required, e.g. row => row.getSeq[Row](1).map { r
=> r.getInt(0) }
On Wed, 6 Apr 2016 at 13:35 Nick Pentreath wrote:
> Hi there,
>
> In writing some tests for a PR I'm
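Putting the two messages together, a self-contained sketch of the extraction; it assumes a Spark 2.x SparkSession rather than the 1.x SQLContext used in the thread, and the second row's values and the column names are made up:

import org.apache.spark.sql.{Row, SparkSession}

object NestedArraySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("nested-array-sketch")
      .getOrCreate()
    import spark.implicits._

    // Each element of the array column is an (Int, Double) tuple, which Spark
    // stores as a struct, so it comes back as a Row rather than a tuple.
    val df = Seq(
      (0, Array((1, 6.0), (1, 4.0))),
      (1, Array((2, 3.0), (2, 1.0))) // values invented for the sketch
    ).toDF("id", "pairs")

    val firstFields: Array[Seq[Int]] = df.rdd.map { row =>
      // getSeq[Row] pulls out the array of structs; getInt(0) reads each struct's Int field.
      row.getSeq[Row](1).map(r => r.getInt(0))
    }.collect()

    firstFields.foreach(println)
    spark.stop()
  }
}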
Hello All (and Devs in particular),
Thank you again for your further responses. Please find a detailed
email below which identifies what I believe is the cause of the partition
imbalance problem, which occurs in Spark 1.5, 1.6, and 2.0-SNAPSHOT.
This is followed by follow-up questions for the dev comm
Hello Team
Below is a very good book on Big Data for interview preparation.
https://notionpress.com/read/big-data-interview-faqs
Thanks,
Chaturvedi.
Hi there,
In writing some tests for a PR I'm working on, with a more complex array
type in a DF, I ran into this issue (running off latest master).
Any thoughts?
// create DF with a column of Array[(Int, Double)]
val df = sc.parallelize(Seq(
(0, Array((1, 6.0), (1, 4.0))),
(1, Array((1, 3.0),