OK, I see the confusion in terminology. However, what was suggested
should still work. A Luigi worker in this case would function like a
Spark client, responsible for submitting a Spark application (or job, in
Luigi's terms). In other words, you just define all necessary jars for
all your …
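To make this concrete, here is a minimal Python sketch of what an orchestrator-as-client could look like: each workload is submitted as its own Spark application via spark-submit, so each submission carries exactly the --packages it needs. The script names and package coordinates are made up for illustration.

    import subprocess

    # Hypothetical orchestrator-side helper: every workload becomes its
    # own Spark application with its own set of packages.
    def submit_job(main_py, packages):
        subprocess.run(
            ["spark-submit", "--packages", ",".join(packages), main_py],
            check=True,
        )

    submit_job("etl_avro.py", ["org.apache.spark:spark-avro_2.12:3.2.1"])
    submit_job("etl_kafka.py",
               ["org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1"])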
I don't know why I don't see my last message in the thread here:
https://lists.apache.org/thread/5wgdqp746nj4f6ovdl42rt82wc8ltkcn
I also don't get messages from Artemis in my mail; I can only see them in
the thread web UI, which is very confusing.
On top of that, when I click on "reply via your own em…
I guess there are several misconceptions here:
1. The worker doesn't create the driver; the client does.
2. Regardless of job scheduling, all jobs of the same task/application
run under the same SparkContext, which is created by the driver.
Therefore, you need to specify ALL dependency jars for ALL jobs (see the
sketch below).
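A minimal sketch of point 2, with illustrative package coordinates: the union of every job's dependencies is declared once, before the single shared SparkContext starts.

    from pyspark.sql import SparkSession

    # All dependencies any job in this application may need, declared
    # up front because every job shares the one SparkContext.
    packages = [
        "org.apache.spark:spark-avro_2.12:3.2.1",
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1",
    ]
    spark = (
        SparkSession.builder
        .appName("orchestrator")
        .config("spark.jars.packages", ",".join(packages))
        .getOrCreate()
    )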
Wouldn't these be separately submitted jobs for separate workloads? You can,
of course, dynamically change each job submitted to have whatever packages
you like, from whatever is orchestrating. A single job doing everything
doesn't sound right.
On Thu, Mar 10, 2022, 12:05 PM Rafał Wojdyła wrote:
> Because …
Because I can't (and should not) know ahead of time which jobs will be
executed; that's the job of the orchestration layer (and it can be dynamic). I
know I can specify multiple packages. Also, I'm not worried about memory.
On Thu, 10 Mar 2022 at 13:54, Artemis User wrote:
> If changing packages or jars …
If changing packages or jars isn't your concern, why not just specify
ALL the packages that you would need for the Spark environment? You know you
can define multiple packages under the packages option. This shouldn't
cause memory issues, since the JVM uses dynamic class loading...
On 3/9/22 10:03 PM, …
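For reference, assuming a live session bound to `spark`, one way to check which packages the running context was actually started with:

    # getConf() returns a copy of the SparkConf the context was started
    # with, so this shows which packages are actually in effect.
    print(spark.sparkContext.getConf().get("spark.jars.packages", "<unset>"))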
Hi Artemis,
Thanks for your input. To answer your questions:
> You may want to ask yourself why it is necessary to change the jar
> packages during runtime.
I have a long-running orchestrator process, which executes multiple Spark
jobs, currently on a single VM/driver; some of those jobs might
require …
This is indeed a JVM issue, not a Spark issue. You may want to ask
yourself why it is necessary to change the jar packages during runtime.
Changing packages doesn't mean the classes get reloaded. There is no way to
reload the same class unless you customize Spark's classloader. I
also don't …
Sean,
I understand you might be sceptical about adding this functionality into
(py)spark, so I'm curious:
* would an error/warning on updating a configuration value that is currently
effectively impossible to change (it requires a restart of the JVM) be
reasonable?
* what do you think about the workaround in the issue (a sketch follows
below)?
Cheers - R
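For context, the workaround being referred to is along these lines. This is only a sketch, and it leans on private PySpark internals (_gateway, _jvm), so treat those as assumptions rather than a supported API.

    from pyspark import SparkContext
    from pyspark.sql import SparkSession

    def restart_spark(packages):
        """Stop the current session AND its JVM so that the next session
        can start with a different spark.jars.packages value."""
        active = SparkSession.getActiveSession()
        if active is not None:
            gateway = active.sparkContext._gateway  # private attribute
            active.stop()
            if gateway is not None:
                gateway.shutdown()  # tears down the py4j gateway / JVM
            # Clear cached handles so PySpark launches a fresh JVM next time.
            SparkContext._gateway = None  # private attribute
            SparkContext._jvm = None      # private attribute
        return (
            SparkSession.builder
            .config("spark.jars.packages", ",".join(packages))
            .getOrCreate()
        )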
Unfortunately, this opens up a lot more questions and problems than it solves.
What if you take something off the classpath, for example? Change a class?
On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła wrote:
> Thanks Sean,
> To be clear, if you prefer to change the label on this issue from bug to
> st…
Thanks Sean,
To be clear, if you prefer to change the label on this issue from bug to
something else, feel free to do so; no strong opinions on my end. What happens
to the classpath, and whether Spark uses some classloader magic, is probably an
implementation detail. That said, it's definitely not intuitive …
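To illustrate the non-intuitive part (package coordinates are examples): once a session exists, getOrCreate() hands it back as-is, a different spark.jars.packages value has no effect on the already-running JVM's classpath, and no exception is raised.

    from pyspark.sql import SparkSession

    spark1 = (
        SparkSession.builder
        .config("spark.jars.packages",
                "org.apache.spark:spark-avro_2.12:3.2.1")
        .getOrCreate()
    )

    # Same process, "new" packages: getOrCreate() returns the existing
    # session and the running JVM's classpath is unchanged.
    spark2 = (
        SparkSession.builder
        .config("spark.jars.packages",
                "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1")
        .getOrCreate()
    )

    assert spark1 is spark2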
That isn't a bug - you can't change the classpath once the JVM is executing.
On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła wrote:
> Hi,
> My use case is that I have a long-running process (orchestrator) with
> multiple tasks; some tasks might require extra Spark dependencies. It seems
> once the …
Hi,
My use case is that I have a long-running process (orchestrator) with
multiple tasks; some tasks might require extra Spark dependencies. It seems
that once the Spark context is started, it's not possible to update
`spark.jars.packages`? I have reported an issue at
https://issues.apache.org/jira/brow