Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-11 Thread Artemis User
OK, I see the confusion in terminology. However, what was suggested should still work. A Luigi worker in this case would function like a Spark client, responsible for submitting a Spark application (or job, in Luigi's terms). In other words, you just define all necessary jars for all your …
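
As a rough illustration of the client model described above, here is a minimal sketch of a Luigi task acting as a Spark client: it submits a Spark application via spark-submit, declaring up front whatever packages that application needs. The task name, parameters and script layout are hypothetical and not taken from the thread.

    import subprocess

    import luigi


    class SubmitSparkJob(luigi.Task):
        """Hypothetical Luigi task acting as a Spark client: each run submits
        a standalone Spark application with the packages that job needs."""

        script = luigi.Parameter()               # path to the PySpark script
        packages = luigi.Parameter(default="")   # comma-separated Maven coordinates

        def run(self):
            cmd = ["spark-submit"]
            if self.packages:
                # Dependencies are resolved when the application's driver JVM
                # starts, so every submitted application gets its own classpath.
                cmd += ["--packages", self.packages]
            cmd.append(self.script)
            subprocess.run(cmd, check=True)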

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-11 Thread Rafał Wojdyła
I don't know why I don't see my last message in the thread here: https://lists.apache.org/thread/5wgdqp746nj4f6ovdl42rt82wc8ltkcn I also don't get messages from Artemis in my mail; I can only see them in the thread web UI, which is very confusing. On top of that, when I click on "reply via your own email …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Artemis User
I guess there are several misconceptions here: 1. The worker doesn't create the driver; the client does. 2. Regardless of job scheduling, all jobs of the same task/application run under the same SparkContext, which is created by the driver. Therefore, you need to specify ALL dependency jars for ALL jobs …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Sean Owen
Wouldn't these be separately submitted jobs for separate workloads? You can of course dynamically change each job submitted to have whatever packages you like, from whatever is orchestrating. A single job doing everything doesn't sound right. On Thu, Mar 10, 2022, 12:05 PM Rafał Wojdyła wrote: > Because …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Rafał Wojdyła
Because I can't (and should not) know ahead of time which jobs will be executed; that's the job of the orchestration layer (and it can be dynamic). I know I can specify multiple packages. I'm also not worried about memory. On Thu, 10 Mar 2022 at 13:54, Artemis User wrote: > If changing packages or jars …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-10 Thread Artemis User
If changing packages or jars isn't your concern, why not just specify ALL packages that you would need for the Spark environment? You know you can define multiple packages under the packages option. This shouldn't cause memory issues, since the JVM uses dynamic class loading... On 3/9/22 10:03 PM, …
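
For example, declaring every package any of the jobs might need once, at session creation, could look like the sketch below; the Maven coordinates are placeholders, not packages mentioned in the thread.

    from pyspark.sql import SparkSession

    # Every Maven coordinate any of the jobs might need, declared up front;
    # spark.jars.packages takes a comma-separated list of coordinates.
    ALL_PACKAGES = ",".join([
        "org.apache.spark:spark-avro_2.12:3.2.1",            # placeholder
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1",  # placeholder
    ])

    spark = (
        SparkSession.builder
        .appName("orchestrator")
        .config("spark.jars.packages", ALL_PACKAGES)
        .getOrCreate()
    )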

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Hi Artemis, Thanks for your input; to answer your questions: > You may want to ask yourself why it is necessary to change the jar packages during runtime. I have a long-running orchestrator process which executes multiple Spark jobs, currently on a single VM/driver; some of those jobs might require …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Artemis User
This is indeed a JVM issue, not a Spark issue. You may want to ask yourself why it is necessary to change the jar packages during runtime. Changing packages doesn't mean the classes get reloaded. There is no way to reload the same class unless you customize Spark's classloader. I also don't …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Sean, I understand you might be sceptical about adding this functionality into (py)spark. I'm curious: * would an error/warning on updating configuration that is currently effectively impossible to change (requires a restart of the JVM) be reasonable? * what do you think about the workaround in the issue? Cheers - R…
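
The workaround attached to the JIRA ticket isn't quoted in this thread. One pattern in the same spirit, sketched below on a best-effort basis, is to tear down both the SparkContext and the Py4J gateway so that the next getOrCreate() launches a fresh driver JVM, which is the point at which spark.jars.packages is actually resolved. It relies on private PySpark internals (SparkContext._gateway, SparkContext._jvm) that may change between versions.

    from pyspark import SparkContext
    from pyspark.sql import SparkSession


    def restart_spark_with_packages(spark, packages):
        """Stop the current session and force a brand-new driver JVM.

        Stopping the context and recreating it inside the same JVM may not
        pick up new packages, so the Py4J gateway is shut down as well.
        Sketch only; not necessarily the exact workaround from SPARK-38438.
        """
        spark.stop()
        if SparkContext._gateway is not None:
            SparkContext._gateway.shutdown()
        SparkContext._gateway = None
        SparkContext._jvm = None
        return (
            SparkSession.builder
            .config("spark.jars.packages", ",".join(packages))
            .getOrCreate()
        )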

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Sean Owen
Unfortunately this opens a lot more questions and problems than it solves. What if you take something off the classpath, for example? Change a class? On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła wrote: > Thanks Sean, > To be clear, if you prefer to change the label on this issue from bug to > sth …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Thanks Sean, To be clear, if you prefer to change the label on this issue from bug to something else, feel free to do so; no strong opinions on my end. What happens to the classpath, and whether Spark uses some classloader magic, is probably an implementation detail. That said, it's definitely not intuitive …

Re: [SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Sean Owen
That isn't a bug - you can't change the classpath once the JVM is executing. On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła wrote: > Hi, > My use case is that I have a long-running process (orchestrator) with > multiple tasks; some tasks might require extra Spark dependencies. It seems > once the …

[SPARK-38438] pyspark - how to update spark.jars.packages on existing default context?

2022-03-09 Thread Rafał Wojdyła
Hi, My use case is that I have a long-running process (orchestrator) with multiple tasks; some tasks might require extra Spark dependencies. It seems that once the Spark context is started it's not possible to update `spark.jars.packages`? I have reported an issue at https://issues.apache.org/jira/browse/SPARK-38438 …
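
For context, the kind of runtime update being asked about looks roughly like the sketch below (coordinates are placeholders). Whether the second call is silently accepted or rejected can depend on the Spark version, but either way the extra package is not resolved or added to the classpath of the already-running driver JVM.

    from pyspark.sql import SparkSession

    # Session created earlier in the long-running process, with the
    # dependencies known at that point.
    spark = (
        SparkSession.builder
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
        .getOrCreate()
    )

    # A later task needs an extra package on the same context. Updating the
    # config on the live session does not trigger any dependency resolution:
    # the driver JVM and its classpath already exist.
    spark.conf.set(
        "spark.jars.packages",
        "org.apache.spark:spark-avro_2.12:3.2.1,"
        "org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1",
    )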