If changing packages or jars at runtime isn't strictly required, why not just specify ALL of the packages you would need for the Spark environment? You can define multiple packages under the packages option.  This shouldn't cause memory issues, since the JVM uses dynamic class loading...
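
For example (a rough sketch; the package coordinates below are only placeholders for whatever your jobs actually need), you could declare everything up front when the session is first created:

```
from pyspark.sql import SparkSession

# Example coordinates only -- list every package the jobs will need,
# comma-separated, before the driver JVM is started.
packages = ",".join([
    "org.apache.spark:spark-avro_2.12:3.2.1",
    "org.apache.hadoop:hadoop-aws:3.3.1",
])

spark = (
    SparkSession.builder
    .appName("orchestrator")
    .config("spark.jars.packages", packages)
    .getOrCreate()
)
```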

On 3/9/22 10:03 PM, Rafał Wojdyła wrote:
Hi Artemis,
Thanks for your input, to answer your questions:

> You may want to ask yourself why it is necessary to change the jar packages during runtime.

I have a long-running orchestrator process that executes multiple Spark jobs, currently on a single VM/driver; some of those jobs might require extra packages/jars (please see the example in the issue).

> Changing package doesn't mean to reload the classes.

AFAIU this is unrelated.

> There is no way to reload the same class unless you customize the classloader of Spark.

AFAIU this is an implementation detail.

> I also don't think it is necessary to implement a warning or error message when changing the configuration since it doesn't do any harm

To reiterate: right now the API allows changing the configuration of the context without that configuration taking effect. See examples of confused users here:
 * https://stackoverflow.com/questions/41886346/spark-2-1-0-session-config-settings-pyspark
 * https://stackoverflow.com/questions/53606756/how-to-set-spark-driver-memory-in-client-mode-pyspark-version-2-3-1
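
A minimal sketch of the confusing behavior (assuming a session is already running; the package coordinate is just an example):

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Later: try to add a package to the already-running session.
spark2 = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
    .getOrCreate()
)

# getOrCreate() hands back the existing session; a static setting like
# spark.jars.packages cannot take effect on a JVM that is already up.
assert spark2 is spark
```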

I'm curious if you have any opinion about the "hard-reset" workaround, copy-pasting from the issue:

```
from pyspark import SparkContext
from pyspark.sql import SparkSession

s: SparkSession = ...

# Hard reset: stop the session, shut down the Py4J gateway and its JVM
# process, and clear the cached handles so the next
# SparkSession.builder.getOrCreate() launches a fresh JVM.
s.stop()
s._sc._gateway.shutdown()
s._sc._gateway.proc.stdin.close()
SparkContext._gateway = None
SparkContext._jvm = None
```
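
After that hard reset, a new session that actually resolves the extra package can be created as usual, e.g. (a sketch; the coordinate below is just an example):

```
from pyspark.sql import SparkSession

# With the old gateway gone, getOrCreate() launches a fresh JVM, so the
# extra package (example coordinate) is actually resolved this time.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.1")
    .getOrCreate()
)
```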

Cheers - Rafal

On 2022/03/09 15:39:58 Artemis User wrote:
> This is indeed a JVM issue, not a Spark issue.  You may want to ask
> yourself why it is necessary to change the jar packages during runtime.
> Changing package doesn't mean to reload the classes. There is no way to
> reload the same class unless you customize the classloader of Spark.  I
> also don't think it is necessary to implement a warning or error message
> when changing the configuration since it doesn't do any harm.  Spark
> uses lazy binding so you can do a lot of such "unharmful" things.
> Developers will have to understand the behaviors of each API before
> using them.
>
>
> On 3/9/22 9:31 AM, Rafał Wojdyła wrote:
> >  Sean,
> > I understand you might be sceptical about adding this functionality
> > into (py)spark. I'm curious:
> > * would an error/warning on updating configuration that currently
> > cannot take effect (requires a JVM restart) be reasonable?
> > * what do you think about the workaround in the issue?
> > Cheers - Rafal
> >
> > On Wed, 9 Mar 2022 at 14:24, Sean Owen <sr...@gmail.com> wrote:
> >
> >     Unfortunately this opens a lot more questions and problems than it
> >     solves. What if you take something off the classpath, for example?
> >     change a class?
> >
> >     On Wed, Mar 9, 2022 at 8:22 AM Rafał Wojdyła
> >     <ra...@gmail.com> wrote:
> >
> >         Thanks Sean,
> >         To be clear, if you prefer to change the label on this issue
> >         from bug to sth else, feel free to do so, no strong opinions
> >         on my end. What happens to the classpath, whether Spark uses
> >         some classloader magic, is probably an implementation detail.
> >         That said, it's definitely not intuitive that you can change
> >         the configuration and get the context (with the updated
> >         config) without any warnings/errors. Also what would you
> >         recommend as a workaround or solution to this problem? Any
> >         comments about the workaround in the issue? Keep in mind that
> >         I can't restart the long running orchestration process (python
> >         process if that matters).
> >         Cheers - Rafal
> >
> >         On Wed, 9 Mar 2022 at 13:15, Sean Owen <sr...@gmail.com> wrote:
> >
> >             That isn't a bug - you can't change the classpath once the
> >             JVM is executing.
> >
> >             On Wed, Mar 9, 2022 at 7:11 AM Rafał Wojdyła
> >             <ra...@gmail.com> wrote:
> >
> >                 Hi,
> >                 My use case is that I have a long-running process
> >                 (orchestrator) with multiple tasks; some tasks might
> >                 require extra Spark dependencies. It seems that once
> >                 the Spark context is started it's not possible to
> >                 update `spark.jars.packages`? I have reported an issue at
> > https://issues.apache.org/jira/browse/SPARK-38438,
> >                 together with a workaround ("hard reset of the
> >                 cluster"). I wonder if anyone has a solution for this?
> >                 Cheers - Rafal
> >
>
