Re: Classpath isolation per SparkSession without Spark Connect

Pasha Finkelshtein Tue, 28 Nov 2023 12:14:30 -0800

I actually think it should be entirely possible to use it on the executor
side. It may require a small extension or UDF, but generally there should be
no issues here. PF4J is very lightweight, so you'll only have a small overhead
for the classloaders.


There's still the small question of how to distribute the plugins/extensions,
but you probably already have some storage and can keep them there.
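
To illustrate the classloader idea, here is a minimal sketch using only the
JDK. It is not PF4J's API and not anything Spark provides. It only shows the
underlying mechanism: one class loader per user artifact, so that two versions
of the same class name can coexist in one JVM. The class name UserTransform,
the version strings, and the temp directories standing in for user jars are
all hypothetical, and the example assumes a JDK (not a JRE) so it can compile
the stand-in class at runtime.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class ClassLoaderIsolationSketch {

    // Compiles a tiny stand-in "user class" into a fresh temp directory.
    // The directory plays the role of one user jar (hypothetical setup).
    public static Path compileUserClass(String version) throws Exception {
        Path dir = Files.createTempDirectory("user-jar-");
        Path src = dir.resolve("UserTransform.java");
        Files.writeString(src,
            "public class UserTransform {\n"
          + "  public static String version() { return \"" + version + "\"; }\n"
          + "}\n");
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE-only install
        if (javac == null) {
            throw new IllegalStateException("a JDK is required to run this sketch");
        }
        int rc = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (rc != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    // Loads UserTransform from the given directory in its own class loader
    // and calls its static version() method reflectively.
    public static String versionSeenBy(Path classesDir) throws Exception {
        URL[] urls = { classesDir.toUri().toURL() };
        // Parent is the platform loader: JDK classes are visible, but the
        // application class path (and any other user artifact) is not.
        try (URLClassLoader loader =
                 new URLClassLoader(urls, ClassLoader.getPlatformClassLoader())) {
            Class<?> userClass = loader.loadClass("UserTransform");
            return (String) userClass.getMethod("version").invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        Path artifactA = compileUserClass("1.0");
        Path artifactB = compileUserClass("2.0");
        // Same class name, two loaders, two independent definitions.
        System.out.println(versionSeenBy(artifactA));
        System.out.println(versionSeenBy(artifactB));
    }
}
```

A plugin framework would manage these loaders for you. On executors, the same
artifacts would still need to be fetched from wherever they are stored before
a loader can be built over them.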




Pasha Finkelshteyn

Developer Advocate for Data Engineering

JetBrains



[email protected]
https://linktr.ee/asm0dey

Find out more <https://jetbrains.com>



On Tue, 28 Nov 2023 at 17:04, Faiz Halde <[email protected]> wrote:

> Hey Pasha,
>
> Is your suggestion directed at the Spark team? I can make use of the plugin
> system on the driver side of Spark, but since Spark is distributed, I
> believe the executor side of Spark would need to adopt the PF4J framework
> too.
>
> Thanks
> Faiz
>
> On Tue, Nov 28, 2023, 16:57 Pasha Finkelshtein <
> [email protected]> wrote:
>
>> To me it seems like it's the best possible use case for PF4J.
>>
>> On Tue, 28 Nov 2023 at 12:47, Holden Karau <[email protected]>
>> wrote:
>>
>>> So I don’t think we make any particular guarantees around class path
>>> isolation there, so even if it does work it’s something you’d need to pay
>>> attention to on upgrades. Class path isolation is tricky to get right.
>>>
>>> On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> We are using Spark 3.5.0 and were wondering if the following is
>>>> achievable using spark-core.
>>>>
>>>> Our use case involves spinning up a Spark cluster where the driver
>>>> application loads user jars containing Spark transformations at runtime. A
>>>> single Spark application can load multiple user jars (on the same cluster)
>>>> that can have class path conflicts if care is not taken.
>>>>
>>>> AFAIK, getting this right requires the executor to be designed in a way
>>>> that allows for class path isolation (UDFs, lambda expressions). Ideally,
>>>> isolation per SparkSession is what we want.
>>>>
>>>> I know Spark Connect has been designed this way, but Spark Connect is
>>>> not an option for us at the moment. I had some luck using a private method
>>>> inside Spark called JobArtifactSet.withActiveJobArtifactState.
>>>>
>>>> Is it sufficient for me to run the user code enclosed
>>>> within JobArtifactSet.withActiveJobArtifactState to achieve my requirement?
>>>>
>>>> Thank you
>>>>
>>>>
>>>> Faiz
>>>>
>>>
