I am not sure about that. However, with Kubernetes and a Docker image for
PySpark, I build the packages into the image itself, as below in the
Dockerfile:
RUN pip install pyyaml numpy cx_Oracle
and that will add those packages so that you can reference them in your Python script:
import yaml
import cx_Oracle
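To verify that the executors actually see the packages baked into the image, a quick sanity check can be run from the driver. This is only a sketch, assuming an active SparkSession; the function name is just for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def report_versions(_):
    # Imported inside the function, so the imports run on the executor;
    # this confirms the image used by the executors has the packages.
    import yaml, numpy
    return yaml.__version__, numpy.__version__

print(spark.sparkContext.parallelize([0]).map(report_versions).collect())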
Can we add Python dependencies the way we can with Maven coordinates, so that we
run something like pip install or download them from the PyPI index?
From: Mich Talebzadeh
Sent: Wednesday, 24 November 2021 18:28
Cc: user@spark.apache.org
Subject: Re: [issue] not able to add external libs to pyspark job while
The easiest way to set this up is to create a dependencies.zip file.
Assuming that you have a virtual environment already set up, with a
directory called site-packages, go to that directory and create a
minimal shell script, say package_and_zip_dependencies.sh, to do it for you:
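The script itself is not included in this thread; as a rough sketch, a Python equivalent of that packaging step might look like this (the site-packages path is an assumption; adjust it to your virtual environment's layout):

import os
import zipfile

# Assumption: virtualenv at ./venv with Python 3.8; adjust as needed.
SITE_PACKAGES = "venv/lib/python3.8/site-packages"

with zipfile.ZipFile("dependencies.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _dirs, files in os.walk(SITE_PACKAGES):
        for name in files:
            path = os.path.join(root, name)
            # Store entries relative to site-packages so that, e.g.,
            # 'import yaml' resolves from the root of the zip.
            zf.write(path, os.path.relpath(path, SITE_PACKAGES))

One caveat: pure-Python packages such as PyYAML import fine from a zip, but packages with compiled extensions (numpy, cx_Oracle) generally cannot be loaded from one, which is one reason the Docker image approach above is sometimes preferred.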
Hello Owen,
Thank you for your prompt reply!
We will check it out.
best,
Atheer Alabdullatif
From: Sean Owen
Sent: Wednesday, November 24, 2021 5:06 PM
To: Atheer Alabdullatif
Cc: user@spark.apache.org; Data Engineering
Subject: Re: [issue] not able to add
That's not how you add a library. From the docs:
https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
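For reference, that page documents shipping dependencies either at submit time (for example via --py-files) or from the driver at runtime. A minimal sketch of the runtime variant, assuming a dependencies.zip built as described above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("deps-demo").getOrCreate()
# Equivalent to passing `--py-files dependencies.zip` to spark-submit;
# the zip is shipped to every executor and put on its sys.path.
spark.sparkContext.addPyFile("dependencies.zip")

import yaml  # pure-Python packages in the zip become importable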
On Wed, Nov 24, 2021 at 8:02 AM Atheer Alabdullatif wrote:
> Dear Spark team,
> I hope my email finds you well.
>
>
> I am using PySpark 3.0 and facing an issue with