I am not using any private Docker image. I am only running the jar file on EMR using the spark-submit command, and now I want to run this jar file on EKS. Can you please tell me how to set this up?
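For reference, a spark-submit against EKS targets the Kubernetes API server rather than YARN. Below is a minimal sketch, not a definitive recipe: the application class com.example.Main, the image name, and the <EKS_API_ENDPOINT>, <ACCOUNT>, and <REGION> values are all placeholders you would substitute with your own.

    # Submit to the EKS (Kubernetes) API server in cluster mode.
    # Endpoint, account, region, image, and class name are placeholders.
    spark-submit \
      --master k8s://https://<EKS_API_ENDPOINT>:443 \
      --deploy-mode cluster \
      --name my-spark-job \
      --class com.example.Main \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.namespace=spark \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
      --conf spark.kubernetes.container.image=<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/spark:latest \
      local:///opt/spark/jars/my-app.jar

The local:// scheme tells Spark the jar is already inside the container image, which is why the Dockerfile approach in the reply below copies the jar under /opt/spark/jars.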
On Mon, Feb 19, 2024, 8:06 PM Jagannath Majhi <jagannath.ma...@cloud.cbnits.com> wrote:

> Can we connect over Google Meet?
>
> On Mon, Feb 19, 2024, 8:03 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Where is your Docker image? In the ECR container registry? If you are
>> going to use EKS, it needs to be accessible to all nodes of the cluster.
>>
>> When you build your Docker image, put your jar under the $SPARK_HOME
>> directory, then add lines like the following to your Dockerfile. Here I
>> am accessing the Google BigQuery DW from an EKS cluster:
>>
>>     # Add a BigQuery connector jar.
>>     ENV SPARK_EXTRA_JARS_DIR=/opt/spark/jars/
>>     ENV SPARK_EXTRA_CLASSPATH='/opt/spark/jars/*'
>>     RUN mkdir -p "${SPARK_EXTRA_JARS_DIR}" \
>>         && chown spark:spark "${SPARK_EXTRA_JARS_DIR}"
>>     COPY --chown=spark:spark \
>>         spark-bigquery-with-dependencies_2.12-0.22.2.jar \
>>         "${SPARK_EXTRA_JARS_DIR}"
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>> View my LinkedIn profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed. It is essential to note
>> that, as with any advice: "one test result is worth one-thousand expert
>> opinions" (Wernher von Braun
>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>
>> On Mon, 19 Feb 2024 at 13:42, Jagannath Majhi <jagannath.ma...@cloud.cbnits.com> wrote:
>>
>>> Dear Spark Community,
>>>
>>> I hope this email finds you well. I am reaching out to seek assistance
>>> and guidance regarding a task I'm currently working on involving
>>> Apache Spark.
>>>
>>> I have developed a JAR file that contains some Spark applications and
>>> functionality, and I need to run this JAR file within a Spark cluster.
>>> However, the JAR file is located in an AWS S3 bucket, and I'm facing
>>> some challenges in configuring Spark to access and execute it directly
>>> from the S3 bucket.
>>>
>>> I would greatly appreciate any advice, best practices, or pointers on
>>> how to achieve this integration effectively. Specifically, I'm looking
>>> for insights on:
>>>
>>> 1. Configuring Spark to access and retrieve the JAR file from an AWS
>>>    S3 bucket.
>>> 2. Setting up the necessary permissions and authentication mechanisms
>>>    to ensure seamless access to the S3 bucket.
>>> 3. Any potential performance considerations or optimizations when
>>>    running Spark applications with dependencies stored in remote
>>>    storage like AWS S3.
>>>
>>> If anyone in the community has prior experience or knowledge in this
>>> area, I would be extremely grateful for your guidance. Additionally,
>>> if there are any relevant resources, documentation, or tutorials that
>>> you could recommend, it would be incredibly helpful.
>>>
>>> Thank you very much for considering my request. I look forward to
>>> hearing from you and benefiting from the collective expertise of the
>>> Spark community.
>>>
>>> Best regards,
>>> Jagannath Majhi
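On the original question of running the jar straight out of S3: Spark can download the application jar from an s3a:// URL at submit time, provided the hadoop-aws and aws-java-sdk-bundle jars are on the image's classpath. Below is a minimal sketch under those assumptions; the bucket, endpoint, image, and class name are placeholders, and it assumes credentials come from IRSA (IAM Roles for Service Accounts) rather than embedded keys.

    # Run the application jar directly from S3. Assumes hadoop-aws and
    # aws-java-sdk-bundle are in the image; bucket, endpoint, image, and
    # class are placeholders.
    spark-submit \
      --master k8s://https://<EKS_API_ENDPOINT>:443 \
      --deploy-mode cluster \
      --class com.example.Main \
      --conf spark.kubernetes.container.image=<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/spark:latest \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
      --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
      s3a://my-bucket/jars/my-app.jar

For the permissions question, annotating the Kubernetes service account with an IAM role that grants s3:GetObject on the bucket avoids putting access keys in the image. On performance, the driver and executors fetch the jar once at startup, so keeping dependencies in S3 mainly adds submit-time latency rather than per-task cost.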