Sure, but first it would be beneficial to understand how Spark works on Kubernetes and the concepts involved.
Have a look at my article, Spark on Kubernetes, A Practitioner's Guide
<https://www.linkedin.com/pulse/spark-kubernetes-practitioners-guide-mich-talebzadeh-ph-d-%3FtrackingId=Wsu3lkoPaCWqGemYHe8%252BLQ%253D%253D/?trackingId=Wsu3lkoPaCWqGemYHe8%2BLQ%3D%3D>

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London, United Kingdom

view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice: "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).

On Mon, 19 Feb 2024 at 15:09, Jagannath Majhi <jagannath.ma...@cloud.cbnits.com> wrote:

> Yes
>
> On Mon, Feb 19, 2024, 8:35 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> OK, so you have a jar file that you want to run with Spark on Kubernetes (EKS) as the execution engine, as opposed to YARN on EMR?
>>
>> On Mon, 19 Feb 2024 at 14:38, Jagannath Majhi <jagannath.ma...@cloud.cbnits.com> wrote:
>>
>>> I am not using any private docker image.
>>> I am only running the jar file in EMR using the spark-submit command, and now I want to run it in EKS. Can you please tell me how to set this up?
>>>
>>> On Mon, Feb 19, 2024, 8:06 PM Jagannath Majhi <jagannath.ma...@cloud.cbnits.com> wrote:
>>>
>>>> Can we connect over Google Meet?
>>>>
>>>> On Mon, Feb 19, 2024, 8:03 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Where is your docker file? In the ECR container registry?
>>>>> If you are going to use EKS, it needs to be accessible to all nodes of the cluster.
>>>>>
>>>>> When you build your docker image, put your jar under the $SPARK_HOME directory, then add lines such as the following to your Dockerfile. Here I am accessing the Google BigQuery DW from an EKS cluster:
>>>>>
>>>>> # Add a BigQuery connector jar.
>>>>> ENV SPARK_EXTRA_JARS_DIR=/opt/spark/jars/
>>>>> ENV SPARK_EXTRA_CLASSPATH='/opt/spark/jars/*'
>>>>> RUN mkdir -p "${SPARK_EXTRA_JARS_DIR}" \
>>>>>     && chown spark:spark "${SPARK_EXTRA_JARS_DIR}"
>>>>> COPY --chown=spark:spark \
>>>>>     spark-bigquery-with-dependencies_2.12-0.22.2.jar "${SPARK_EXTRA_JARS_DIR}"
>>>>>
>>>>> HTH
>>>>>
>>>>> Mich Talebzadeh
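[The Dockerfile snippet above bakes the jar into the image, so every driver and executor pod sees it under /opt/spark/jars. As a sketch only — the cluster endpoint, image URI, namespace, service account, and application class below are hypothetical placeholders, not values from this thread — submitting such an image to an EKS cluster could look like:]

```shell
#!/bin/sh
# Sketch, not taken from the thread: every <...> value, the namespace,
# service account, and class name are hypothetical placeholders.
spark-submit \
  --master k8s://https://<EKS_API_SERVER_ENDPOINT>:443 \
  --deploy-mode cluster \
  --name my-spark-app \
  --class com.example.MyApp \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/spark-app:latest \
  --conf spark.executor.instances=2 \
  local:///opt/spark/jars/my-app.jar
```

[The local:// scheme tells Spark the jar is already present inside the container image, so nothing needs to be fetched at submit time.]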
>>>>>
>>>>> On Mon, 19 Feb 2024 at 13:42, Jagannath Majhi <jagannath.ma...@cloud.cbnits.com> wrote:
>>>>>
>>>>>> Dear Spark Community,
>>>>>>
>>>>>> I hope this email finds you well. I am reaching out to seek assistance and guidance regarding a task I'm currently working on involving Apache Spark.
>>>>>>
>>>>>> I have developed a JAR file that contains some Spark applications and functionality, and I need to run this JAR file within a Spark cluster. However, the JAR file is located in an AWS S3 bucket, and I'm facing some challenges in configuring Spark to access and execute it directly from the bucket.
>>>>>>
>>>>>> I would greatly appreciate any advice, best practices, or pointers on how to achieve this integration effectively. Specifically, I'm looking for insights on:
>>>>>>
>>>>>> 1. Configuring Spark to access and retrieve the JAR file from an AWS S3 bucket.
>>>>>> 2. Setting up the necessary permissions and authentication mechanisms to ensure seamless access to the S3 bucket.
>>>>>> 3. Any potential performance considerations or optimizations when running Spark applications with dependencies stored in remote storage like AWS S3.
>>>>>>
>>>>>> If anyone in the community has prior experience or knowledge in this area, I would be extremely grateful for your guidance. Additionally, if there are any relevant resources, documentation, or tutorials you could recommend, that would be incredibly helpful.
>>>>>>
>>>>>> Thank you very much for considering my request. I look forward to hearing from you and benefiting from the collective expertise of the Spark community.
>>>>>>
>>>>>> Best regards,
>>>>>> Jagannath Majhi
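[On the original question — running the jar straight from S3 rather than baking it into the image — Spark can fetch an application jar via the s3a:// scheme, provided hadoop-aws and its matching AWS SDK bundle are available to the driver and executors and the pods can authenticate to S3 (for example through an IAM role mapped to the Kubernetes service account). A hedged sketch only; the bucket, class, image, and endpoint are placeholders, not values from this thread:]

```shell
#!/bin/sh
# Sketch only: all <...> values and the class name are hypothetical.
# Assumes the Spark image ships hadoop-aws + aws-sdk-bundle, and the
# pod's service account maps to an IAM role with s3:GetObject on the bucket.
spark-submit \
  --master k8s://https://<EKS_API_SERVER_ENDPOINT>:443 \
  --deploy-mode cluster \
  --name s3-jar-app \
  --class com.example.MyApp \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/spark:3.5.0 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
  s3a://<MY_BUCKET>/jars/my-app.jar
```

[On the performance point: the jar is downloaded from S3 when the job starts, so for large or frequently reused dependency jars, baking them into the image (as in the Dockerfile approach earlier in the thread) avoids the repeated fetch.]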