" In the end for my usecase I started using pvcs and pvc aware scheduling along with decommissioning. So far performance is good with this choice." How did you do this?
On Thu, 11 Apr 2024 at 04:13, Arun Ravi <arunrav...@gmail.com> wrote:
> Hi Everyone,
>
> I had explored IBM's and AWS's S3 shuffle plugins (some time back), and I had also explored AWS FSx for Lustre in a few of my production jobs, which have ~20TB of shuffle operations with 200-300 executors. What I observed is that S3 and FSx behaviour was fine during the write phase; however, I faced IOPS throttling during the read phase (reads taking forever to complete). I think this may be caused by the heavy use of the shuffle index files (I didn't perform any extensive research on this), so I believe the shuffle manager logic has to be intelligent enough to reduce the fetching of files from the object store. In the end, for my use case, I started using PVCs and PVC-aware scheduling along with decommissioning. So far performance is good with this choice.
>
> Thank you
>
> On Tue, 9 Apr 2024, 15:17 Mich Talebzadeh, <mich.talebza...@gmail.com> wrote:
>> Hi,
>>
>> First, thanks everyone for their contributions.
>>
>> I was going to reply to @Enrico Minack <i...@enrico.minack.dev> but noticed additional info. As I understand it, Apache Uniffle, for example, is an incubating project aimed at providing a pluggable shuffle service for Spark. So basically, what all these "external shuffle services" have in common is that they offload shuffle data management to external services, thus reducing the memory and CPU overhead on Spark executors. That is great. While Uniffle and others enhance shuffle performance and scalability, it would be great to integrate them with the Spark UI. This may require additional development effort. I suppose the interest would be to have these external metrics incorporated into Spark with one look and feel. This may require customizing the UI to fetch and display metrics or statistics from the external shuffle services. Has any project done this?
>>
>> Thanks
>>
>> Mich Talebzadeh,
>> Technologist | Solutions Architect | Data Engineer | Generative AI
>> London
>> United Kingdom
>>
>> view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice: "one test result is worth one-thousand expert opinions" (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>
>> On Mon, 8 Apr 2024 at 14:19, Vakaris Baškirov <vakaris.bashki...@gmail.com> wrote:
>>> I see that both Uniffle and Celeborn support S3/HDFS backends, which is great.
>>> In the case someone is using S3/HDFS, I wonder what the advantages would be of using Celeborn or Uniffle vs the IBM shuffle service plugin <https://github.com/IBM/spark-s3-shuffle> or the Cloud Shuffle Storage Plugin from AWS <https://docs.aws.amazon.com/glue/latest/dg/cloud-shuffle-storage-plugin.html>?
>>>
>>> These plugins do not require deploying a separate service. Are there any advantages to using Uniffle/Celeborn with an S3 backend, given that they would require deploying a separate service?
>>>
>>> Thanks
>>> Vakaris
>>>
>>> On Mon, Apr 8, 2024 at 10:03 AM roryqi <ror...@apache.org> wrote:
>>>> Apache Uniffle (incubating) may be another solution.
>>>> You can see:
>>>> https://github.com/apache/incubator-uniffle
>>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>>
>>>> On Mon, 8 Apr 2024 at 07:15, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>> Splendid
>>>>>
>>>>> The configurations below can be used with k8s deployments of Spark.
>>>>> Spark applications running on k8s can use these configurations to seamlessly access data stored in Google Cloud Storage (GCS) and Amazon S3.
>>>>>
>>>>> For Google GCS we may have:
>>>>>
>>>>> spark_config_gcs = {
>>>>>     "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
>>>>>     "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
>>>>>     "spark.hadoop.google.cloud.auth.service.account.enable": "true",
>>>>>     "spark.hadoop.google.cloud.auth.service.account.json.keyfile": "/path/to/keyfile.json",
>>>>> }
>>>>>
>>>>> For Amazon S3, similarly:
>>>>>
>>>>> spark_config_s3 = {
>>>>>     "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
>>>>>     "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>>     "spark.hadoop.fs.s3a.access.key": "s3_access_key",
>>>>>     "spark.hadoop.fs.s3a.secret.key": "secret_key",
>>>>> }
>>>>>
>>>>> To implement these configurations and enable Spark applications to interact with GCS and S3, I guess we can approach it this way:
>>>>>
>>>>> 1) Spark repository integration: these configurations need to be added to the Spark repository as part of the supported configuration options for k8s deployments.
>>>>>
>>>>> 2) Configuration settings: users need to specify these configurations when submitting Spark applications to a Kubernetes cluster. They can include these configurations in the Spark application code or pass them as command-line arguments or environment variables during application submission.
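As a hedged sketch of option 2) above (passing configurations as command-line arguments), a config dict in the style of spark_config_gcs / spark_config_s3 can be expanded into spark-submit --conf flags. The helper below is illustrative only, not part of Spark, and the sample values are placeholders:

```python
# Illustrative sketch: expand a {key: value} Spark config dict into the
# equivalent spark-submit --conf arguments. Helper name and sample values
# are hypothetical, not from Spark itself.
def to_submit_args(conf: dict) -> list:
    """Return ['--conf', 'k=v', ...] for each entry of the config dict."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

spark_config_s3 = {
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.access.key": "s3_access_key",  # placeholder credential
}
cli_args = to_submit_args(spark_config_s3)
# then e.g.: spark-submit <cli_args...> app.py
```

The same dict could equally be fed to a SparkSession builder in application code via repeated .config(key, value) calls.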
>>>>>
>>>>> HTH
>>>>>
>>>>> Mich Talebzadeh,
>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>> London, United Kingdom
>>>>>
>>>>> On Sun, 7 Apr 2024 at 13:31, Vakaris Baškirov <vakaris.bashki...@gmail.com> wrote:
>>>>>> There is an IBM shuffle service plugin that supports S3:
>>>>>> https://github.com/IBM/spark-s3-shuffle
>>>>>>
>>>>>> Though I would think a feature like this could be part of the main Spark repo. Trino already has out-of-the-box support for S3 exchange (shuffle), and it's very useful.
>>>>>>
>>>>>> Vakaris
>>>>>>
>>>>>> On Sun, Apr 7, 2024 at 12:27 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>> Thanks for your suggestion, which I take as a workaround. While this workaround can potentially address storage allocation issues, I was more interested in exploring solutions that offer a more seamless integration with large distributed file systems like HDFS, GCS, or S3. This would ensure better performance and scalability for handling larger datasets efficiently.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>>>> London, United Kingdom
>>>>>>>
>>>>>>> On Sat, 6 Apr 2024 at 21:28, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>>>>>> You can make a PVC on K8s; call it 300gb.
>>>>>>>>
>>>>>>>> Make a folder in your Dockerfile:
>>>>>>>>
>>>>>>>> WORKDIR /opt/spark/work-dir
>>>>>>>> RUN chmod g+w /opt/spark/work-dir
>>>>>>>>
>>>>>>>> Start Spark adding this:
>>>>>>>>
>>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
>>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
>>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
>>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
>>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
>>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
>>>>>>>> .config("spark.local.dir", "/opt/spark/work-dir")
>>>>>>>>
>>>>>>>> On Sat, 6 Apr 2024 at 15:45, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>> I have seen some older references to a shuffle service for k8s, although it is not clear whether they are talking about a generic shuffle service for k8s.
>>>>>>>>>
>>>>>>>>> Anyhow, with the advent of GenAI and the need to allow for larger volumes of data, I was wondering if there has been any more work on this matter. Specifically, large and scalable file systems like HDFS, GCS, S3, etc. offer significantly more storage capacity than local disks on individual worker nodes in a k8s cluster, thus allowing much larger datasets to be handled more efficiently. The degree of parallelism and fault tolerance these file systems provide also comes into it. I would be interested in hearing about any progress on this.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Mich Talebzadeh,
>>>>>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>>>>>> London, United Kingdom
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>>>>>
>>>>>>>> --
>>>>>>>> Bjørn Jørgensen
>>>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>>>> Norge
>>>>>>>> +47 480 94 297

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297