Hi everyone, I explored IBM's and AWS's S3 shuffle plugins some time back, and I have also used AWS FSx for Lustre in a few of my production jobs, which involve ~20 TB of shuffle with 200-300 executors. What I observed is that both S3 and FSx behaved fine during the write phase; however, I hit IOPS throttling during the read phase (reads taking forever to complete). I think this may be caused by the heavy use of shuffle index files (I have not done extensive research on this), so I believe the shuffle manager logic would have to be intelligent enough to reduce the number of fetches from the object store. In the end, for my use case, I switched to PVCs and PVC-aware scheduling along with executor decommissioning. So far, performance has been good with this choice.
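For what it's worth, the PVC setup described above can be sketched roughly as below. This is a minimal sketch, not my exact configuration: the volume name `spark-local-dir-1` (Spark uses `spark-local-dir-*` volumes as local/shuffle dirs), the `gp3` storage class, the size and the mount path are all illustrative, and the decommissioning keys assume Spark 3.2+:

```python
# Sketch: per-executor on-demand PVCs used as Spark local dirs, with
# PVC reuse and graceful decommissioning so shuffle data survives
# executor loss. Storage class, size and mount path are placeholders.

pvc = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"

spark_conf_pvc = {
    # One dynamically provisioned PVC per executor, mounted as a local dir.
    f"{pvc}.options.claimName": "OnDemand",
    f"{pvc}.options.storageClass": "gp3",     # illustrative storage class
    f"{pvc}.options.sizeLimit": "500Gi",      # illustrative size
    f"{pvc}.mount.path": "/data",
    f"{pvc}.mount.readOnly": "false",

    # PVC-aware scheduling: the driver owns the PVCs, and replacement
    # executors reuse the PVCs (and shuffle files) left behind.
    "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
    "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",

    # Graceful decommissioning: migrate shuffle/RDD blocks off executors
    # that are about to be removed.
    "spark.decommission.enabled": "true",
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
}

def to_submit_args(conf):
    """Render a config dict as spark-submit --conf arguments."""
    return [arg for key, value in conf.items()
            for arg in ("--conf", f"{key}={value}")]
```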
Thank you

On Tue, 9 Apr 2024, 15:17 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> First, thanks everyone for their contributions.
>
> I was going to reply to @Enrico Minack <i...@enrico.minack.dev> but noticed
> additional info. As I understand it, for example, Apache Uniffle is an
> incubating project aimed at providing a pluggable shuffle service for
> Spark. So basically, what all these "external shuffle services" have in
> common is offloading shuffle data management to external services, thus
> reducing the memory and CPU overhead on Spark executors. That is great.
> While Uniffle and others enhance shuffle performance and scalability, it
> would be great to integrate them with the Spark UI. This may require
> additional development effort. I suppose the interest would be to have
> these external metrics incorporated into Spark with one look and feel.
> This may require customizing the UI to fetch and display metrics or
> statistics from the external shuffle services. Has any project done this?
>
> Thanks
>
> Mich Talebzadeh,
> Technologist | Solutions Architect | Data Engineer | Generative AI
> London
> United Kingdom
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
> On Mon, 8 Apr 2024 at 14:19, Vakaris Baškirov <vakaris.bashki...@gmail.com>
> wrote:
>
>> I see that both Uniffle and Celeborn support S3/HDFS backends, which is
>> great.
>> In case someone is using S3/HDFS, I wonder what would be the advantages
>> of using Celeborn or Uniffle vs the IBM shuffle service plugin
>> <https://github.com/IBM/spark-s3-shuffle> or the Cloud Shuffle Storage
>> Plugin from AWS
>> <https://docs.aws.amazon.com/glue/latest/dg/cloud-shuffle-storage-plugin.html>?
>>
>> These plugins do not require deploying a separate service. Are there any
>> advantages to using Uniffle/Celeborn with an S3 backend, which would
>> require deploying a separate service?
>>
>> Thanks
>> Vakaris
>>
>> On Mon, Apr 8, 2024 at 10:03 AM roryqi <ror...@apache.org> wrote:
>>
>>> Apache Uniffle (incubating) may be another solution.
>>> You can see
>>> https://github.com/apache/incubator-uniffle
>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>
>>> Mich Talebzadeh <mich.talebza...@gmail.com> wrote on Mon, 8 Apr 2024 at 07:15:
>>>
>>>> Splendid
>>>>
>>>> The configurations below can be used with k8s deployments of Spark.
>>>> Spark applications running on k8s can use these configurations to
>>>> seamlessly access data stored in Google Cloud Storage (GCS) and Amazon S3.
>>>>
>>>> For Google GCS we may have:
>>>>
>>>> spark_config_gcs = {
>>>>     "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
>>>>     "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
>>>>     "spark.hadoop.google.cloud.auth.service.account.enable": "true",
>>>>     "spark.hadoop.google.cloud.auth.service.account.json.keyfile": "/path/to/keyfile.json",
>>>> }
>>>>
>>>> For Amazon S3, similarly:
>>>>
>>>> spark_config_s3 = {
>>>>     "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
>>>>     "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>     "spark.hadoop.fs.s3a.access.key": "s3_access_key",
>>>>     "spark.hadoop.fs.s3a.secret.key": "secret_key",
>>>> }
>>>>
>>>> To implement these configurations and enable Spark applications to
>>>> interact with GCS and S3, I guess we can approach it this way:
>>>>
>>>> 1) Spark repository integration: these configurations need to be added
>>>> to the Spark repository as part of the supported configuration options
>>>> for k8s deployments.
>>>>
>>>> 2) Configuration settings: users need to specify these configurations
>>>> when submitting Spark applications to a Kubernetes cluster. They can
>>>> include them in the Spark application code or pass them as command-line
>>>> arguments or environment variables during submission.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>> London
>>>> United Kingdom
>>>>
>>>> view my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>> *Disclaimer:* The information provided is correct to the best of my
>>>> knowledge but of course cannot be guaranteed.
>>>> It is essential to note that, as with any advice, "one test result is
>>>> worth one-thousand expert opinions" (Wernher von Braun
>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>
>>>> On Sun, 7 Apr 2024 at 13:31, Vakaris Baškirov
>>>> <vakaris.bashki...@gmail.com> wrote:
>>>>
>>>>> There is an IBM shuffle service plugin that supports S3:
>>>>> https://github.com/IBM/spark-s3-shuffle
>>>>>
>>>>> Though I would think a feature like this could be part of the main
>>>>> Spark repo. Trino already has out-of-the-box support for S3 exchange
>>>>> (shuffle) and it's very useful.
>>>>>
>>>>> Vakaris
>>>>>
>>>>> On Sun, Apr 7, 2024 at 12:27 PM Mich Talebzadeh
>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for your suggestion, which I take as a workaround. Whilst this
>>>>>> workaround can potentially address storage allocation issues, I was
>>>>>> more interested in exploring solutions that offer a more seamless
>>>>>> integration with large distributed file systems like HDFS, GCS, or S3.
>>>>>> This would ensure better performance and scalability for handling
>>>>>> larger datasets efficiently.
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>>> London
>>>>>> United Kingdom
>>>>>>
>>>>>> view my LinkedIn profile
>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>
>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>
>>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>>> that, as with any advice, "one test result is worth one-thousand
>>>>>> expert opinions" (Wernher von Braun
>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
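As an aside on the GCS/S3 config dictionaries quoted earlier in the thread: they can be applied when building the session without any changes to the Spark repository. A minimal sketch, assuming PySpark's standard `SparkSession.builder.config(key, value)` API; the keys mirror the quoted `spark_config_s3` dict, and the credentials are placeholders:

```python
# Sketch: apply a dict of Spark settings to a builder-style object.
# The keys mirror the spark_config_s3 dict quoted in the thread;
# access/secret keys are placeholders, not working credentials.

spark_config_s3 = {
    "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.access.key": "s3_access_key",  # placeholder
    "spark.hadoop.fs.s3a.secret.key": "secret_key",     # placeholder
}

def apply_conf(builder, conf):
    """Chain builder.config(key, value) calls for every entry in conf."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder

# With PySpark installed, usage would look like:
#   from pyspark.sql import SparkSession
#   spark = apply_conf(SparkSession.builder.appName("s3-demo"),
#                      spark_config_s3).getOrCreate()
```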
>>>>>>
>>>>>> On Sat, 6 Apr 2024 at 21:28, Bjørn Jørgensen
>>>>>> <bjornjorgen...@gmail.com> wrote:
>>>>>>
>>>>>>> You can make a PVC on K8s; call it 300gb.
>>>>>>>
>>>>>>> Make a folder in your Dockerfile:
>>>>>>> WORKDIR /opt/spark/work-dir
>>>>>>> RUN chmod g+w /opt/spark/work-dir
>>>>>>>
>>>>>>> Start Spark adding this:
>>>>>>>
>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
>>>>>>> .config("spark.local.dir", "/opt/spark/work-dir")
>>>>>>>
>>>>>>> On Sat, 6 Apr 2024 at 15:45, Mich Talebzadeh
>>>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have seen some older references to a shuffle service for k8s,
>>>>>>>> although it is not clear they are talking about a generic shuffle
>>>>>>>> service for k8s.
>>>>>>>>
>>>>>>>> Anyhow, with the advent of GenAI and the need to allow for a larger
>>>>>>>> volume of data, I was wondering if there has been any more work on
>>>>>>>> this matter. Specifically, larger and scalable file systems like
>>>>>>>> HDFS, GCS, S3 etc. offer significantly larger storage capacity than
>>>>>>>> local disks on individual worker nodes in a k8s cluster, thus
>>>>>>>> allowing handling of much larger datasets more efficiently.
>>>>>>>> Also, the degree of parallelism and fault tolerance with these file
>>>>>>>> systems comes into it. I will be interested in hearing more about
>>>>>>>> any progress on this.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>>>>> London
>>>>>>>> United Kingdom
>>>>>>>>
>>>>>>>> view my LinkedIn profile
>>>>>>>>
>>>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>
>>>>>>>> Disclaimer: The information provided is correct to the best of my
>>>>>>>> knowledge but of course cannot be guaranteed. It is essential to
>>>>>>>> note that, as with any advice, "one test result is worth
>>>>>>>> one-thousand expert opinions" (Wernher von Braun).
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>> --
>>>>>>> Bjørn Jørgensen
>>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>>> Norge
>>>>>>>
>>>>>>> +47 480 94 297