Hi everyone, I explored IBM's and AWS's S3 shuffle plugins some time back, and I have also used AWS FSx for Lustre in a few of my production jobs, which involve ~20 TB of shuffle with 200-300 executors. What I observed is that both S3 and FSx behaved fine during the write phase; however, I hit IOPS throttling during the read phase (reads taking forever to complete). I think this may be caused by the heavy use of shuffle index files (I have not done extensive research on this), so I believe the shuffle manager logic would have to be intelligent enough to reduce the number of fetches from the object store. In the end, for my use case, I switched to PVCs and PVC-aware scheduling along with executor decommissioning. So far, performance has been good with this choice.
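For what it's worth, the PVC setup described above can be sketched roughly as below. This is a minimal sketch, not my exact configuration: the volume name `spark-local-dir-1` (Spark uses `spark-local-dir-*` volumes as local/shuffle dirs), the `gp3` storage class, the size and the mount path are all illustrative, and the decommissioning keys assume Spark 3.2+:

```python
# Sketch: per-executor on-demand PVCs used as Spark local dirs, with
# PVC reuse and graceful decommissioning so shuffle data survives
# executor loss. Storage class, size and mount path are placeholders.

pvc = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"

spark_conf_pvc = {
    # One dynamically provisioned PVC per executor, mounted as a local dir.
    f"{pvc}.options.claimName": "OnDemand",
    f"{pvc}.options.storageClass": "gp3",     # illustrative storage class
    f"{pvc}.options.sizeLimit": "500Gi",      # illustrative size
    f"{pvc}.mount.path": "/data",
    f"{pvc}.mount.readOnly": "false",

    # PVC-aware scheduling: the driver owns the PVCs, and replacement
    # executors reuse the PVCs (and shuffle files) left behind.
    "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
    "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",

    # Graceful decommissioning: migrate shuffle/RDD blocks off executors
    # that are about to be removed.
    "spark.decommission.enabled": "true",
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
}

def to_submit_args(conf):
    """Render a config dict as spark-submit --conf arguments."""
    return [arg for key, value in conf.items()
            for arg in ("--conf", f"{key}={value}")]
```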
Thank you

On Tue, 9 Apr 2024, 15:17 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> First, thanks everyone for their contributions.
>
> I was going to reply to @Enrico Minack <i...@enrico.minack.dev> but noticed
> additional info. As I understand it, for example, Apache Uniffle is an
> incubating project aimed at providing a pluggable shuffle service for
> Spark. So basically, what all these "external shuffle services" have in
> common is offloading shuffle data management to external services, thus
> reducing the memory and CPU overhead on Spark executors. That is great.
> While Uniffle and others enhance shuffle performance and scalability, it
> would be great to integrate them with the Spark UI. This may require
> additional development effort. I suppose the interest would be to have
> these external metrics incorporated into Spark with one look and feel.
> This may require customizing the UI to fetch and display metrics or
> statistics from the external shuffle services. Has any project done this?
>
> Thanks
>
> Mich Talebzadeh,
> Technologist | Solutions Architect | Data Engineer | Generative AI
> London
> United Kingdom
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
> On Mon, 8 Apr 2024 at 14:19, Vakaris Baškirov <vakaris.bashki...@gmail.com>
> wrote:
>
>> I see that both Uniffle and Celeborn support S3/HDFS backends, which is
>> great.
>> In case someone is using S3/HDFS, I wonder what would be the advantages
>> of using Celeborn or Uniffle vs the IBM shuffle service plugin
>> <https://github.com/IBM/spark-s3-shuffle> or the Cloud Shuffle Storage
>> Plugin from AWS
>> <https://docs.aws.amazon.com/glue/latest/dg/cloud-shuffle-storage-plugin.html>?
>>
>> These plugins do not require deploying a separate service. Are there any
>> advantages to using Uniffle/Celeborn with an S3 backend, which would
>> require deploying a separate service?
>>
>> Thanks
>> Vakaris
>>
>> On Mon, Apr 8, 2024 at 10:03 AM roryqi <ror...@apache.org> wrote:
>>
>>> Apache Uniffle (incubating) may be another solution.
>>> You can see
>>> https://github.com/apache/incubator-uniffle
>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>
>>> Mich Talebzadeh <mich.talebza...@gmail.com> wrote on Mon, 8 Apr 2024 at 07:15:
>>>
>>>> Splendid
>>>>
>>>> The configurations below can be used with k8s deployments of Spark.
>>>> Spark applications running on k8s can use these configurations to
>>>> seamlessly access data stored in Google Cloud Storage (GCS) and Amazon S3.
>>>>
>>>> For Google GCS we may have:
>>>>
>>>> spark_config_gcs = {
>>>>     "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
>>>>     "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
>>>>     "spark.hadoop.google.cloud.auth.service.account.enable": "true",
>>>>     "spark.hadoop.google.cloud.auth.service.account.json.keyfile": "/path/to/keyfile.json",
>>>> }
>>>>
>>>> For Amazon S3, similarly:
>>>>
>>>> spark_config_s3 = {
>>>>     "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
>>>>     "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>     "spark.hadoop.fs.s3a.access.key": "s3_access_key",
>>>>     "spark.hadoop.fs.s3a.secret.key": "secret_key",
>>>> }
>>>>
>>>> To implement these configurations and enable Spark applications to
>>>> interact with GCS and S3, I guess we can approach it this way:
>>>>
>>>> 1) Spark repository integration: these configurations need to be added
>>>> to the Spark repository as part of the supported configuration options
>>>> for k8s deployments.
>>>>
>>>> 2) Configuration settings: users need to specify these configurations
>>>> when submitting Spark applications to a Kubernetes cluster. They can
>>>> include them in the Spark application code or pass them as command-line
>>>> arguments or environment variables during submission.
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>> London
>>>> United Kingdom
>>>>
>>>> view my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>> *Disclaimer:* The information provided is correct to the best of my
>>>> knowledge but of course cannot be guaranteed.
>>>> It is essential to note that, as with any advice, "one test result is
>>>> worth one-thousand expert opinions" (Wernher von Braun
>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>
>>>> On Sun, 7 Apr 2024 at 13:31, Vakaris Baškirov
>>>> <vakaris.bashki...@gmail.com> wrote:
>>>>
>>>>> There is an IBM shuffle service plugin that supports S3:
>>>>> https://github.com/IBM/spark-s3-shuffle
>>>>>
>>>>> Though I would think a feature like this could be part of the main
>>>>> Spark repo. Trino already has out-of-the-box support for S3 exchange
>>>>> (shuffle) and it's very useful.
>>>>>
>>>>> Vakaris
>>>>>
>>>>> On Sun, Apr 7, 2024 at 12:27 PM Mich Talebzadeh
>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for your suggestion, which I take as a workaround. Whilst this
>>>>>> workaround can potentially address storage allocation issues, I was
>>>>>> more interested in exploring solutions that offer a more seamless
>>>>>> integration with large distributed file systems like HDFS, GCS, or S3.
>>>>>> This would ensure better performance and scalability for handling
>>>>>> larger datasets efficiently.
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>>> London
>>>>>> United Kingdom
>>>>>>
>>>>>> view my LinkedIn profile
>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>
>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>
>>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>>>> that, as with any advice, "one test result is worth one-thousand
>>>>>> expert opinions" (Wernher von Braun
>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
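As an aside on the GCS/S3 config dictionaries quoted earlier in the thread: they can be applied when building the session without any changes to the Spark repository. A minimal sketch, assuming PySpark's standard `SparkSession.builder.config(key, value)` API; the keys mirror the quoted `spark_config_s3` dict, and the credentials are placeholders:

```python
# Sketch: apply a dict of Spark settings to a builder-style object.
# The keys mirror the spark_config_s3 dict quoted in the thread;
# access/secret keys are placeholders, not working credentials.

spark_config_s3 = {
    "spark.kubernetes.authenticate.driver.serviceAccountName": "service_account_name",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.access.key": "s3_access_key",  # placeholder
    "spark.hadoop.fs.s3a.secret.key": "secret_key",     # placeholder
}

def apply_conf(builder, conf):
    """Chain builder.config(key, value) calls for every entry in conf."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder

# With PySpark installed, usage would look like:
#   from pyspark.sql import SparkSession
#   spark = apply_conf(SparkSession.builder.appName("s3-demo"),
#                      spark_config_s3).getOrCreate()
```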
>>>>>>
>>>>>> On Sat, 6 Apr 2024 at 21:28, Bjørn Jørgensen
>>>>>> <bjornjorgen...@gmail.com> wrote:
>>>>>>
>>>>>>> You can make a PVC on K8s; call it 300gb.
>>>>>>>
>>>>>>> Make a folder in your Dockerfile:
>>>>>>> WORKDIR /opt/spark/work-dir
>>>>>>> RUN chmod g+w /opt/spark/work-dir
>>>>>>>
>>>>>>> Start Spark adding this:
>>>>>>>
>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
>>>>>>> .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
>>>>>>> .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
>>>>>>> .config("spark.local.dir", "/opt/spark/work-dir")
>>>>>>>
>>>>>>> On Sat, 6 Apr 2024 at 15:45, Mich Talebzadeh
>>>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have seen some older references to a shuffle service for k8s,
>>>>>>>> although it is not clear they are talking about a generic shuffle
>>>>>>>> service for k8s.
>>>>>>>>
>>>>>>>> Anyhow, with the advent of GenAI and the need to allow for a larger
>>>>>>>> volume of data, I was wondering if there has been any more work on
>>>>>>>> this matter. Specifically, larger and scalable file systems like
>>>>>>>> HDFS, GCS, S3 etc. offer significantly larger storage capacity than
>>>>>>>> local disks on individual worker nodes in a k8s cluster, thus
>>>>>>>> allowing handling of much larger datasets more efficiently.
>>>>>>>> Also, the degree of parallelism and fault tolerance with these file
>>>>>>>> systems comes into it. I will be interested in hearing more about
>>>>>>>> any progress on this.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Technologist | Solutions Architect | Data Engineer | Generative AI
>>>>>>>> London
>>>>>>>> United Kingdom
>>>>>>>>
>>>>>>>> view my LinkedIn profile
>>>>>>>>
>>>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>>
>>>>>>>> Disclaimer: The information provided is correct to the best of my
>>>>>>>> knowledge but of course cannot be guaranteed. It is essential to
>>>>>>>> note that, as with any advice, "one test result is worth
>>>>>>>> one-thousand expert opinions" (Wernher von Braun).
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>> --
>>>>>>> Bjørn Jørgensen
>>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>>> Norge
>>>>>>>
>>>>>>> +47 480 94 297