Hey Vaquar,

The link doesn't explain the crucial detail we're interested in: does a new executor reuse the data left on a node by a previous executor, and if not, how can we configure it to do so?
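To make the question concrete, here is a minimal Python sketch that lists shuffle files left under spark.local.dir by a previous executor. It is not part of Spark, and the function name is hypothetical. It assumes the usual shuffle_<shuffleId>_<mapId>_<reduceId>.data/.index file naming. It only inspects what is on disk and does not make a new executor register those files.

```python
import re
from pathlib import Path

# Assumed naming of Spark shuffle outputs:
#   shuffle_<shuffleId>_<mapId>_<reduceId>.data / .index
SHUFFLE_FILE = re.compile(r"^shuffle_(\d+)_(\d+)_(\d+)\.(data|index)$")


def find_shuffle_files(local_dir):
    """Return (path, shuffle_id, map_id, size_bytes) for each shuffle
    data/index file found anywhere under local_dir."""
    found = []
    for path in sorted(Path(local_dir).rglob("shuffle_*")):
        match = SHUFFLE_FILE.match(path.name)
        if match and path.is_file():
            found.append(
                (
                    str(path),
                    int(match.group(1)),  # shuffle id
                    int(match.group(2)),  # map id
                    path.stat().st_size,  # size in bytes
                )
            )
    return found
```

Running find_shuffle_files on the directory that spark.local.dir points to, before and after an executor is replaced on the same host, would show whether the old files are still there. Whether the new executor actually serves them is exactly the open question.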
We are not running on Kubernetes, so EKS/Kubernetes-specific advice isn't very relevant. We are running Spark in standalone mode.

Best regards,
maksym

On 2023/05/17 12:28:35 vaquar khan wrote:
> In the following link you will find all the required details:
>
> https://aws.amazon.com/blogs/containers/best-practices-for-running-spark-on-amazon-eks/
>
> Let me know if you require further information.
>
> Regards,
> Vaquar khan
>
> On Mon, May 15, 2023, 10:14 PM Mich Talebzadeh <mi...@gmail.com> wrote:
>
> > Couple of points
> >
> > Why use spot or preemptible instances when your application, as you
> > stated, shuffles heavily?
> > Have you looked at why you are having these shuffles? What is the cause of
> > these large transformations ending up in shuffle?
> >
> > Also on your point:
> > "..then ideally we should expect that when an executor is killed/OOM'd
> > and a new executor is spawned on the same host, the new executor registers
> > the shuffle files to itself. Is that so?"
> >
> > What guarantee is there that the new executor with inherited shuffle
> > files will succeed?
> >
> > Also, OOM is often associated with some form of skewed data.
> >
> > HTH
> >
> > Mich Talebzadeh,
> > Lead Solutions Architect/Engineering Lead
> > Palantir Technologies Limited
> > London
> > United Kingdom
> >
> > view my Linkedin profile
> > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> >
> > https://en.everybodywiki.com/Mich_Talebzadeh
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising from
> > such loss, damage or destruction.
> >
> > On Mon, 15 May 2023 at 13:11, Faiz Halde <fa...@nubank.com.br.invalid>
> > wrote:
> >
> >> Hello,
> >>
> >> We've been in touch with a few spark specialists who suggested us a
> >> potential solution to improve the reliability of our jobs that are shuffle
> >> heavy
> >>
> >> Here is what our setup looks like
> >>
> >> - Spark version: 3.3.1
> >> - Java version: 1.8
> >> - We do not use external shuffle service
> >> - We use spot instances
> >>
> >> We run spark jobs on clusters that use Amazon EBS volumes. The
> >> spark.local.dir is mounted on this EBS volume. One of the offerings from
> >> the service we use is EBS migration which basically means if a host is
> >> about to get evicted, a new host is created and the EBS volume is attached
> >> to it
> >>
> >> When Spark assigns a new executor to the newly created instance, it
> >> basically can recover all the shuffle files that are already persisted in
> >> the migrated EBS volume
> >>
> >> Is this how it works? Do executors recover / re-register the shuffle
> >> files that they found?
> >>
> >> So far I have not come across any recovery mechanism. I can only see
> >>
> >> KubernetesLocalDiskShuffleDataIO
> >>
> >> that has a pre-init step where it tries to register the available
> >> shuffle files to itself
> >>
> >> A natural follow-up on this,
> >>
> >> If what they claim is true, then ideally we should expect that when an
> >> executor is killed/OOM'd and a new executor is spawned on the same host,
> >> the new executor registers the shuffle files to itself. Is that so?
> >>
> >> Thanks
> >>
> >> ------------------------------
> >> Confidentiality note: This e-mail may contain confidential information
> >> from Nu Holdings Ltd and/or its affiliates.
> >> If you have received it by
> >> mistake, please let us know by e-mail reply and delete it from your system;
> >> you may not copy this message or disclose its contents to anyone; for
> >> details about what personal information we collect and why, please refer to
> >> our privacy policy
> >> <https://api.mziq.com/mzfilemanager/v2/d/59a081d2-0d63-4bb5-b786-4c07ae26bc74/6f4939b9-5f74-a528-1835-596b481dca54>
> >> .