Re: Re: [spark-core] Can executors recover/reuse shuffle files upon failure?

2023-05-22 Thread Mich Talebzadeh
Just to correct the last sentence, if we end up starting a new instance of Spark, I don't think it will be able to read the shuffle data from storage from another instance, I stand corrected. Mich Talebzadeh, Lead Solutions Architect/Engineering Lead Palantir Technologies Limited London United

Re: Re: [spark-core] Can executors recover/reuse shuffle files upon failure?

2023-05-22 Thread Mich Talebzadeh
Hi Maksym. Let us understand the basics here first My thoughtsSpark replicates the partitions among multiple nodes. If one executor fails, it moves the processing over to the other executor. However, if the data is lost, it re-executes the processing that generated the data, and might have to go

RE: Re: [spark-core] Can executors recover/reuse shuffle files upon failure?

2023-05-22 Thread Maksym M
Hey vaquar, The link does't explain the crucial detail we're interested in - does executor re-use the data that exists on a node from previous executor and if not, how can we configure it to do so? We are not running on kubernetes, so EKS/Kubernetes-specific advice isn't very relevant. We are

Re: [spark-core] Can executors recover/reuse shuffle files upon failure?

2023-05-17 Thread vaquar khan
Following link you will get all required details https://aws.amazon.com/blogs/containers/best-practices-for-running-spark-on-amazon-eks/ Let me know if you required further informations. Regards, Vaquar khan On Mon, May 15, 2023, 10:14 PM Mich Talebzadeh wrote: > Couple of points > > Why

Re: [spark-core] Can executors recover/reuse shuffle files upon failure?

2023-05-15 Thread Mich Talebzadeh
Couple of points Why use spot or pre-empt intantes when your application as you stated shuffles heavily. Have you looked at why you are having these shuffles? What is the cause of these large transformations ending up in shuffle Also on your point: "..then ideally we should expect that when an

[spark-core] Can executors recover/reuse shuffle files upon failure?

2023-05-15 Thread Faiz Halde
Hello, We've been in touch with a few spark specialists who suggested us a potential solution to improve the reliability of our jobs that are shuffle heavy Here is what our setup looks like - Spark version: 3.3.1 - Java version: 1.8 - We do not use external shuffle service - We use