Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-19 Thread Nebi Aydin
Here's the executor logs ``` java.io.IOException: Connection from ip-172-31-16-143.ec2.internal/172.31.16.143:7337 closed at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146) at org.apache.spark.network.server.TransportCha

Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-19 Thread Mich Talebzadeh
That error message *FetchFailedException: Failed to connect to on port 7337 *happens when a task running on one executor node tries to fetch data from another executor node but fails to establish a connection to the specified port (7337 in this case). In a nutshell it is performing network IO amon

Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
Hi, sorry for duplicates. First time user :) I keep getting fetchfailedexception 7337 port closed. Which is external shuffle service port. I was trying to tune these parameters. I have around 1000 executors and 5000 cores. I tried to set spark.shuffle.io.serverThreads to 2k. Should I also set spark

Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Mich Talebzadeh
Hi, These two threads that you sent seem to be duplicates of each other? Anyhow I trust that you are familiar with the concept of shuffle in Spark. Spark Shuffle is an expensive operation since it involves the following - Disk I/O - Involves data serialization and deserialization

[Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
I want to learn differences among below thread configurations. spark.shuffle.io.serverThreads spark.shuffle.io.clientThreads spark.shuffle.io.threads spark.rpc.io.serverThreads spark.rpc.io.clientThreads spark.rpc.io.threads Thanks.

[Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
I want to learn differences among below thread configurations. spark.shuffle.io.serverThreads spark.shuffle.io.clientThreads spark.shuffle.io.threads spark.rpc.io.serverThreads spark.rpc.io.clientThreads spark.rpc.io.threads Thanks.