This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new 0d997e5 [SPARK-31219][YARN] Enable closeIdleConnections in YarnShuffleService 0d997e5 is described below commit 0d997e5156a751c99cd6f8be1528ed088a585d1f Author: manuzhang <owenzhang1...@gmail.com> AuthorDate: Mon Mar 30 12:44:46 2020 -0500 [SPARK-31219][YARN] Enable closeIdleConnections in YarnShuffleService ### What changes were proposed in this pull request? Close idle connections at shuffle server side when an `IdleStateEvent` is triggered after `spark.shuffle.io.connectionTimeout` or `spark.network.timeout` time. It's based on following investigations. 1. We found connections on our clusters building up continuously (> 10k for some nodes). Is that normal ? We don't think so. 2. We looked into the connections on one node and found there were a lot of half-open connections. (connections only existed on one node) 3. We also checked those connections were very old (> 21 hours). (FYI, https://superuser.com/questions/565991/how-to-determine-the-socket-connection-up-time-on-linux) 4. Looking at the code, TransportContext registers an IdleStateHandler which should fire an IdleStateEvent when timeout. We did a heap dump of the YarnShuffleService and checked the attributes of IdleStateHandler. It turned out firstAllIdleEvent of many IdleStateHandlers were already false so IdleStateEvent were already fired. 5. Finally, we realized the IdleStateEvent would not be handled since closeIdleConnections are hardcoded to false for YarnShuffleService. ### Why are the changes needed? Idle connections to YarnShuffleService could never be closed, and will be accumulating and taking up memory and file descriptors. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #27998 from manuzhang/spark-31219. Authored-by: manuzhang <owenzhang1...@gmail.com> Signed-off-by: Thomas Graves <tgra...@apache.org> --- .../src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index 815a56d..c41efba 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -188,7 +188,7 @@ public class YarnShuffleService extends AuxiliaryService { int port = conf.getInt( SPARK_SHUFFLE_SERVICE_PORT_KEY, DEFAULT_SPARK_SHUFFLE_SERVICE_PORT); - transportContext = new TransportContext(transportConf, blockHandler); + transportContext = new TransportContext(transportConf, blockHandler, true); shuffleServer = transportContext.createServer(port, bootstraps); // the port should normally be fixed, but for tests its useful to find an open port port = shuffleServer.getPort(); --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org