Mridul Muralidharan created SPARK-6166:
------------------------------------------
Summary: Add config to limit number of concurrent outbound connections for shuffle fetch
Key: SPARK-6166
URL: https://issues.apache.org/jira/browse/SPARK-6166
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.4.0
Reporter: Mridul Muralidharan

spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of size.
But this is not always sufficient: when the number of hosts in the cluster increases, this can lead to a very large number of inbound connections to one or more nodes, causing workers to fail under the load.
I propose we also add spark.reducer.maxReqsInFlight, which puts a bound on the number of outstanding outbound connections.
This might still cause hotspots in the cluster, but in our tests this has significantly reduced the occurrence of worker failures.
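
To make the proposal concrete, here is a minimal sketch of a fetch throttle that applies both bounds, a size limit in the spirit of spark.reducer.maxMbInFlight and a request-count limit in the spirit of the proposed spark.reducer.maxReqsInFlight. It is not Spark's implementation: the names FetchRequest and FetchThrottle, their fields, and the rule that one request is always allowed when nothing is in flight are assumptions made for illustration, and the sketch treats one outstanding request as one outbound connection.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FetchRequest:
    """Hypothetical shuffle fetch request (illustrative, not a Spark class)."""
    host: str        # remote node serving the shuffle blocks
    size_bytes: int  # total size of the blocks requested


class FetchThrottle:
    """Gate fetches on both bytes in flight and requests in flight."""

    def __init__(self, max_bytes_in_flight: int, max_reqs_in_flight: int):
        if max_reqs_in_flight < 1:
            raise ValueError("max_reqs_in_flight must be at least 1")
        self.max_bytes_in_flight = max_bytes_in_flight
        self.max_reqs_in_flight = max_reqs_in_flight
        self.bytes_in_flight = 0
        self.reqs_in_flight = 0
        self.pending = deque()  # requests waiting to be sent, in FIFO order

    def submit(self, req: FetchRequest) -> None:
        """Queue a request; it is sent later by send_ready()."""
        self.pending.append(req)

    def _can_send(self, req: FetchRequest) -> bool:
        # Assumption: always allow one request when nothing is in flight,
        # so a request larger than the byte limit cannot stall forever.
        if self.reqs_in_flight == 0:
            return True
        within_reqs = self.reqs_in_flight + 1 <= self.max_reqs_in_flight
        within_bytes = (self.bytes_in_flight + req.size_bytes
                        <= self.max_bytes_in_flight)
        return within_reqs and within_bytes

    def send_ready(self) -> list:
        """Dequeue and return the requests that fit under both limits."""
        sent = []
        while self.pending and self._can_send(self.pending[0]):
            req = self.pending.popleft()
            self.bytes_in_flight += req.size_bytes
            self.reqs_in_flight += 1
            sent.append(req)
        return sent

    def on_complete(self, req: FetchRequest) -> None:
        """Release the budget held by a finished request."""
        self.bytes_in_flight -= req.size_bytes
        self.reqs_in_flight -= 1
```

With a size limit alone, many small requests to many different hosts all fit under the byte budget and go out at once, which is the situation described above. In this sketch the request-count limit caps how many are outstanding regardless of their size; call send_ready() again after each on_complete() to release the next queued request.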