Repository: spark
Updated Branches:
  refs/heads/master 3efdf3532 -> 15fff7903


[SPARK-24297][CORE] Fetch-to-disk by default for > 2gb

Fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB,
so we might as well use fetch-to-disk in that case. The message includes
some metadata in addition to the block data itself (in particular
UploadBlock has a lot of metadata), so we leave a little room.

Author: Imran Rashid <iras...@cloudera.com>

Closes #21474 from squito/SPARK-24297.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/15fff790
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/15fff790
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/15fff790

Branch: refs/heads/master
Commit: 15fff79032f6d708d8570b5e83144f1f84519552
Parents: 3efdf35
Author: Imran Rashid <iras...@cloudera.com>
Authored: Wed Jul 25 09:08:42 2018 +0800
Committer: jerryshao <ss...@hortonworks.com>
Committed: Wed Jul 25 09:08:42 2018 +0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/internal/config/package.scala |  6 +++++-
 docs/configuration.md                                    | 10 ++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/15fff790/core/src/main/scala/org/apache/spark/internal/config/package.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index ba892bf..8fef2aa 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -432,7 +432,11 @@ package object config {
         "external shuffle service, this feature can only be worked when external shuffle" +
         "service is newer than Spark 2.2.")
       .bytesConf(ByteUnit.BYTE)
-      .createWithDefault(Long.MaxValue)
+      // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
+      // as well use fetch-to-disk in that case.  The message includes some metadata in addition
+      // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
+      // extra room.
+      .createWithDefault(Int.MaxValue - 512)
 
   private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
     ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")

http://git-wip-us.apache.org/repos/asf/spark/blob/15fff790/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 0c7c447..60c0358 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -580,13 +580,15 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
-  <td>Long.MaxValue</td>
+  <td>Int.MaxValue - 512</td>
   <td>
     The remote block will be fetched to disk when size of the block is above this threshold in bytes.
-    This is to avoid a giant request takes too much memory. We can enable this config by setting
-    a specific value(e.g. 200m). Note this configuration will affect both shuffle fetch
+    This is to avoid a giant request that takes too much memory. By default, this is only enabled
+    for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
+    available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
+    memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
     and block manager remote block fetch. For users who enabled external shuffle service,
-    this feature can only be worked when external shuffle service is newer than Spark 2.2.
+    this feature can only be used when external shuffle service is newer than Spark 2.2.
   </td>
 </tr>
 <tr>
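
The effect of the new default can be illustrated without any Spark dependency. What follows is a minimal sketch, not Spark's implementation: `FetchThresholdSketch`, `defaultThreshold`, and `shouldFetchToDisk` are hypothetical names, and the strict `>` comparison is an assumption based on the documentation wording "above this threshold".

```scala
// Illustrative sketch only; the names below are hypothetical, not Spark APIs.
object FetchThresholdSketch {
  // New default from this commit: Int.MaxValue - 512 bytes, widened to Long.
  val defaultThreshold: Long = Int.MaxValue - 512L

  // Assumption: "above this threshold" is treated as a strict comparison.
  def shouldFetchToDisk(blockSizeBytes: Long, threshold: Long = defaultThreshold): Boolean =
    blockSizeBytes > threshold

  def main(args: Array[String]): Unit = {
    val twoHundredMb = 200L * 1024 * 1024
    val threeGb = 3L * 1024 * 1024 * 1024

    println(defaultThreshold)                                  // 2147483135
    println(shouldFetchToDisk(threeGb))                        // true: cannot fit in one message
    println(shouldFetchToDisk(twoHundredMb + 1))               // false under the default
    println(shouldFetchToDisk(twoHundredMb + 1, twoHundredMb)) // true once the threshold is lowered
  }
}
```

In a real job the threshold is lowered through the property itself, for example by passing `--conf spark.maxRemoteBlockSizeFetchToMem=200m` to `spark-submit`.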