This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 9f2ea8563f [MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)
9f2ea8563f is described below

commit 9f2ea8563fe71ddcdfd9b10e40841948b5f0d586
Author: liuzhuang2017 <95120044+liuzhuang2...@users.noreply.github.com>
AuthorDate: Wed May 25 08:30:01 2022 +0800

    [MINOR][DOCS] Update spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead in the tuning-guide. (#5670)
---
 website/docs/tuning-guide.md                          | 6 +++---
 website/versioned_docs/version-0.10.1/tuning-guide.md | 6 +++---
 website/versioned_docs/version-0.11.0/tuning-guide.md | 6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/website/docs/tuning-guide.md b/website/docs/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/docs/tuning-guide.md
+++ b/website/docs/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
 
 **Input Parallelism** : By default, Hudi tends to over-partition input (i.e `withParallelism(1500)`), to ensure each Spark partition stays within the 2GB limit for inputs upto 500GB. Bump this up accordingly if you have larger inputs. We recommend having shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast input_data_size/500MB
 
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead`, if you are running into such failures.
+**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are running into such failures.
 
 **Spark Memory** : Typically, hudi needs to be able to read a single file into memory to perform merges or compactions and thus the executor memory should be sufficient to accomodate this. In addition, Hoodie caches the input to be able to intelligently place data and thus leaving some `spark.memory.storageFraction` will generally help boost performance.
 
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
  
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
 ```
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.10.1/tuning-guide.md b/website/versioned_docs/version-0.10.1/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/versioned_docs/version-0.10.1/tuning-guide.md
+++ b/website/versioned_docs/version-0.10.1/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
 
 **Input Parallelism** : By default, Hudi tends to over-partition input (i.e `withParallelism(1500)`), to ensure each Spark partition stays within the 2GB limit for inputs upto 500GB. Bump this up accordingly if you have larger inputs. We recommend having shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast input_data_size/500MB
 
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead`, if you are running into such failures.
+**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are running into such failures.
 
 **Spark Memory** : Typically, hudi needs to be able to read a single file into memory to perform merges or compactions and thus the executor memory should be sufficient to accomodate this. In addition, Hoodie caches the input to be able to intelligently place data and thus leaving some `spark.memory.storageFraction` will generally help boost performance.
 
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
  
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
 ```
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.11.0/tuning-guide.md b/website/versioned_docs/version-0.11.0/tuning-guide.md
index 581778aa97..4affeafda6 100644
--- a/website/versioned_docs/version-0.11.0/tuning-guide.md
+++ b/website/versioned_docs/version-0.11.0/tuning-guide.md
@@ -13,7 +13,7 @@ Writing data via Hudi happens as a Spark job and thus general rules of spark deb
 
 **Input Parallelism** : By default, Hudi tends to over-partition input (i.e `withParallelism(1500)`), to ensure each Spark partition stays within the 2GB limit for inputs upto 500GB. Bump this up accordingly if you have larger inputs. We recommend having shuffle parallelism `hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast input_data_size/500MB
 
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.yarn.executor.memoryOverhead` or `spark.yarn.driver.memoryOverhead`, if you are running into such failures.
+**Off-heap memory** : Hudi writes parquet files and that needs good amount of off-heap memory proportional to schema width. Consider setting something like `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are running into such failures.
 
 **Spark Memory** : Typically, hudi needs to be able to read a single file into memory to perform merges or compactions and thus the executor memory should be sufficient to accomodate this. In addition, Hoodie caches the input to be able to intelligently place data and thus leaving some `spark.memory.storageFraction` will generally help boost performance.
 
@@ -51,7 +51,7 @@ spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
  
-spark.yarn.driver.memoryOverhead 1024
-spark.yarn.executor.memoryOverhead 3072
+spark.driver.memoryOverhead 1024
+spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
 ```
\ No newline at end of file
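
For context, the change swaps the old YARN-prefixed property names for `spark.driver.memoryOverhead` and `spark.executor.memoryOverhead`, the cluster-manager-agnostic names Spark has used since the `spark.yarn.*.memoryOverhead` properties were deprecated (Spark 2.3+). A minimal sketch of passing the renamed settings at submit time; the values simply mirror the example config block in the guide, and the jar and class names below are placeholders, not part of the Hudi docs:

```
# Sketch only: jar/class names are placeholders; values match the guide's example.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.memoryOverhead=1024 \
  --conf spark.executor.memoryOverhead=3072 \
  --class org.example.HudiIngestJob \
  hudi-ingest-job.jar
```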
