This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 6dacc12699 [HUDI-4583][DOCS] Optimal write configs for bulk insert (#6399)
6dacc12699 is described below

commit 6dacc126995f18406d76019eae047270de43a44d
Author: Sagar Sumit <sagarsumi...@gmail.com>
AuthorDate: Tue Aug 16 23:34:35 2022 +0530

    [HUDI-4583][DOCS] Optimal write configs for bulk insert (#6399)
---
 website/docs/performance.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/website/docs/performance.md b/website/docs/performance.md
index e64b0e551f..5bb7f935a1 100644
--- a/website/docs/performance.md
+++ b/website/docs/performance.md
@@ -30,6 +30,35 @@ the conventional alternatives for achieving these tasks.
 
 ### Write Path
 
+#### Bulk Insert
+
+Write configurations in Hudi are optimized for incremental upserts by default. In fact, the default write operation type is UPSERT as well.
+For simple append-only use cases that bulk load data, the following set of configurations is recommended for optimal writing:
+```
+-- Use “bulk-insert” write-operation instead of the default “upsert”
+hoodie.datasource.write.operation = BULK_INSERT
+-- Disable populating meta columns and metadata, and enable virtual keys
+hoodie.populate.meta.fields = false
+hoodie.metadata.enable = false
+-- Enable snappy compression codec for fewer CPU cycles (but more storage overhead)
+hoodie.parquet.compression.codec = snappy
+```
+
+For ingesting via spark-sql:
+```
+-- Use “bulk-insert” write-operation instead of the default “upsert”
+hoodie.sql.insert.mode = non-strict
+hoodie.sql.bulk.insert.enable = true
+-- Disable populating meta columns and metadata, and enable virtual keys
+hoodie.populate.meta.fields = false
+hoodie.metadata.enable = false
+-- Enable snappy compression codec for fewer CPU cycles (but more storage overhead)
+hoodie.parquet.compression.codec = snappy
+```
+
+We recently benchmarked Hudi against the TPC-DS workload.
+Please check out [our blog](/blog/2022/06/29/Apache-Hudi-vs-Delta-Lake-transparent-tpc-ds-lakehouse-performance-benchmarks) for more details.
+
 #### Upserts
 
 Following shows the speed up obtained for NoSQL database ingestion, from incrementally upserting on a Hudi table on the copy-on-write storage,
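
For illustration only (not part of the commit above): a minimal sketch of how the datasource-path configurations from the diff might be passed in a Spark write. It assumes a SparkSession with the Hudi bundle on the classpath and an existing DataFrame `inputDF`; the table name and base path are hypothetical.

```scala
// Minimal sketch, assuming an existing DataFrame `inputDF` and a Hudi-enabled
// Spark session; "my_table" and the base path below are hypothetical.
import org.apache.spark.sql.SaveMode

inputDF.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")                     // hypothetical table name
  .option("hoodie.datasource.write.operation", "bulk_insert")  // bulk insert instead of the default upsert
  .option("hoodie.populate.meta.fields", "false")              // skip meta columns, i.e. use virtual keys
  .option("hoodie.metadata.enable", "false")                   // disable the metadata table
  .option("hoodie.parquet.compression.codec", "snappy")        // fewer CPU cycles, more storage overhead
  .mode(SaveMode.Append)
  .save("/tmp/hudi/my_table")                                  // hypothetical base path
```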