This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new bdad1bf  [HUDI-766]: added section for HoodieMultiTableDeltaStreamer (#1822)
bdad1bf is described below

commit bdad1bf38190d8f21efde30e549c173b5b9bf115
Author: Pratyaksh Sharma <pratyaks...@gmail.com>
AuthorDate: Thu Aug 13 11:59:38 2020 +0530

    [HUDI-766]: added section for HoodieMultiTableDeltaStreamer (#1822)
    
    * [HUDI-766]: added section for HoodieMultiTableDeltaStreamer
    
    * [HUDI-766]: small changes
    
    * [HUDI-766]: addressed code review comments
---
 docs/_docs/2_2_writing_data.md | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/docs/_docs/2_2_writing_data.md b/docs/_docs/2_2_writing_data.md
index 6962563..43fc046 100644
--- a/docs/_docs/2_2_writing_data.md
+++ b/docs/_docs/2_2_writing_data.md
@@ -174,6 +174,42 @@ and then ingest it as follows.
 
 In some cases, you may want to migrate your existing table into Hudi beforehand. Please refer to [migration guide](/docs/migration_guide.html). 
 
+## MultiTableDeltaStreamer
+
+`HoodieMultiTableDeltaStreamer`, a wrapper on top of `HoodieDeltaStreamer`, enables you to ingest multiple tables into Hudi datasets in a single run. Currently it only supports sequential processing of the tables to be ingested and the COPY_ON_WRITE storage type. The command line options for `HoodieMultiTableDeltaStreamer` are largely the same as for `HoodieDeltaStreamer`, with the one exception that you are required to provide table-wise configs in separate files in a dedicated config folder. The [...]
+
+```java
+  * --config-folder
+    the path to the folder which contains all the table-wise config files
+    --base-path-prefix
+    this is added to enable users to create all the hudi datasets for related tables under one path in FS. The datasets are then created under the path - <base_path_prefix>/<database>/<table_to_be_ingested>. However, you can override the path for any table by setting the property hoodie.deltastreamer.ingestion.targetBasePath
+```
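+
+For reference, here is one way the dedicated config folder might be laid out (a sketch only; the folder path and file names below are illustrative and simply need to match what you reference via the configFile properties described next):
+
+```java
+/tmp/hudi-ingestion-config/
+├── kafka-source.properties           <-- common properties passed via --props
+├── db1_table1_config.properties      <-- override properties for table db1.table1
+└── db1_table2_config.properties      <-- override properties for table db1.table2
+```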
+
+The following properties need to be set properly to ingest data using `HoodieMultiTableDeltaStreamer`. 
+
+```java
+hoodie.deltastreamer.ingestion.tablesToBeIngested
+  comma-separated names of the tables to be ingested in the format <database>.<table>, for example db1.table1,db1.table2
+hoodie.deltastreamer.ingestion.targetBasePath
+  if you wish to ingest a particular table in a separate path, you can mention that path here
+hoodie.deltastreamer.ingestion.<database>.<table>.configFile
+  path to the config file in the dedicated config folder which contains the properties overridden for the particular table to be ingested
+```
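+
+As a concrete illustration, the common properties file passed via `--props` might wire these together as follows (a sketch only; the database, table and path names are invented for this example):
+
+```java
+hoodie.deltastreamer.ingestion.tablesToBeIngested=db1.table1,db1.table2
+hoodie.deltastreamer.ingestion.db1.table1.configFile=file:///tmp/hudi-ingestion-config/db1_table1_config.properties
+hoodie.deltastreamer.ingestion.db1.table2.configFile=file:///tmp/hudi-ingestion-config/db1_table2_config.properties
+```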
+
+Sample config files for table-wise override properties can be found under `hudi-utilities/src/test/resources/delta-streamer-config`. The command to run `HoodieMultiTableDeltaStreamer` is also similar to how you run `HoodieDeltaStreamer`.
+
+```java
+[hoodie]$ spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer `ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle-*.jar` \
+  --props file://${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties \
+  --config-folder file:///tmp/hudi-ingestion-config \
+  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
+  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
+  --source-ordering-field impressiontime \
+  --base-path-prefix file:///tmp/hudi-deltastreamer-op \
+  --target-table uber.impressions \
+  --op BULK_INSERT
+```
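+
+A per-table config file (for example the illustrative `db1_table1_config.properties` above) carries the properties overridden for that table. A minimal sketch, assuming a Kafka source and a schema registry; the exact keys you need depend on your source, key generator and schema provider, and all values below are made up:
+
+```java
+hoodie.datasource.write.recordkey.field=id
+hoodie.datasource.write.partitionpath.field=created_at
+hoodie.deltastreamer.source.kafka.topic=db1.table1.events
+# optional: ingest this table into its own path instead of <base_path_prefix>/db1/table1
+hoodie.deltastreamer.ingestion.targetBasePath=file:///tmp/custom-base-path/db1/table1
+```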
+
 ## Datasource Writer
 
 The `hudi-spark` module offers the DataSource API to write (and read) a Spark DataFrame into a Hudi table. There are a number of options available:
