This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new e326b3fbe1 Update the docs for Distributed File System (DFS) section on Hudi Delta Streamer page (#5647)
e326b3fbe1 is described below

commit e326b3fbe10c7512544f2a90835cfd8a2e3b8dcd
Author: Léo Biscassi <leo.bisca...@gmail.com>
AuthorDate: Thu Jun 9 20:20:05 2022 -0300

    Update the docs for Distributed File System (DFS) section on Hudi Delta Streamer page (#5647)
    
    
    Co-authored-by: Raymond Xu <2701446+xushi...@users.noreply.github.com>
---
 website/docs/hoodie_deltastreamer.md                          | 5 +++++
 website/versioned_docs/version-0.10.0/hoodie_deltastreamer.md | 5 +++++
 website/versioned_docs/version-0.10.1/hoodie_deltastreamer.md | 5 +++++
 website/versioned_docs/version-0.11.0/hoodie_deltastreamer.md | 5 +++++
 4 files changed, 20 insertions(+)

diff --git a/website/docs/hoodie_deltastreamer.md b/website/docs/hoodie_deltastreamer.md
index c98cf67ddd..531c412860 100644
--- a/website/docs/hoodie_deltastreamer.md
+++ b/website/docs/hoodie_deltastreamer.md
@@ -310,6 +310,11 @@ other formats and then write data as Hudi format.)
 - ORC
 - HUDI
 
+For DFS sources, the following behaviors are expected:
+
+- For the JSON DFS source, you always need to set a schema. If the target Hudi table uses the same schema as the source files, you only need to set the source schema; otherwise, you need to set schemas for both source and target.
+- `HoodieDeltaStreamer` reads the files under the source base path (`hoodie.deltastreamer.source.dfs.root`) directly and does not use the partition paths under this base path as fields of the dataset. Detailed examples can be found [here](https://github.com/apache/hudi/issues/5485).
+
 ### Kafka
 Hudi can read directly from Kafka clusters. See more details on HoodieDeltaStreamer to learn how to set up streaming
 ingestion with exactly once semantics, checkpointing, and plugin transformations. The following formats are supported
diff --git a/website/versioned_docs/version-0.10.0/hoodie_deltastreamer.md b/website/versioned_docs/version-0.10.0/hoodie_deltastreamer.md
index 4dd27dfdde..c1c95f635b 100644
--- a/website/versioned_docs/version-0.10.0/hoodie_deltastreamer.md
+++ b/website/versioned_docs/version-0.10.0/hoodie_deltastreamer.md
@@ -288,6 +288,11 @@ other formats and then write data as Hudi format.)
 - ORC
 - HUDI
 
+For DFS sources, the following behaviors are expected:
+
+- For the JSON DFS source, you always need to set a schema. If the target Hudi table uses the same schema as the source files, you only need to set the source schema; otherwise, you need to set schemas for both source and target.
+- `HoodieDeltaStreamer` reads the files under the source base path (`hoodie.deltastreamer.source.dfs.root`) directly and does not use the partition paths under this base path as fields of the dataset. Detailed examples can be found [here](https://github.com/apache/hudi/issues/5485).
+
 ### Kafka
 Hudi can read directly from Kafka clusters. See more details on HoodieDeltaStreamer to learn how to set up streaming
 ingestion with exactly once semantics, checkpointing, and plugin transformations. The following formats are supported
diff --git a/website/versioned_docs/version-0.10.1/hoodie_deltastreamer.md b/website/versioned_docs/version-0.10.1/hoodie_deltastreamer.md
index 4dd27dfdde..c1c95f635b 100644
--- a/website/versioned_docs/version-0.10.1/hoodie_deltastreamer.md
+++ b/website/versioned_docs/version-0.10.1/hoodie_deltastreamer.md
@@ -288,6 +288,11 @@ other formats and then write data as Hudi format.)
 - ORC
 - HUDI
 
+For DFS sources, the following behaviors are expected:
+
+- For the JSON DFS source, you always need to set a schema. If the target Hudi table uses the same schema as the source files, you only need to set the source schema; otherwise, you need to set schemas for both source and target.
+- `HoodieDeltaStreamer` reads the files under the source base path (`hoodie.deltastreamer.source.dfs.root`) directly and does not use the partition paths under this base path as fields of the dataset. Detailed examples can be found [here](https://github.com/apache/hudi/issues/5485).
+
 ### Kafka
 Hudi can read directly from Kafka clusters. See more details on HoodieDeltaStreamer to learn how to set up streaming
 ingestion with exactly once semantics, checkpointing, and plugin transformations. The following formats are supported
diff --git a/website/versioned_docs/version-0.11.0/hoodie_deltastreamer.md b/website/versioned_docs/version-0.11.0/hoodie_deltastreamer.md
index caf8ce2b54..3e19b62e96 100644
--- a/website/versioned_docs/version-0.11.0/hoodie_deltastreamer.md
+++ b/website/versioned_docs/version-0.11.0/hoodie_deltastreamer.md
@@ -306,6 +306,11 @@ other formats and then write data as Hudi format.)
 - ORC
 - HUDI
 
+For DFS sources, the following behaviors are expected:
+
+- For the JSON DFS source, you always need to set a schema. If the target Hudi table uses the same schema as the source files, you only need to set the source schema; otherwise, you need to set schemas for both source and target.
+- `HoodieDeltaStreamer` reads the files under the source base path (`hoodie.deltastreamer.source.dfs.root`) directly and does not use the partition paths under this base path as fields of the dataset. Detailed examples can be found [here](https://github.com/apache/hudi/issues/5485).
+
 ### Kafka
 Hudi can read directly from Kafka clusters. See more details on HoodieDeltaStreamer to learn how to set up streaming
 ingestion with exactly once semantics, checkpointing, and plugin transformations. The following formats are supported
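
For reference, here is a minimal sketch of how the behaviors described in the added docs translate into configuration. This is not part of the committed change: the table name, record key, partition field, schema file paths, and bundle jar version below are hypothetical, while the spark-submit flags and property keys (`hoodie.deltastreamer.source.dfs.root` and the `FilebasedSchemaProvider` schema file properties) are standard HoodieDeltaStreamer options.

    # dfs-source.properties -- hypothetical file; the property keys are standard.
    # JSON files are read from under this root; sub-directory (partition) names
    # are NOT surfaced as fields of the dataset.
    hoodie.deltastreamer.source.dfs.root=/data/input/json
    # The JSON DFS source always requires a source schema.
    hoodie.deltastreamer.schemaprovider.source.schema.file=/configs/source.avsc
    # Only needed when the target table schema differs from the source schema.
    hoodie.deltastreamer.schemaprovider.target.schema.file=/configs/target.avsc
    hoodie.datasource.write.recordkey.field=id
    # The partition field must exist in the record payload itself, since it is
    # not derived from the directory layout under the source root.
    hoodie.datasource.write.partitionpath.field=region

    spark-submit \
      --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
      hudi-utilities-bundle_2.12-0.11.0.jar \
      --table-type COPY_ON_WRITE \
      --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
      --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
      --source-ordering-field ts \
      --target-base-path /data/hudi/my_table \
      --target-table my_table \
      --props /configs/dfs-source.properties

Omitting `hoodie.deltastreamer.schemaprovider.target.schema.file` is fine when the source and target schemas match, per the first bullet above.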
