[GitHub] [hudi] voonhous commented on a diff in pull request #7997: [HUDI-5822] Fix FileId not found exception when FileId is passed to HoodieMergeHa...

2023-02-21 Thread via GitHub
voonhous commented on code in PR #7997: URL: https://github.com/apache/hudi/pull/7997#discussion_r1113948876 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java: ## @@ -157,8 +157,8 @@ private void bootstrapIndexIfNeed(Str

[GitHub] [hudi] bvaradar commented on pull request #6726: [HUDI-4630] Add transformer capability to individual feeds in MultiTableDeltaStreamer

2023-02-21 Thread via GitHub
bvaradar commented on PR #6726: URL: https://github.com/apache/hudi/pull/6726#issuecomment-1439565478 @yesemsanthoshkumar : Can you rebase the PR for us to review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [hudi] raghavant-git opened a new issue, #8016: Inline Clustering : Clustering failed to write to files

2023-02-21 Thread via GitHub
raghavant-git opened a new issue, #8016: URL: https://github.com/apache/hudi/issues/8016 Hello Team, We are using Hudi 0.12.0 via AWS EMR(Hive 3.1.3, Spark 3.3.0). Setup: Source data is from kafka and current hudi table has around 35 million rows partitioned by month and

[GitHub] [hudi] bvaradar commented on pull request #6456: [HUDI-4674]Change the default value of inputFormat for the MOR table

2023-02-21 Thread via GitHub
bvaradar commented on PR #6456: URL: https://github.com/apache/hudi/pull/6456#issuecomment-1439553483 @linfey90 : Thanks for the contribution. I too agree with @alexeykudinkin's comment that the defaults should not be changed. Will close this PR in a day to hear any more discussions on this

[GitHub] [hudi] hudi-bot commented on pull request #8015: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8015: URL: https://github.com/apache/hudi/pull/8015#issuecomment-1439542231 ## CI report: * f0dcc31b3644e4bc29e8fb375a09cbb1d2aa60dc UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7978: [HUDI-5812] Optimize the data size check in HoodieBaseParquetWriter

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7978: URL: https://github.com/apache/hudi/pull/7978#issuecomment-1439542062 ## CI report: * d56932759e2bd261464dda11c3670720d0ff3faf Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1530

[GitHub] [hudi] bvaradar commented on pull request #6084: [HUDI-4383]Make hudi-flink-bundle module compile with the correct flink version

2023-02-21 Thread via GitHub
bvaradar commented on PR #6084: URL: https://github.com/apache/hudi/pull/6084#issuecomment-1439539177 @danny0405 : Is this PR still valid to be landed ?

[GitHub] [hudi] stream2000 opened a new pull request, #8015: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-02-21 Thread via GitHub
stream2000 opened a new pull request, #8015: URL: https://github.com/apache/hudi/pull/8015 ### Change Logs fix https://github.com/apache/hudi/pull/7365, set SaveMode = Overwrite for insert overwrite non-partition table ### Impact insert overwrite table for non-partiti

[GitHub] [hudi] voonhous commented on a diff in pull request #7997: [HUDI-5822] Fix write and read correctness issue when a rollback is performed

2023-02-21 Thread via GitHub
voonhous commented on code in PR #7997: URL: https://github.com/apache/hudi/pull/7997#discussion_r1113914745 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java: ## @@ -148,7 +172,10 @@ public Option getLatestFileSlicesIncludingInflight() { */

[GitHub] [hudi] bvaradar commented on a diff in pull request #7997: [HUDI-5822] Fix write and read correctness issue when a rollback is performed

2023-02-21 Thread via GitHub
bvaradar commented on code in PR #7997: URL: https://github.com/apache/hudi/pull/7997#discussion_r1113911602 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java: ## @@ -122,7 +129,24 @@ public HoodieFileGroupId getFileGroupId() { * some log files,

[GitHub] [hudi] hudi-bot commented on pull request #7840: [MINOR] Close InflaterInputStream in finally when calling decompressBytes in BitCaskDiskMap

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7840: URL: https://github.com/apache/hudi/pull/7840#issuecomment-1439527996 ## CI report: * 5330d2b92b913cb02ebae8f638390f2d28448523 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1530

[GitHub] [hudi] boneanxs commented on a diff in pull request #7978: [HUDI-5812] Optimize the data size check in HoodieBaseParquetWriter

2023-02-21 Thread via GitHub
boneanxs commented on code in PR #7978: URL: https://github.com/apache/hudi/pull/7978#discussion_r1113770050 ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBaseParquetWriter.java: ## @@ -56,23 +55,34 @@ public HoodieBaseParquetWriter(Path file, DEFAULT_WR

[GitHub] [hudi] codope commented on a diff in pull request #7901: [HUDI-5665] Adding support to re-use table configs

2023-02-21 Thread via GitHub
codope commented on code in PR #7901: URL: https://github.com/apache/hudi/pull/7901#discussion_r1113876340 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala: ## @@ -38,26 +38,23 @@ import scala.collection.JavaConverters._ */ obj

[GitHub] [hudi] hudi-bot commented on pull request #7840: [MINOR] Close InflaterInputStream in finally when calling decompressBytes in BitCaskDiskMap

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7840: URL: https://github.com/apache/hudi/pull/7840#issuecomment-1439496725 ## CI report: * 5330d2b92b913cb02ebae8f638390f2d28448523 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1530

[GitHub] [hudi] TengHuo commented on pull request #7840: [MINOR] Close InflaterInputStream in finally when calling decompressBytes in BitCaskDiskMap

2023-02-21 Thread via GitHub
TengHuo commented on PR #7840: URL: https://github.com/apache/hudi/pull/7840#issuecomment-1439490271 commit 5330d2 failed in IT, let me try it again

[GitHub] [hudi] hudi-bot commented on pull request #7886: [HUDI-5726]Fix timestamp field is 8 hours longer than the time

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7886: URL: https://github.com/apache/hudi/pull/7886#issuecomment-1439481263 ## CI report: * fe0ffd40e67f732677d1439092a1dedfba7ea7aa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1530

[GitHub] [hudi] sandyfog commented on pull request #7886: [HUDI-5726]Fix timestamp field is 8 hours longer than the time

2023-02-21 Thread via GitHub
sandyfog commented on PR #7886: URL: https://github.com/apache/hudi/pull/7886#issuecomment-1439470409 @hudi-bot run azure

[GitHub] [hudi] bvaradar commented on pull request #8005: [HUDI-5825] disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread via GitHub
bvaradar commented on PR #8005: URL: https://github.com/apache/hudi/pull/8005#issuecomment-1439416032 Looks reasonable to me. Will wait for a day to hear any comments before approving.

[GitHub] [hudi] bvaradar commented on pull request #8004: [MINOR][DOCS] Update SQL in procedures.md

2023-02-21 Thread via GitHub
bvaradar commented on PR #8004: URL: https://github.com/apache/hudi/pull/8004#issuecomment-1439410852 Will wait for procedures to be added before landing this.

[GitHub] [hudi] hudi-bot commented on pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8010: URL: https://github.com/apache/hudi/pull/8010#issuecomment-1439404811 ## CI report: * 1cb788947969f4573d5a4b2171fbedee696b73d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1532

[GitHub] [hudi] littleeleventhwolf commented on issue #7892: [hudi-flink] flink sql query same table (join or union), it occurs source uid collision

2023-02-21 Thread via GitHub
littleeleventhwolf commented on issue #7892: URL: https://github.com/apache/hudi/issues/7892#issuecomment-1439402201 A trick method: use DDL twice in Flink SQL to define two different table names but the same `path` parameter.
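
The workaround described in this comment can be sketched as follows. This is only an illustration of the idea, not code from the issue: the helper `hudi_table_ddl`, the table names, the column list, and the path are all hypothetical, and only the `connector` and `path` options are taken from the Hudi Flink connector. The sketch builds two `CREATE TABLE` statements that differ in table name but point at the same `path`.

```python
# Hypothetical helper: table names, columns, and the path are illustrative only.
def hudi_table_ddl(table_name: str, path: str) -> str:
    """Build a Flink SQL CREATE TABLE statement for a Hudi table stored at `path`."""
    return (
        f"CREATE TABLE {table_name} (\n"
        "  id INT PRIMARY KEY NOT ENFORCED,\n"
        "  name STRING\n"
        ") WITH (\n"
        "  'connector' = 'hudi',\n"
        f"  'path' = '{path}'\n"
        ")"
    )


# Assumed storage location shared by both table definitions.
shared_path = "file:///tmp/hudi/t1"

# Two distinct table names over the same underlying Hudi path.
ddl_a = hudi_table_ddl("t1_a", shared_path)
ddl_b = hudi_table_ddl("t1_b", shared_path)

print(ddl_a)
print(ddl_b)
```

A query that needs to read the table twice (for a join or union) would then reference `t1_a` and `t1_b` instead of the same table name twice.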

[GitHub] [hudi] joeytman commented on issue #7973: [SUPPORT] HoodieDeltaStreamer failing on bytesToAvro call when attempting to insert record

2023-02-21 Thread via GitHub
joeytman commented on issue #7973: URL: https://github.com/apache/hudi/issues/7973#issuecomment-1439400942 Hello again, I have an update. I was able to work around the issue and get DeltaStreamer working by defining a `DebeziumSchemaRegistryProvider` as follows: ``` package org.apache.

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1439378585 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * b938bca41640daf3587e52d15ae48c911a9f5e76 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1439374587 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * b6026c9c914d1ecde0d46fd9d2e28841d6417e79 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8012: [MINOR] Update DOAP with 0.13.0 Release

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8012: URL: https://github.com/apache/hudi/pull/8012#issuecomment-1439371434 ## CI report: * 24b0dcbee9b74870a04fb5484a613215e888003b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1532

[jira] [Commented] (HUDI-5828) Support df.write.forma("hudi") with out any additional options

2023-02-21 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691887#comment-17691887 ] Sagar Sumit commented on HUDI-5828: --- Regarding point 5 (table name), can we infer table

[GitHub] [hudi] nfarah86 closed pull request #8013: added sample data set for the hudi cli blog

2023-02-21 Thread via GitHub
nfarah86 closed pull request #8013: added sample data set for the hudi cli blog URL: https://github.com/apache/hudi/pull/8013

[GitHub] [hudi] boneanxs commented on a diff in pull request #7978: [HUDI-5812] Optimize the data size check in HoodieBaseParquetWriter

2023-02-21 Thread via GitHub
boneanxs commented on code in PR #7978: URL: https://github.com/apache/hudi/pull/7978#discussion_r1113768769 ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBaseParquetWriter.java: ## @@ -56,23 +55,34 @@ public HoodieBaseParquetWriter(Path file, DEFAULT_WR

[GitHub] [hudi] alexeykudinkin commented on issue #7829: [SUPPORT] Using monotonically_increasing_id to generate record key causing duplicates on upsert

2023-02-21 Thread via GitHub
alexeykudinkin commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1439338137 @jtmzheng can you please paste your whole config you've been using to write to Hudi?

[GitHub] [hudi] hudi-bot commented on pull request #8011: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8011: URL: https://github.com/apache/hudi/pull/8011#issuecomment-1439328213 ## CI report: * 0ad210a6500dfe7d8a070346e2bbda72805b4b78 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1532

[GitHub] [hudi] yihua commented on issue #8009: [SUPPORT Hive Sync error : Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool]

2023-02-21 Thread via GitHub
yihua commented on issue #8009: URL: https://github.com/apache/hudi/issues/8009#issuecomment-1439317857 Hi @aswin-mp thanks for raising this. Based on your description, the issue you encountered is similar to this one: #6281. The root cause is that, when `TimestampBasedKeyGenerator` is us

[GitHub] [hudi] lvyanquan commented on pull request #8004: [MINOR][DOCS] Update SQL in procedures.md

2023-02-21 Thread via GitHub
lvyanquan commented on PR #8004: URL: https://github.com/apache/hudi/pull/8004#issuecomment-1439304237 willing to do this and will try to complete it later.

[hudi] branch master updated (0f99315ab84 -> 9c8144045de)

2023-02-21 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 0f99315ab84 Handle empty payloads for AbstractDebeziumAvroPayload (#7944) add 9c8144045de [HUDI-5823] Claim RFC-65

[GitHub] [hudi] leesf merged pull request #8006: [HUDI-5823] Claim RFC-65 for Partition TTL Management

2023-02-21 Thread via GitHub
leesf merged PR #8006: URL: https://github.com/apache/hudi/pull/8006

[GitHub] [hudi] xushiyan commented on a diff in pull request #7901: [HUDI-5665] Adding support to re-use table configs

2023-02-21 Thread via GitHub
xushiyan commented on code in PR #7901: URL: https://github.com/apache/hudi/pull/7901#discussion_r1113711166 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -830,6 +830,33 @@ object DataSourceOptionsHelper { translate

[jira] [Updated] (HUDI-5831) Address flakiness of early conflict detection tests

2023-02-21 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5831: Description: Error from CI {code:java} 2023-02-18T04:19:53.4129671Z [ERROR] Tests run: 19, Failures: 1, Erro

[hudi] branch asf-site updated: [MINOR][DOCS] Fix community sync schedule image (#8014)

2023-02-21 Thread bhavanisudha
bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new d09f48643d4 [MINOR][DOCS] Fix community

[GitHub] [hudi] bhasudha merged pull request #8014: [MINOR][DOCS] Fix community sync schedule image

2023-02-21 Thread via GitHub
bhasudha merged PR #8014: URL: https://github.com/apache/hudi/pull/8014

[jira] [Updated] (HUDI-5831) Address flakiness of early conflict detection tests

2023-02-21 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5831: Fix Version/s: 0.13.1 > Address flakiness of early conflict detection tests > --

[jira] [Created] (HUDI-5831) Address flakiness of early conflict detection tests

2023-02-21 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5831: --- Summary: Address flakiness of early conflict detection tests Key: HUDI-5831 URL: https://issues.apache.org/jira/browse/HUDI-5831 Project: Apache Hudi Issue Type: Impro

[jira] [Assigned] (HUDI-5831) Address flakiness of early conflict detection tests

2023-02-21 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-5831: --- Assignee: Ethan Guo > Address flakiness of early conflict detection tests > -

[jira] [Updated] (HUDI-5831) Address flakiness of early conflict detection tests

2023-02-21 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5831: Story Points: 2 > Address flakiness of early conflict detection tests >

[GitHub] [hudi] hudi-bot commented on pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8010: URL: https://github.com/apache/hudi/pull/8010#issuecomment-1439289440 ## CI report: * 803e8874071c540e672f93f5a481a05ae8ec6131 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] bhasudha commented on pull request #8014: [MINOR][DOCS] Fix community sync schedule image

2023-02-21 Thread via GitHub
bhasudha commented on PR #8014: URL: https://github.com/apache/hudi/pull/8014#issuecomment-1439283504 ![Screen Shot 2023-02-21 at 4 49 30 PM](https://user-images.githubusercontent.com/2179254/220492279-feeebaf7-149e-412a-8db1-1d30ab8ae046.png)

[GitHub] [hudi] bhasudha opened a new pull request, #8014: [MINOR][DOCS] Fix community sync schedule image

2023-02-21 Thread via GitHub
bhasudha opened a new pull request, #8014: URL: https://github.com/apache/hudi/pull/8014 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[GitHub] [hudi] nfarah86 opened a new pull request, #8013: added sample data set for the hudi cli blog

2023-02-21 Thread via GitHub
nfarah86 opened a new pull request, #8013: URL: https://github.com/apache/hudi/pull/8013 ### Change Logs no visible website changes- add a data set for the hudi blog cc @nsivabalan @bhasudha if you can peer-review

[GitHub] [hudi] hudi-bot commented on pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8010: URL: https://github.com/apache/hudi/pull/8010#issuecomment-1439259269 ## CI report: * 803e8874071c540e672f93f5a481a05ae8ec6131 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] hudi-bot commented on pull request #7934: [HUDI-5777] Support Metrics for Multiple Tables Simultaneously

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7934: URL: https://github.com/apache/hudi/pull/7934#issuecomment-1439246889 ## CI report: * 0afb2b3c974a716c33c9d74f816eb027ccb056a2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1532

[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7885: URL: https://github.com/apache/hudi/pull/7885#issuecomment-1439246634 ## CI report: * 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN * 6323b40d1d2d6ad9f54a21e0bc1f8d71249b96dc UNKNOWN * 546a221d5f28c7ee0a4e2b86f796efb6f3ae1f42 Azure: [FAILUR

[GitHub] [hudi] jtmzheng commented on issue #7829: [SUPPORT] Using monotonically_increasing_id to generate record key causing duplicates on upsert

2023-02-21 Thread via GitHub
jtmzheng commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1439221138 Yep using `upsert` (Note: In https://apache-hudi.slack.com/archives/C4D716NPQ/p1675378366882569?thread_ts=1675301744.998269&cid=C4D716NPQ implied this would also be a problem with inse

[GitHub] [hudi] hudi-bot commented on pull request #8012: [MINOR] Update DOAP with 0.13.0 Release

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8012: URL: https://github.com/apache/hudi/pull/8012#issuecomment-1439219443 ## CI report: * 24b0dcbee9b74870a04fb5484a613215e888003b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1532

[jira] [Created] (HUDI-5830) Sanitize proto field names when converting to avro format or row format

2023-02-21 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-5830: - Summary: Sanitize proto field names when converting to avro format or row format Key: HUDI-5830 URL: https://issues.apache.org/jira/browse/HUDI-5830 Project: Apache

[jira] [Created] (HUDI-5829) Optimize conversion from json to row format when sanitizing field names

2023-02-21 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-5829: - Summary: Optimize conversion from json to row format when sanitizing field names Key: HUDI-5829 URL: https://issues.apache.org/jira/browse/HUDI-5829 Project: Apache

[GitHub] [hudi] hudi-bot commented on pull request #8012: [MINOR] Update DOAP with 0.13.0 Release

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8012: URL: https://github.com/apache/hudi/pull/8012#issuecomment-1439214093 ## CI report: * 24b0dcbee9b74870a04fb5484a613215e888003b UNKNOWN

[jira] [Comment Edited] (HUDI-5828) Support df.write.forma("hudi") with out any additional options

2023-02-21 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691819#comment-17691819 ] sivabalan narayanan edited comment on HUDI-5828 at 2/21/23 11:08 PM: ---

[jira] [Comment Edited] (HUDI-5828) Support df.write.forma("hudi") with out any additional options

2023-02-21 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691819#comment-17691819 ] sivabalan narayanan edited comment on HUDI-5828 at 2/21/23 11:07 PM: ---

[GitHub] [hudi] hudi-bot commented on pull request #8011: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8011: URL: https://github.com/apache/hudi/pull/8011#issuecomment-1439207688 ## CI report: * 0ad210a6500dfe7d8a070346e2bbda72805b4b78 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1532

[GitHub] [hudi] yihua opened a new pull request, #8012: [MINOR] Update DOAP with 0.13.0 Release

2023-02-21 Thread via GitHub
yihua opened a new pull request, #8012: URL: https://github.com/apache/hudi/pull/8012 ### Change Logs As above. ### Impact Adds 0.13.0 release. ### Risk level none ### Documentation Update N/A ### Contributor's checklist - [ ] Rea

[jira] [Commented] (HUDI-5828) Support df.write.forma("hudi") with out any additional options

2023-02-21 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691819#comment-17691819 ] sivabalan narayanan commented on HUDI-5828: --- As per our quick start guide,  we

[GitHub] [hudi] hudi-bot commented on pull request #8011: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8011: URL: https://github.com/apache/hudi/pull/8011#issuecomment-1439200526 ## CI report: * 0ad210a6500dfe7d8a070346e2bbda72805b4b78 UNKNOWN

[hudi] annotated tag release-0.13.0 updated (a3f0615c857 -> b96144f3ea5)

2023-02-21 Thread yihua
yihua pushed a change to annotated tag release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git *** WARNING: tag release-0.13.0 was modified! *** from a3f0615c857 (commit) to b96144f3ea5 (tag) taggin

[GitHub] [hudi] alexeykudinkin commented on issue #7829: [SUPPORT] Using monotonically_increasing_id to generate record key causing duplicates on upsert

2023-02-21 Thread via GitHub
alexeykudinkin commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1439197090 @jtmzheng what operation are you using? Is it "upsert"?

[jira] [Updated] (HUDI-5824) COMBINE_BEFORE_UPSERT=false option does not work for upsert

2023-02-21 Thread kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5824: Status: Patch Available (was: In Progress) > COMBINE_BEFORE_UPSERT=false option does not work for upsert >

[jira] [Updated] (HUDI-5824) COMBINE_BEFORE_UPSERT=false option does not work for upsert

2023-02-21 Thread kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5824: Status: In Progress (was: Open) > COMBINE_BEFORE_UPSERT=false option does not work for upsert > ---

[jira] [Created] (HUDI-5828) Support df.write.forma("hudi") with out any additional options

2023-02-21 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5828: - Summary: Support df.write.forma("hudi") with out any additional options Key: HUDI-5828 URL: https://issues.apache.org/jira/browse/HUDI-5828 Project: Apache

[GitHub] [hudi] jtmzheng commented on issue #7829: [SUPPORT] Using monotonically_increasing_id to generate record key causing duplicates on upsert

2023-02-21 Thread via GitHub
jtmzheng commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1439179802 We have a two stage pipeline: 1. Snapshot of MySQL table (as parquet files) 2. Convert to a Hudi table (ie. read in parquet, write out as Hudi table) # of rows: 15498207

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8011: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource

2023-02-21 Thread via GitHub
nsivabalan commented on code in PR #8011: URL: https://github.com/apache/hudi/pull/8011#discussion_r1113628425 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java: ## @@ -507,6 +508,13 @@ public static SchemaProviderWithPostProcessor wrapSchemaProviderWit

[GitHub] [hudi] hudi-bot commented on pull request #8005: [HUDI-5825] disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8005: URL: https://github.com/apache/hudi/pull/8005#issuecomment-1439167138 ## CI report: * 0639d0f836a0a5c1810fd140d6320cb725c37f01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] xushiyan commented on a diff in pull request #7951: [HUDI-5796] Adding auto inferring partition from incoming df

2023-02-21 Thread via GitHub
xushiyan commented on code in PR #7951: URL: https://github.com/apache/hudi/pull/7951#discussion_r1113598296 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -1018,6 +1023,26 @@ object HoodieSparkSqlWriter { } }

[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-21 Thread via GitHub
kazdy commented on code in PR #7998: URL: https://github.com/apache/hudi/pull/7998#discussion_r1113530490 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter { val recordT

[GitHub] [hudi] alexeykudinkin commented on issue #7829: [SUPPORT] Using monotonically_increasing_id to generate record key causing duplicates on upsert

2023-02-21 Thread via GitHub
alexeykudinkin commented on issue #7829: URL: https://github.com/apache/hudi/issues/7829#issuecomment-1439138215 @jtmzheng thanks for providing additional context! Can you please help me understand how did you determine duplicate rows in here: ``` # of rows: 154982072 # o

[jira] [Updated] (HUDI-5824) COMBINE_BEFORE_UPSERT=false option does not work for upsert

2023-02-21 Thread kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5824: Component/s: spark > COMBINE_BEFORE_UPSERT=false option does not work for upsert > -

[jira] [Updated] (HUDI-5825) Disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5825: Status: In Progress (was: Open) > Disable Spark UI in tests if SPARK_EVLOG_DIR not set > --

[jira] [Updated] (HUDI-5825) Disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread kazdy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kazdy updated HUDI-5825: Status: Patch Available (was: In Progress) > Disable Spark UI in tests if SPARK_EVLOG_DIR not set > ---

[hudi] branch master updated (8ee354fcaf6 -> 0f99315ab84)

2023-02-21 Thread sivabalan
sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 8ee354fcaf6 [HUDI-5778] support absolute path names for hierarchical configs (#7920) add 0f99315ab84 Handle emp

[GitHub] [hudi] nsivabalan merged pull request #7944: [HUDI-5791] Handle empty payloads for AbstractDebeziumAvroPayload

2023-02-21 Thread via GitHub
nsivabalan merged PR #7944: URL: https://github.com/apache/hudi/pull/7944 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apach

[GitHub] [hudi] hudi-bot commented on pull request #7934: [HUDI-5777] Support Metrics for Multiple Tables Simultaneously

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7934: URL: https://github.com/apache/hudi/pull/7934#issuecomment-1439094201 ## CI report: * 2830ceca125d836eecfb129654d19a37aaf5fe3b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1525

[GitHub] [hudi] jonvex commented on a diff in pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

2023-02-21 Thread via GitHub
jonvex commented on code in PR #8010: URL: https://github.com/apache/hudi/pull/8010#discussion_r1113557043 ## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java: ## @@ -720,10 +721,22 @@ public static Schema getNullSchema() { * @return sanitized name */

[GitHub] [hudi] jonvex commented on pull request #7944: [HUDI-5791] Handle empty payloads for AbstractDebeziumAvroPayload

2023-02-21 Thread via GitHub
jonvex commented on PR #7944: URL: https://github.com/apache/hudi/pull/7944#issuecomment-1439065252 Reran the failing test and it succeeds

[GitHub] [hudi] jonvex closed pull request #7971: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource

2023-02-21 Thread via GitHub
jonvex closed pull request #7971: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource URL: https://github.com/apache/hudi/pull/7971

[GitHub] [hudi] jonvex opened a new pull request, #8011: [HUDI-5808] Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource

2023-02-21 Thread via GitHub
jonvex opened a new pull request, #8011: URL: https://github.com/apache/hudi/pull/8011 ### Change Logs Add Support for kaffka ofsets in jsonkafkasource and avrokafkasource new config hoodie.deltastreamer.source.kafka.append.offsets. Default is "false". Set to "true" with JsonKafkaS

[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-21 Thread via GitHub
kazdy commented on code in PR #7998: URL: https://github.com/apache/hudi/pull/7998#discussion_r1113530490 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter { val recordT

[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7885: URL: https://github.com/apache/hudi/pull/7885#issuecomment-1439041683 ## CI report: * 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN * 6323b40d1d2d6ad9f54a21e0bc1f8d71249b96dc UNKNOWN * 79468852f0076b0cbcab2e0e17d248dac8fe4294 Azure: [FAILUR

[GitHub] [hudi] hudi-bot commented on pull request #7885: [HUDI-5352] Make sure FTs are run in GH CI

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7885: URL: https://github.com/apache/hudi/pull/7885#issuecomment-1439030882 ## CI report: * 38b3bf82a57801e27cf28532590327b785754fc5 UNKNOWN * 6323b40d1d2d6ad9f54a21e0bc1f8d71249b96dc UNKNOWN * 79468852f0076b0cbcab2e0e17d248dac8fe4294 Azure: [FAILUR

[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1439021158 ## CI report: * 13fafcd633926980f7e01117c8039138f14fa3f5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] hudi-bot commented on pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8010: URL: https://github.com/apache/hudi/pull/8010#issuecomment-1438970301 ## CI report: * 803e8874071c540e672f93f5a481a05ae8ec6131 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] hudi-bot commented on pull request #8005: [HUDI-5825] disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8005: URL: https://github.com/apache/hudi/pull/8005#issuecomment-1438962589 ## CI report: * 0639d0f836a0a5c1810fd140d6320cb725c37f01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] kazdy commented on pull request #8005: [HUDI-5825] disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread via GitHub
kazdy commented on PR #8005: URL: https://github.com/apache/hudi/pull/8005#issuecomment-1438962434 @hudi-bot run azure

[hudi] branch master updated (33987b1bc27 -> 8ee354fcaf6)

2023-02-21 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 33987b1bc27 [HUDI-5792] fix CI test:TestDisruptorExecutionInSpark timeout problem (#8000) add 8ee354fcaf6 [HUDI

[GitHub] [hudi] nsivabalan merged pull request #7920: [HUDI-5778] support absolute path names for hierarchical configs

2023-02-21 Thread via GitHub
nsivabalan merged PR #7920: URL: https://github.com/apache/hudi/pull/7920

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

2023-02-21 Thread via GitHub
nsivabalan commented on code in PR #8010: URL: https://github.com/apache/hudi/pull/8010#discussion_r1113419832 ## hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java: ## @@ -720,10 +721,22 @@ public static Schema getNullSchema() { * @return sanitized name

[GitHub] [hudi] hudi-bot commented on pull request #8001: [HUDI-5817] Fix async indexer metadata writer to avoid eager rollback and failed write cleaning

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8001: URL: https://github.com/apache/hudi/pull/8001#issuecomment-1438907160 ## CI report: * 3b479ce83b6f3a7d5ca1654d26ab58d3e36b8ec5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1530

[jira] [Updated] (HUDI-5825) Disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5825: - Labels: pull-request-available (was: ) > Disable Spark UI in tests if SPARK_EVLOG_DIR not set > -

[GitHub] [hudi] hudi-bot commented on pull request #8005: [HUDI-5825] disable Spark UI in tests if SPARK_EVLOG_DIR not set

2023-02-21 Thread via GitHub
hudi-bot commented on PR #8005: URL: https://github.com/apache/hudi/pull/8005#issuecomment-1438889991 ## CI report: * 0639d0f836a0a5c1810fd140d6320cb725c37f01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

2023-02-21 Thread via GitHub
hudi-bot commented on PR #7998: URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438889874 ## CI report: * de68cc51637a324a4e711b74ad52092bb569fb52 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1531

[GitHub] [hudi] GroovyDan closed issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters

2023-02-21 Thread via GitHub
GroovyDan closed issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters URL: https://github.com/apache/hudi/issues/8007

[GitHub] [hudi] GroovyDan commented on issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters

2023-02-21 Thread via GitHub
GroovyDan commented on issue #8007: URL: https://github.com/apache/hudi/issues/8007#issuecomment-1438868177 I switched to Glue Version 4.0 and am no longer getting this error. I am leaving the case with AWS open and will be awaiting their response to see what the issue was with Glue Version

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7978: [HUDI-5812] Optimize the data size check in HoodieBaseParquetWriter

2023-02-21 Thread via GitHub
alexeykudinkin commented on code in PR #7978: URL: https://github.com/apache/hudi/pull/7978#discussion_r1113378703 ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieBaseParquetWriter.java: ## @@ -56,23 +55,34 @@ public HoodieBaseParquetWriter(Path file, DEFA
