[GitHub] [hudi] xushiyan opened a new pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2021-10-26 Thread GitBox
xushiyan opened a new pull request #3866: URL: https://github.com/apache/hudi/pull/3866 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpos

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1430: - Labels: pull-request-available (was: ) > Implement SparkDataFrameWriteClient with SimpleIndex > -

[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290 ## CI report: * 5e0672ff41961296d581b06ffb26723a8456a423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[jira] [Updated] (HUDI-2570) flink pending Compaction error

2021-10-26 Thread mo.wu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mo.wu updated HUDI-2570: Affects Version/s: (was: 0.9.0) 0.10.0 > flink pending Compaction error > ---

[GitHub] [hudi] hudi-bot edited a comment on pull request #3778: [HUDI-2502] Refactor index in hudi-client module

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3778: URL: https://github.com/apache/hudi/pull/3778#issuecomment-939762523 ## CI report: * 67c6e4a2fe83469fd99a821be21636867bf2ecb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3864: [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3864: URL: https://github.com/apache/hudi/pull/3864#issuecomment-951541561 ## CI report: * 50f15722df9c3ab0bc36cf071bf4ca0fae1a7bff Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient

2021-10-26 Thread GitBox
hudi-bot commented on pull request #3866: URL: https://github.com/apache/hudi/pull/3866#issuecomment-951642320 ## CI report: * 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290 ## CI report: * 5e0672ff41961296d581b06ffb26723a8456a423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2021-10-26 Thread GitBox
pratyakshsharma commented on a change in pull request #3646: URL: https://github.com/apache/hudi/pull/3646#discussion_r736235448 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java ## @@ -1240,6 +1244,154 @@ public void testKeepLate

[GitHub] [hudi] hudi-bot edited a comment on pull request #3803: [HUDI-2472] Enabling Metadata table for TestCleaner unit tests

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3803: URL: https://github.com/apache/hudi/pull/3803#issuecomment-943565382 ## CI report: * 707850631bafe59547d41945e827f2afe0171743 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3646: URL: https://github.com/apache/hudi/pull/3646#issuecomment-917638494 ## CI report: * f58beda3f7866eff3196f5ac7810dad5b7b9925b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3646: URL: https://github.com/apache/hudi/pull/3646#issuecomment-917638494 ## CI report: * f58beda3f7866eff3196f5ac7810dad5b7b9925b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[jira] [Commented] (HUDI-1549) Programmatic way to fetch earliest commit retained

2021-10-26 Thread Pratyaksh Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434178#comment-17434178 ] Pratyaksh Sharma commented on HUDI-1549: so I guess the requirement is as simple a

[GitHub] [hudi] prashantwason commented on pull request #3836: [HUDI-2591] Bootstrap metadata table only if upgrade / downgrade is not required.

2021-10-26 Thread GitBox
prashantwason commented on pull request #3836: URL: https://github.com/apache/hudi/pull/3836#issuecomment-951668308 @nsivabalan Description updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] liujinhui1994 commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
liujinhui1994 commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-951678020 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [hudi] Cherry-Puppy commented on issue #3680: [SUPPORT]Failed to sync data to hive-3.1.2 by flink-sql

2021-10-26 Thread GitBox
Cherry-Puppy commented on issue #3680: URL: https://github.com/apache/hudi/issues/3680#issuecomment-951685884 > Did your program throw the same exception in this issue ? Did you also specify the bundle jar through `-j` option ? ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinato

[jira] [Assigned] (HUDI-1475) Fix documentation of preCombine to clarify when this API is used by Hudi

2021-10-26 Thread Pratyaksh Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyaksh Sharma reassigned HUDI-1475: -- Assignee: Pratyaksh Sharma > Fix documentation of preCombine to clarify when this API i

[GitHub] [hudi] pratyakshsharma opened a new pull request #3867: [HUDI-1475]: fixed java doc for precombine api

2021-10-26 Thread GitBox
pratyakshsharma opened a new pull request #3867: URL: https://github.com/apache/hudi/pull/3867 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the

[jira] [Updated] (HUDI-1475) Fix documentation of preCombine to clarify when this API is used by Hudi

2021-10-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1475: - Labels: pull-request-available user-support-issues (was: user-support-issues) > Fix documentatio

[GitHub] [hudi] yanghua commented on pull request #3864: [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread GitBox
yanghua commented on pull request #3864: URL: https://github.com/apache/hudi/pull/3864#issuecomment-951714733 cc @leesf @xushiyan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290 ## CI report: * 9207f2cf2f6216f84493834e35b02e1083f88923 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3646: URL: https://github.com/apache/hudi/pull/3646#issuecomment-917638494 ## CI report: * 3c7913637581198c9b18f91e9d1a43ad635f4fb7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] leesf commented on a change in pull request #3864: [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread GitBox
leesf commented on a change in pull request #3864: URL: https://github.com/apache/hudi/pull/3864#discussion_r736295091 ## File path: hudi-client/hudi-java-client/pom.xml ## @@ -121,6 +121,28 @@ junit-platform-commons test + + +

[GitHub] [hudi] leesf commented on a change in pull request #3864: [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread GitBox
leesf commented on a change in pull request #3864: URL: https://github.com/apache/hudi/pull/3864#discussion_r736295091 ## File path: hudi-client/hudi-java-client/pom.xml ## @@ -121,6 +121,28 @@ junit-platform-commons test + + +

[GitHub] [hudi] yanghua commented on a change in pull request #3864: [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread GitBox
yanghua commented on a change in pull request #3864: URL: https://github.com/apache/hudi/pull/3864#discussion_r736296664 ## File path: hudi-client/hudi-java-client/pom.xml ## @@ -121,6 +121,28 @@ junit-platform-commons test + + +

[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290 ## CI report: * 5e0672ff41961296d581b06ffb26723a8456a423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot commented on pull request #3867: [HUDI-1475]: fixed java doc for precombine api

2021-10-26 Thread GitBox
hudi-bot commented on pull request #3867: URL: https://github.com/apache/hudi/pull/3867#issuecomment-951718879 ## CI report: * 73ce430494e7b43b620eab62d4429d0760857caa UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] hudi-bot edited a comment on pull request #3867: [HUDI-1475]: fixed java doc for precombine api

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3867: URL: https://github.com/apache/hudi/pull/3867#issuecomment-951718879 ## CI report: * 73ce430494e7b43b620eab62d4429d0760857caa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] minihippo commented on pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-10-26 Thread GitBox
minihippo commented on pull request #3173: URL: https://github.com/apache/hudi/pull/3173#issuecomment-951743238 Hi @vinothchandar @leesf , sorry for the long delay caused by my own personal reason. Recently I will focus more on the hudi community. Can we accelerate the patch merge into mas

[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-26 Thread GitBox
xiarixiaoyao commented on a change in pull request #3203: URL: https://github.com/apache/hudi/pull/3203#discussion_r736332305 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java ## @@ -161,6 +162,46 @@ return rtSplit

[GitHub] [hudi] xiarixiaoyao commented on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
xiarixiaoyao commented on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-951753842 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [hudi] xiarixiaoyao removed a comment on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
xiarixiaoyao removed a comment on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-951753842 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] codope opened a new pull request #3869: [HUDI-1937] Rollback unfinished replace commit to allow updates

2021-10-26 Thread GitBox
codope opened a new pull request #3869: URL: https://github.com/apache/hudi/pull/3869 ## What is the purpose of the pull request If clustering fails due to some reason, it leaves the unfinished replace commit in the timeline. During ingestion, if there are updates to a filegroup tha

[jira] [Updated] (HUDI-1937) When clustering fail, generating unfinished replacecommit timeline.

2021-10-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1937: - Labels: pull-request-available (was: ) > When clustering fail, generating unfinished replacecommi

[GitHub] [hudi] codope commented on a change in pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-26 Thread GitBox
codope commented on a change in pull request #3802: URL: https://github.com/apache/hudi/pull/3802#discussion_r736347663 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -1137,8 +1144,9 @@ public void testHoodieA

[GitHub] [hudi] hudi-bot edited a comment on pull request #3867: [HUDI-1475]: fixed java doc for precombine api

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3867: URL: https://github.com/apache/hudi/pull/3867#issuecomment-951718879 ## CI report: * 73ce430494e7b43b620eab62d4429d0760857caa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] minihippo commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-10-26 Thread GitBox
minihippo commented on a change in pull request #3173: URL: https://github.com/apache/hudi/pull/3173#discussion_r736355271 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/SparkBucketIndex.java ## @@ -0,0 +1,183 @@ +/* + * Licensed to the

[GitHub] [hudi] codope commented on pull request #3830: [HUDI-2077] Set schema validation from main write config

2021-10-26 Thread GitBox
codope commented on pull request #3830: URL: https://github.com/apache/hudi/pull/3830#issuecomment-951772398 > trying to understand why do we need this change? metadata table is something internally managed. I prefer to enable schema validation always. I would prefer to validate the

[GitHub] [hudi] hudi-bot commented on pull request #3869: [HUDI-1937] Rollback unfinished replace commit to allow updates

2021-10-26 Thread GitBox
hudi-bot commented on pull request #3869: URL: https://github.com/apache/hudi/pull/3869#issuecomment-951773549 ## CI report: * 9eedf85ff36c50f8fb15ffdf012b9655e8796bda UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] hudi-bot edited a comment on pull request #3869: [HUDI-1937] Rollback unfinished replace commit to allow updates

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3869: URL: https://github.com/apache/hudi/pull/3869#issuecomment-951773549 ## CI report: * 9eedf85ff36c50f8fb15ffdf012b9655e8796bda Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] minihippo commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-10-26 Thread GitBox
minihippo commented on a change in pull request #3173: URL: https://github.com/apache/hudi/pull/3173#discussion_r736361071 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/BucketUtils.java ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Softw

[GitHub] [hudi] liujinhui1994 commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
liujinhui1994 commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-951786854 [ERROR] Tests run: 5, Failures: 4, Errors: 0, Skipped: 1, Time elapsed: 60.788 s <<< FAILURE! - in org.apache.hudi.cli.integ.ITTestRepairsCommand [ERROR] org.apache.hudi.c

[GitHub] [hudi] codope commented on issue #3853: Deltastreamer S3 EventsSource

2021-10-26 Thread GitBox
codope commented on issue #3853: URL: https://github.com/apache/hudi/issues/3853#issuecomment-951791051 That's strange! The spark-submit command in the [blog](https://hudi.apache.org/blog/2021/08/23/s3-events-source#configuration-and-setup) did work on EMR. Can you search for the class

[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290 ## CI report: * 5e0672ff41961296d581b06ffb26723a8456a423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] codope commented on issue #3848: [SUPPORT] Cannot write to null outputStream error

2021-10-26 Thread GitBox
codope commented on issue #3848: URL: https://github.com/apache/hudi/issues/3848#issuecomment-951797914 I have tried with the latest master and cannot reproduce. This patch should fix it: #3364 Can you give it a shot? -- This is an automated message from the Apache Git Service. To re

[GitHub] [hudi] hudi-bot edited a comment on pull request #3869: [HUDI-1937] Rollback unfinished replace commit to allow updates

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3869: URL: https://github.com/apache/hudi/pull/3869#issuecomment-951773549 ## CI report: * 9eedf85ff36c50f8fb15ffdf012b9655e8796bda Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3820: [HUDI-2579] Make deltastreamer checkpoint state merging more explicit

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3820: URL: https://github.com/apache/hudi/pull/3820#issuecomment-945958915 ## CI report: * ba8f8b0059fd3dcd549edf2d675605711b74d266 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3824: [HUDI-1292] Millisecond granularity for instant timestamps

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3824: URL: https://github.com/apache/hudi/pull/3824#issuecomment-946872755 ## CI report: * f717a61506ee8f4bd4968a96ac9f097d516a84b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3824: [HUDI-1292] Millisecond granularity for instant timestamps

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3824: URL: https://github.com/apache/hudi/pull/3824#issuecomment-946872755 ## CI report: * f717a61506ee8f4bd4968a96ac9f097d516a84b1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3820: [HUDI-2579] Make deltastreamer checkpoint state merging more explicit

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3820: URL: https://github.com/apache/hudi/pull/3820#issuecomment-945958915 ## CI report: * ba8f8b0059fd3dcd549edf2d675605711b74d266 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] dongkelun commented on pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-10-26 Thread GitBox
dongkelun commented on pull request #3700: URL: https://github.com/apache/hudi/pull/3700#issuecomment-951844216 > > > @dongkelun I reproduce this using the table in UT in my local env, `ClassCastException` is raised. The detail trace stack: `Caused by: java.lang.ClassCastException: org.apa

[GitHub] [hudi] peanut-chenzhong commented on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
peanut-chenzhong commented on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-951852796 @xushiyan PR rebased and added UT for this scenario -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [hudi] peanut-chenzhong commented on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
peanut-chenzhong commented on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-951853549 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [hudi] fuyun2024 commented on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
fuyun2024 commented on pull request #3799: URL: https://github.com/apache/hudi/pull/3799#issuecomment-951867239 @codope @vinothchandar is there anything else I should do to achieve this commit? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [hudi] hudi-bot edited a comment on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-938264265 ## CI report: * Unknown: [CANCELED](TBD) * ee35aaa68f34312968b92cdbc966ac5e4a1e0ede UNKNOWN Bot commands @hudi-bot supports the following comm

[GitHub] [hudi] hudi-bot edited a comment on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-938264265 ## CI report: * ee35aaa68f34312968b92cdbc966ac5e4a1e0ede UNKNOWN * Unknown: [CANCELED](TBD) * 55482c784ddfffc72990cce9a82f05d941ff9437 UNKNOWN B

[GitHub] [hudi] hudi-bot edited a comment on pull request #3820: [HUDI-2579] Make deltastreamer checkpoint state merging more explicit

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3820: URL: https://github.com/apache/hudi/pull/3820#issuecomment-945958915 ## CI report: * 1d031a881c677dee5e40a0119be17e6a35969f60 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3823: [HUDI-2538] persist some configs to hoodie.properties when the first write

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3823: URL: https://github.com/apache/hudi/pull/3823#issuecomment-946850790 ## CI report: * 5ce2404609630fc4718cd3d021c895cb3aeb49b4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3824: [HUDI-1292] Millisecond granularity for instant timestamps

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3824: URL: https://github.com/apache/hudi/pull/3824#issuecomment-946872755 ## CI report: * 7bb77d2a5fb574a9ff1b178a06088e5e508e6c9c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-938264265 ## CI report: * ee35aaa68f34312968b92cdbc966ac5e4a1e0ede UNKNOWN * Unknown: [CANCELED](TBD) * 55482c784ddfffc72990cce9a82f05d941ff9437 Azure: [PENDING](ht

[GitHub] [hudi] hudi-bot edited a comment on pull request #3823: [HUDI-2538] persist some configs to hoodie.properties when the first write

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3823: URL: https://github.com/apache/hudi/pull/3823#issuecomment-946850790 ## CI report: * 5ce2404609630fc4718cd3d021c895cb3aeb49b4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] codope commented on a change in pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
codope commented on a change in pull request #3799: URL: https://github.com/apache/hudi/pull/3799#discussion_r736479436 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastream ## @@ -618,9 +618,8 @@ private void syncMeta(HoodieDeltaStreamerMetrics metri

[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3799: URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004 ## CI report: * c04a9ee52205199413d789bfa5dc262d6f3c8fe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3799: URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004 ## CI report: * c04a9ee52205199413d789bfa5dc262d6f3c8fe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] codope commented on issue #3755: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-26 Thread GitBox
codope commented on issue #3755: URL: https://github.com/apache/hudi/issues/3755#issuecomment-951901867 > I have a question, when Hudi does delta commit, if data is new , it need append them to exist parquet file. meanwhile may cause concurrent issue with async compaction thread if compact

[GitHub] [hudi] fuyun2024 commented on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
fuyun2024 commented on pull request #3799: URL: https://github.com/apache/hudi/pull/3799#issuecomment-951903558 It was my first time modifying a PR online in GitHub, and I am not familiar enough with the operation. I corrected it a moment ago; please check it again. -- This is an automated message fr

[GitHub] [hudi] nsivabalan commented on a change in pull request #3746: [HUDI-2515] Add close when producing records failed

2021-10-26 Thread GitBox
nsivabalan commented on a change in pull request #3746: URL: https://github.com/apache/hudi/pull/3746#discussion_r736511594 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java ## @@ -49,8 +49,9 @@ public boolean hasNext() { t

[GitHub] [hudi] codope commented on issue #3755: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-26 Thread GitBox
codope commented on issue #3755: URL: https://github.com/apache/hudi/issues/3755#issuecomment-951914894 > I found two sparkHoodieBloomIndex were running, is that means two writers ran parallelism? I believe those are part of the same writer process. Hudi performs index lookup to get

[GitHub] [hudi] codope commented on issue #3755: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-26 Thread GitBox
codope commented on issue #3755: URL: https://github.com/apache/hudi/issues/3755#issuecomment-951918490 @fengjian428 How frequently are you facing this issue? You mentioned earlier that: > this table was create by Delta streamer's SqlSource from another table, but when ingest real-time

[GitHub] [hudi] hudi-bot edited a comment on pull request #3761: [HUDI-2509] OverwriteNonDefaultsWithLatestAvroPayload doesn`t work when upsert data with some null value column

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3761: URL: https://github.com/apache/hudi/pull/3761#issuecomment-938264265 ## CI report: * ee35aaa68f34312968b92cdbc966ac5e4a1e0ede UNKNOWN * 55482c784ddfffc72990cce9a82f05d941ff9437 Azure: [FAILURE](https://dev.azure.com/apache-hudi-

[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3799: URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004 ## CI report: * c04a9ee52205199413d789bfa5dc262d6f3c8fe5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3823: [HUDI-2538] persist some configs to hoodie.properties when the first write

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3823: URL: https://github.com/apache/hudi/pull/3823#issuecomment-946850790 ## CI report: * c491ffdeda525930fe01333daa7de0aa9c691df1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3700: URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979 ## CI report: * 98de9c0ec2e814c3c8c20276e6d1457c4eb7243d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3700: URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979 ## CI report: * 98de9c0ec2e814c3c8c20276e6d1457c4eb7243d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] dongkelun commented on a change in pull request #3746: [HUDI-2515] Add close when producing records failed

2021-10-26 Thread GitBox
dongkelun commented on a change in pull request #3746: URL: https://github.com/apache/hudi/pull/3746#discussion_r736550976 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java ## @@ -49,8 +49,9 @@ public boolean hasNext() { th

[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3799: URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004 ## CI report: * aa02b3508fee06bf0f3fd03b65d016eaeb9e4a65 UNKNOWN * e98d19ea99ead03b9360e04b1d006a67cf68a285 Azure: [FAILURE](https://dev.azure.com/apache-hudi-

[GitHub] [hudi] atharvai opened a new issue #3870: [SUPPORT] Hudi v0.8.0 Savepoint rollback failure

2021-10-26 Thread GitBox
atharvai opened a new issue #3870: URL: https://github.com/apache/hudi/issues/3870 **Describe the problem you faced** Given a savepoint, rollback fails with no error messages. **To Reproduce** Environment: EMR 6.4.0, S3 Steps to reproduce the behavior: 1. c

[GitHub] [hudi] fengjian428 commented on issue #3755: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-26 Thread GitBox
fengjian428 commented on issue #3755: URL: https://github.com/apache/hudi/issues/3755#issuecomment-951963942 > streamer this should only happens when table has massive update. In my case, it's a 10 TB size table, millions records in every batch from kafka,and those records cause

[GitHub] [hudi] atharvai commented on issue #3870: [SUPPORT] Hudi v0.8.0 Savepoint rollback failure

2021-10-26 Thread GitBox
atharvai commented on issue #3870: URL: https://github.com/apache/hudi/issues/3870#issuecomment-951965505 ``` __ __ _ / / _ _ __ _ / /(_) __ / // __ `/| | / // __ `// // // __ \ / /_/ // /_/ / | |/ // /_/ // // //

[GitHub] [hudi] dongkelun commented on a change in pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-10-26 Thread GitBox
dongkelun commented on a change in pull request #3700: URL: https://github.com/apache/hudi/pull/3700#discussion_r736569882 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala ## @@ -163,15 +163,15 @@ case class

[GitHub] [hudi] nsivabalan commented on a change in pull request #3824: [HUDI-1292] Millisecond granularity for instant timestamps

2021-10-26 Thread GitBox
nsivabalan commented on a change in pull request #3824: URL: https://github.com/apache/hudi/pull/3824#discussion_r732399635 ## File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java ## @@ -423,8 +423,8 @@ public static String medianInstantTime(String highVal

[GitHub] [hudi] absognety commented on issue #3758: [SUPPORT] Issues when writing dataframe to hudi format with hive syncing enabled for AWS Athena and Glue metadata persistence

2021-10-26 Thread GitBox
absognety commented on issue #3758: URL: https://github.com/apache/hudi/issues/3758#issuecomment-951981381 Closing this, we stopped encountering this issue after regulating the parallelism and creating more buckets instead of having single bucket for our application, also added few cluster

[GitHub] [hudi] absognety closed issue #3758: [SUPPORT] Issues when writing dataframe to hudi format with hive syncing enabled for AWS Athena and Glue metadata persistence

2021-10-26 Thread GitBox
absognety closed issue #3758: URL: https://github.com/apache/hudi/issues/3758 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@

[GitHub] [hudi] YannByron commented on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-10-26 Thread GitBox
YannByron commented on pull request #3700: URL: https://github.com/apache/hudi/pull/3700#issuecomment-951981643 @dongkelun Sorry, I personally think the previous solution to lowercase fields may not be the best and most correct one. We should dig into the root cause. -- This is an autom

[GitHub] [hudi] matthiasdg commented on issue #3868: [SUPPORT] Querying hudi datasets from standalone metastore

2021-10-26 Thread GitBox
matthiasdg commented on issue #3868: URL: https://github.com/apache/hudi/issues/3868#issuecomment-951981890 Meanwhile experimented with some other versions of hive metastore + mysql running in docker containers (e.g. 2.3.7 cf. spark). Same problems like the hive partition columns missing i

[GitHub] [hudi] matthiasdg edited a comment on issue #3868: [SUPPORT] Querying hudi datasets from standalone metastore

2021-10-26 Thread GitBox
matthiasdg edited a comment on issue #3868: URL: https://github.com/apache/hudi/issues/3868#issuecomment-951981890 Meanwhile experimented with some other versions of hive metastore + mysql running in docker containers (e.g. 2.3.7 cf. spark). Same problems like the hive partition columns mi

[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3700: URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979 ## CI report: * 421892e22d1e2cfe194eebd818cf707be70f8156 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3861: [WIP] Inspecting IT test failures

2021-10-26 Thread GitBox
hudi-bot edited a comment on pull request #3861: URL: https://github.com/apache/hudi/pull/3861#issuecomment-950966097 ## CI report: * 544c952d144d4b13ddf9f80cbb28e44b9c9c4a0c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] yanghua commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-26 Thread GitBox
yanghua commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-952003386 @liujinhui1994 Did you try to rebase your branch with the master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[GitHub] [hudi] dongkelun commented on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-10-26 Thread GitBox
dongkelun commented on pull request #3700: URL: https://github.com/apache/hudi/pull/3700#issuecomment-952005376 > @dongkelun Sorry, I personally think the previous solution to lowercase fields may not be the best and most correct one. We should dig into the root cause. I

[GitHub] [hudi] yanghua merged pull request #3864: [HUDI-2614] Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread GitBox
yanghua merged pull request #3864: URL: https://github.com/apache/hudi/pull/3864 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr.

[hudi] branch master updated (e3fc746 -> b1c4acf)

2021-10-26 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from e3fc746 [HUDI-2625] Revert "[HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757)" (#3863) add b

[jira] [Updated] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-2614: --- Fix Version/s: 0.10.0 > Remove duplicated hadoop-hdfs with tests classifier exists in bundles > --

[jira] [Closed] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-2614. -- Resolution: Done b1c4acf0aeb0f3d650c8e704828b1c2b0d2b5b40 > Remove duplicated hadoop-hdfs with tests classifier

[GitHub] [hudi] minihippo commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-10-26 Thread GitBox
minihippo commented on a change in pull request #3173: URL: https://github.com/apache/hudi/pull/3173#discussion_r736630174 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/utils/HiveHasher.java ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Softw

[GitHub] [hudi] leesf commented on a change in pull request #3823: [HUDI-2538] persist some configs to hoodie.properties when the first write

2021-10-26 Thread GitBox
leesf commented on a change in pull request #3823: URL: https://github.com/apache/hudi/pull/3823#discussion_r736655928 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/factory/HoodieSparkKeyGeneratorFactory.java ## @@ -51,44 +49,70 @@ private

[GitHub] [hudi] leesf commented on a change in pull request #3823: [HUDI-2538] persist some configs to hoodie.properties when the first write

2021-10-26 Thread GitBox
leesf commented on a change in pull request #3823: URL: https://github.com/apache/hudi/pull/3823#discussion_r736658212 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/factory/HoodieSparkKeyGeneratorFactory.java ## @@ -51,44 +49,70 @@ private
