[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2022-05-03 Thread GitBox
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1116948868 ## CI report: * f9b524a53651db3e83dc922c08762bbae4e84233 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8418

[jira] [Assigned] (HUDI-4001) "hoodie.datasource.write.operation" from table config should not be used as write operation

2022-05-03 Thread Jira
[ https://issues.apache.org/jira/browse/HUDI-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 董可伦 reassigned HUDI-4001: - Assignee: 董可伦 > "hoodie.datasource.write.operation" from table config should not be used as > write operation >

[GitHub] [hudi] BalaMahesh opened a new issue, #5494: [SUPPORT] Hudi 0.11.0 HoodieDeltaStreamer failing to start with error : java.lang.NoSuchFieldError: DROP_PARTITION_COLUMNS

2022-05-03 Thread GitBox
BalaMahesh opened a new issue, #5494: URL: https://github.com/apache/hudi/issues/5494 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subsc

[GitHub] [hudi] rahil-c commented on issue #5484: [SUPPORT] Hive Sync + AWS Data Catalog failling with Hudi 0.11.0

2022-05-03 Thread GitBox
rahil-c commented on issue #5484: URL: https://github.com/apache/hudi/issues/5484#issuecomment-1116919671 Hi @jasondavindev, just curious about your setup for using Hudi 0.11 on AWS EMR. The most recently offered version of Hudi on EMR is `0.9.0` https://docs.aws.amazon.com/emr/latest/ReleaseGui

[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2022-05-03 Thread GitBox
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1116916248 ## CI report: * 223c320447bc9adc8fccaabb9c590bed159b375d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5574

[GitHub] [hudi] rahil-c commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-05-03 Thread GitBox
rahil-c commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1116915923 @kasured If you have opened a case with AWS EMR support, we have a backport of the fix for hudi 0.9.0 we can provide you. Let us know so we can close this thread out for now. -- This

[GitHub] [hudi] hudi-bot commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive

2022-05-03 Thread GitBox
hudi-bot commented on PR #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1116914799 ## CI report: * 223c320447bc9adc8fccaabb9c590bed159b375d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5574

[GitHub] [hudi] ksoullpwk commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

2022-05-03 Thread GitBox
ksoullpwk commented on issue #5281: URL: https://github.com/apache/hudi/issues/5281#issuecomment-1116900488 My expected scope for this issue is only the properties file. As for the rest, handling the data, I think it should be done by users. The issue is I didn't know about t

[GitHub] [hudi] leobiscassi closed issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
leobiscassi closed issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3 URL: https://github.com/apache/hudi/issues/5485 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [hudi] leobiscassi commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
leobiscassi commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116871276 @yihua nice, I'll work on this and submit a PR, thanks. 👍🏽

[GitHub] [hudi] stackls opened a new issue, #5493: Hudi Batch job failures randomly every time for different tables

2022-05-03 Thread GitBox
stackls opened a new issue, #5493: URL: https://github.com/apache/hudi/issues/5493 While processing 200 tables sequentially using Hudi for delta records, 3 to 4 tables fail at random each time with one of the two errors below. It's not the same tables that fail after ea

[GitHub] [hudi] vinothchandar commented on pull request #5366: [HUDI-1176] Upgrade hudi to log4j2

2022-05-03 Thread GitBox
vinothchandar commented on PR #5366: URL: https://github.com/apache/hudi/pull/5366#issuecomment-1116753207 @bschell is this still WIP ?

[GitHub] [hudi] yihua commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
yihua commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116735679 Feel free to close the issue if all good.

[GitHub] [hudi] yihua commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
yihua commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116735323 @leobiscassi no problem. I agree that the docs can be improved around the key generator and partition field. If you already have something in mind, I encourage you to put up a PR on improving

[GitHub] [hudi] vinothchandar commented on pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC

2022-05-03 Thread GitBox
vinothchandar commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1116735024 @danny0405 @YannByron I see the major sticking point is - Option A) separate `.cdc` folder, that contains the CDC log (similar to redo logs in databases) Option B) do

[jira] [Created] (HUDI-4035) Improve point lookup in Metadata Table

2022-05-03 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4035: --- Summary: Improve point lookup in Metadata Table Key: HUDI-4035 URL: https://issues.apache.org/jira/browse/HUDI-4035 Project: Apache Hudi Issue Type: Task R

[jira] [Created] (HUDI-4034) Improve log merging performance for Metadata Table

2022-05-03 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4034: --- Summary: Improve log merging performance for Metadata Table Key: HUDI-4034 URL: https://issues.apache.org/jira/browse/HUDI-4034 Project: Apache Hudi Issue Type: Task

[jira] [Closed] (HUDI-1015) Audit all getAllPartitionPaths() calls and keep em out of fast path

2022-05-03 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-1015. --- Resolution: Duplicate > Audit all getAllPartitionPaths() calls and keep em out of fast path >

[jira] [Created] (HUDI-4033) Aggregated cols stats at partition level in col stats partition in MDT

2022-05-03 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4033: --- Summary: Aggregated cols stats at partition level in col stats partition in MDT Key: HUDI-4033 URL: https://issues.apache.org/jira/browse/HUDI-4033 Project: Apache Hudi

[jira] [Created] (HUDI-4032) Remove double file-listing in SparkHoodieFileIndex

2022-05-03 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4032: --- Summary: Remove double file-listing in SparkHoodieFileIndex Key: HUDI-4032 URL: https://issues.apache.org/jira/browse/HUDI-4032 Project: Apache Hudi Issue Type: Task

[GitHub] [hudi] vinothchandar commented on a diff in pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC

2022-05-03 Thread GitBox
vinothchandar commented on code in PR #5436: URL: https://github.com/apache/hudi/pull/5436#discussion_r864305872 ## rfc/rfc-51/rfc-51.md: ## @@ -0,0 +1,233 @@ + + +# RFC-50: Hudi CDC + +# Proposers + +- @Yann Byron + +# Approvers + +- @Raymond + +# Statue +JIRA: [https://issues

[GitHub] [hudi] vinothchandar commented on a diff in pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC

2022-05-03 Thread GitBox
vinothchandar commented on code in PR #5436: URL: https://github.com/apache/hudi/pull/5436#discussion_r864304303 ## rfc/rfc-51/rfc-51.md: ## @@ -0,0 +1,233 @@ + + +# RFC-50: Hudi CDC + +# Proposers + +- @Yann Byron + +# Approvers + +- @Raymond + +# Statue +JIRA: [https://issues

[GitHub] [hudi] vinothchandar commented on a diff in pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC

2022-05-03 Thread GitBox
vinothchandar commented on code in PR #5436: URL: https://github.com/apache/hudi/pull/5436#discussion_r864303561 ## rfc/rfc-51/rfc-51.md: ## @@ -0,0 +1,233 @@ + + +# RFC-50: Hudi CDC + +# Proposers + +- @Yann Byron + +# Approvers + +- @Raymond + +# Statue +JIRA: [https://issues

[GitHub] [hudi] leobiscassi commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
leobiscassi commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116656782 > That's correct. If you want Spark like read to include the partition field from the partition path, you may consider SqlSource or SQL transformer. When I use the `ParquetDFS

[GitHub] [hudi] yihua commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
yihua commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116479729 > what you are saying is that independent of the datatype / style of the partitions from source dataset they won't be considered as fields, since Hudi Delta Streamer just list all the par

[GitHub] [hudi] yihua commented on issue #5481: [SUPPORT] Slow Upsert When Reloading Data into Hudi Table

2022-05-03 Thread GitBox
yihua commented on issue #5481: URL: https://github.com/apache/hudi/issues/5481#issuecomment-1116452731 @MikeBuh Thanks for the clarification. What is the input size of your batch reload? A similar principle can be applied here for calculating the parallelism. To be conservative at fir

[GitHub] [hudi] ashah-lightbox opened a new issue, #5492: _hoodie_is_delete works differently on hudi spark datasource on docker compare to hudi on emr.

2022-05-03 Thread GitBox
ashah-lightbox opened a new issue, #5492: URL: https://github.com/apache/hudi/issues/5492 **Describe the problem you faced** I tried _hoodie_is_delete on pyspark emr notebook and it works as desired. Below is my attached example performed in EMR - https://gist.github.com/as

[jira] [Updated] (HUDI-4022) Add support to validate table's internal state with integ test infra

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4022: -- Sprint: 2022/05/02 > Add support to validate table's internal state with integ test infr

[jira] [Updated] (HUDI-4028) Add failur injection tests to integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4028: -- Sprint: 2022/05/02 > Add failur injection tests to integ test framework > --

[jira] [Updated] (HUDI-4027) add support to test non-core write operations (insert overwrite, delete partitions) to integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4027: -- Sprint: 2022/05/02 > add support to test non-core write operations (insert overwrite, de

[jira] [Updated] (HUDI-4020) Add support to multi-writer tests to integ test framework (4 concurrent writers)

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4020: -- Sprint: 2022/05/02 > Add support to multi-writer tests to integ test framework (4 concur

[jira] [Updated] (HUDI-4017) Spark sql tests as part of github actions for diff spark versions

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4017: -- Sprint: 2022/05/02 > Spark sql tests as part of github actions for diff spark versions >

[jira] [Updated] (HUDI-4016) Prepare a document to list all tests to be done as part of release certification

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4016: -- Sprint: 2022/05/02 > Prepare a document to list all tests to be done as part of release

[jira] [Updated] (HUDI-4018) Prepare minimal set of yamls to be tested against any write mode and against any query engine

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4018: -- Sprint: 2022/05/02 > Prepare minimal set of yamls to be tested against any write mode an

[jira] [Updated] (HUDI-3957) Support spark2 and scala12 testing w/ integ test bundle

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3957: -- Sprint: 2022/05/02 > Support spark2 and scala12 testing w/ integ test bundle > -

[jira] [Updated] (HUDI-4019) Add ability to test async clustering w/ integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4019: -- Sprint: 2022/05/02 > Add ability to test async clustering w/ integ test framework >

[jira] [Closed] (HUDI-2464) Create comprehensive spark datasource yamls similar to deltastreamer

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-2464. - Resolution: Fixed > Create comprehensive spark datasource yamls similar to deltastreamer >

[jira] [Updated] (HUDI-1590) Support async clustering w/ test suite job

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1590: -- Sprint: 2022/05/02 > Support async clustering w/ test suite job > --

[jira] [Updated] (HUDI-3989) Prepare golden datasets for testing

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3989: -- Sprint: (was: 2022/05/02) > Prepare golden datasets for testing >

[jira] [Updated] (HUDI-3990) Integrate query engines read validation for each commit in e2e test

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3990: -- Sprint: (was: 2022/05/02) > Integrate query engines read validation for each commit in

[jira] [Updated] (HUDI-3668) Fix failing unit tests in hudi-integ-test

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3668: -- Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25 (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Ap

[jira] [Closed] (HUDI-2466) Add and validate comprehensive yamls for spark dml

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-2466. - Resolution: Fixed > Add and validate comprehensive yamls for spark dml >

[hudi] branch master updated: [MINOR] Update RFC status (#5486)

2022-05-03 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 3343cbb47b [MINOR] Update RFC status (#5486) 3343cb

[GitHub] [hudi] yihua merged pull request #5486: [MINOR] Update RFC status

2022-05-03 Thread GitBox
yihua merged PR #5486: URL: https://github.com/apache/hudi/pull/5486

[jira] [Updated] (HUDI-4028) Add failur injection tests to integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4028: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Add failur injection tests to integ test framew

[jira] [Updated] (HUDI-4029) test out different lock providers using our integ test infra

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4029: -- Epic Link: HUDI-3303 (was: HUDI-4015) > test out different lock providers using our int

[jira] [Closed] (HUDI-4015) Integ test Infra

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4015. - Resolution: Duplicate > Integ test Infra > > > Key: HUDI-

[jira] [Updated] (HUDI-3991) Provide bundle jar options in each e2e test pipeline

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3991: -- Description: Make integ test bundle slim and run tests w/ actual bundles > Provide bundl

[GitHub] [hudi] liuzhuang2017 closed pull request #5491: [MINOR] Update the committer list is sorted by the first name

2022-05-03 Thread GitBox
liuzhuang2017 closed pull request #5491: [MINOR] Update the committer list is sorted by the first name URL: https://github.com/apache/hudi/pull/5491

[jira] [Updated] (HUDI-4027) add support to test non-core write operations (insert overwrite, delete partitions) to integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4027: -- Epic Link: HUDI-3303 (was: HUDI-4015) > add support to test non-core write operations (

[jira] [Closed] (HUDI-4030) add ability to test spark-sql with integ test infra

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4030. - Resolution: Duplicate > add ability to test spark-sql with integ test infra >

[GitHub] [hudi] liuzhuang2017 opened a new pull request, #5491: [MINOR] Update the committer list is sorted by the first name

2022-05-03 Thread GitBox
liuzhuang2017 opened a new pull request, #5491: URL: https://github.com/apache/hudi/pull/5491 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the p

[jira] [Updated] (HUDI-4026) Add support for spark streaming writes to integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4026: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Add support for spark streaming writes to integ

[jira] [Updated] (HUDI-4025) Add support to validate presto, trino and hive queries in integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4025: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Add support to validate presto, trino and hive

[jira] [Updated] (HUDI-4022) Add support to validate table's internal state with integ test infra

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4022: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Add support to validate table's internal state

[jira] [Closed] (HUDI-4024) Make integ test bundle slim and run tests w/ actual bundles

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4024. - Resolution: Fixed > Make integ test bundle slim and run tests w/ actual bundles >

[jira] [Updated] (HUDI-4019) Add ability to test async clustering w/ integ test framework

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4019: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Add ability to test async clustering w/ integ t

[jira] [Updated] (HUDI-4020) Add support to multi-writer tests to integ test framework (4 concurrent writers)

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4020: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Add support to multi-writer tests to integ test

[jira] [Updated] (HUDI-4018) Prepare minimal set of yamls to be tested against any write mode and against any query engine

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4018: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Prepare minimal set of yamls to be tested again

[jira] [Assigned] (HUDI-4018) Prepare minimal set of yamls to be tested against any write mode and against any query engine

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4018: - Assignee: sivabalan narayanan > Prepare minimal set of yamls to be tested against

[jira] [Updated] (HUDI-4017) Spark sql tests as part of github actions for diff spark versions

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4017: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Spark sql tests as part of github actions for d

[jira] [Updated] (HUDI-4016) Prepare a document to list all tests to be done as part of release certification

2022-05-03 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4016: -- Epic Link: HUDI-3303 (was: HUDI-4015) > Prepare a document to list all tests to be done

[GitHub] [hudi] JavierLopezT opened a new issue, #5490: [SUPPORT] Unable to infer schema for JSON after reading Hudi files in Spark

2022-05-03 Thread GitBox
JavierLopezT opened a new issue, #5490: URL: https://github.com/apache/hudi/issues/5490 Hello. I am facing an issue, and I am not even sure that it's Hudi's fault, but I am totally lost. Sorry if it turns out not to be due to Hudi. I have code that reads a Hudi commit file (JSON), takes so

[jira] [Updated] (HUDI-64) Estimation of compression ratio & other dynamic storage knobs based on historical stats

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-64: --- Priority: Minor (was: Major) > Estimation of compression ratio & other dynamic storage knobs based on > histor

[jira] [Updated] (HUDI-2669) Upgrade Java toolset/runtime to JDK11

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2669: - Sprint: 2022/05/02 > Upgrade Java toolset/runtime to JDK11 > - > >

[jira] [Updated] (HUDI-2003) Auto Compute Compression ratio for input data to output parquet/orc file size

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2003: - Priority: Minor (was: Major) > Auto Compute Compression ratio for input data to output parquet/orc file s

[jira] [Updated] (HUDI-10) Auto tune bulk insert parallelism #555

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-10: --- Priority: Minor (was: Major) > Auto tune bulk insert parallelism #555 > --

[jira] [Updated] (HUDI-2669) Upgrade Java toolset/runtime to JDK11

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2669: - Fix Version/s: 0.12.0 > Upgrade Java toolset/runtime to JDK11 > - > >

[jira] [Updated] (HUDI-2669) Upgrade Java toolset/runtime to JDK11

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2669: - Component/s: performance (was: code-quality) Epic Link: HUDI-3249 Issue Typ

[jira] [Updated] (HUDI-1461) Bulk insert v2 creates additional small files

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1461: - Epic Link: (was: HUDI-3249) > Bulk insert v2 creates additional small files > --

[jira] [Updated] (HUDI-2928) Evaluate rebasing Hudi's default compression from Gzip to Zstd

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2928: - Sprint: Hudi-Sprint-Jan-10, 2022/05/02 (was: Hudi-Sprint-Jan-10) > Evaluate rebasing Hudi's default compr

[jira] [Assigned] (HUDI-2754) Performance improvement for IncrementalRelation

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-2754: Assignee: Jintao > Performance improvement for IncrementalRelation > --

[jira] [Updated] (HUDI-2754) Performance improvement for IncrementalRelation

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2754: - Sprint: Cont' improve - 2022/03/7, 2022/05/02 (was: Cont' improve - 2022/03/7) > Performance improvement

[jira] [Updated] (HUDI-2754) Performance improvement for IncrementalRelation

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2754: - Reviewers: Alexey Kudinkin > Performance improvement for IncrementalRelation > ---

[jira] [Updated] (HUDI-413) Use ColumnIndex in parquet to speed up scans

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-413: Sprint: 2022/04/25 > Use ColumnIndex in parquet to speed up scans > -

[jira] [Assigned] (HUDI-64) Estimation of compression ratio & other dynamic storage knobs based on historical stats

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-64: -- Assignee: (was: Forward Xu) > Estimation of compression ratio & other dynamic storage knobs based on

[jira] [Assigned] (HUDI-2003) Auto Compute Compression ratio for input data to output parquet/orc file size

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-2003: Assignee: (was: Raymond Xu) > Auto Compute Compression ratio for input data to output parquet/o

[jira] [Updated] (HUDI-2754) Performance improvement for IncrementalRelation

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2754: - Epic Link: HUDI-3249 > Performance improvement for IncrementalRelation > -

[jira] [Updated] (HUDI-1041) Cache the explodeRecordRDDWithFileComparisons

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1041: - Epic Link: HUDI-3249 > Cache the explodeRecordRDDWithFileComparisons > --

[jira] [Updated] (HUDI-411) Quantify the benefit of sizing files using benchmarks

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-411: Epic Link: HUDI-1238 > Quantify the benefit of sizing files using benchmarks > --

[jira] [Updated] (HUDI-413) Use ColumnIndex in parquet to speed up scans

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-413: Epic Link: HUDI-3249 > Use ColumnIndex in parquet to speed up scans > ---

[jira] [Updated] (HUDI-872) Implement JMH benchmarks for all core classes

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-872: Epic Link: HUDI-1238 > Implement JMH benchmarks for all core classes > -

[jira] [Closed] (HUDI-3741) Fix flink bucket index bulk insert generates too many small files

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3741. Assignee: Danny Chen Resolution: Fixed > Fix flink bucket index bulk insert generates too many small f

[jira] [Closed] (HUDI-3728) Set the sort operator parallelism for flink bucket bulk insert

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3728. Assignee: Danny Chen Resolution: Fixed > Set the sort operator parallelism for flink bucket bulk inser

[jira] [Updated] (HUDI-3918) Improve flink bulk_insert performace for partitioned table

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3918: - Epic Link: HUDI-3249 > Improve flink bulk_insert performace for partitioned table > --

[jira] [Closed] (HUDI-3808) Flink bulk_insert timestamp(3) can not be read by Spark

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3808. Assignee: Danny Chen Resolution: Fixed > Flink bulk_insert timestamp(3) can not be read by Spark > ---

[jira] [Updated] (HUDI-3918) Improve flink bulk_insert performace for partitioned table

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3918: - Component/s: performance > Improve flink bulk_insert performace for partitioned table > --

[jira] [Updated] (HUDI-1461) Bulk insert v2 creates additional small files

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1461: - Epic Link: HUDI-3249 > Bulk insert v2 creates additional small files > ---

[jira] [Updated] (HUDI-1461) Bulk insert v2 creates additional small files

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1461: - Component/s: performance > Bulk insert v2 creates additional small files > ---

[jira] [Updated] (HUDI-3993) Avoid calling into Spark UDF in Bulk Insert

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3993: - Fix Version/s: 0.12.0 > Avoid calling into Spark UDF in Bulk Insert >

[GitHub] [hudi] vinothchandar commented on pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC

2022-05-03 Thread GitBox
vinothchandar commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1116162592 @danny0405 catching up here. Let's keep the discussions on GH (I'll leave some comments on the doc as well) so everyone in the community can discover more easily? Ideally, love

[jira] [Updated] (HUDI-2928) Evaluate rebasing Hudi's default compression from Gzip to Zstd

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2928: - Component/s: performance storage-management Epic Link: HUDI-3249 > Evaluate rebasin

[jira] [Updated] (HUDI-2928) Evaluate rebasing Hudi's default compression from Gzip to Zstd

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2928: - Issue Type: Improvement (was: Task) > Evaluate rebasing Hudi's default compression from Gzip to Zstd > --

[jira] [Updated] (HUDI-2928) Evaluate rebasing Hudi's default compression from Gzip to Zstd

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2928: - Fix Version/s: 0.12.0 > Evaluate rebasing Hudi's default compression from Gzip to Zstd > -

[GitHub] [hudi] leobiscassi commented on issue #5485: [SUPPORT] Hudi Delta Streamer doesn't recognize hive style date partition on S3

2022-05-03 Thread GitBox
leobiscassi commented on issue #5485: URL: https://github.com/apache/hudi/issues/5485#issuecomment-1116157734 Hi @yihua, thanks for the answer! About your points: (1) Thank you, I didn't notice this possibility, these folders are annoying 😓 (2) (3) > the parquet files you

[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1976: - Sprint: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25 (was: Hudi-Sprint-Apr-19, Hudi-Sprint-Apr-25, 2022/04/25)

[jira] [Updated] (HUDI-3883) File-sizing issues when writing COW table to S3

2022-05-03 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3883: - Epic Link: HUDI-3249 > File-sizing issues when writing COW table to S3 > -

[GitHub] [hudi] parisni opened a new issue, #5489: [SUPPORT] Feature Comment sync not working

2022-05-03 Thread GitBox
parisni opened a new issue, #5489: URL: https://github.com/apache/hudi/issues/5489 hudi 0.11.0 spark 3.2.1 / spark 2.4.x When adding comments to the schema, hudi_sync doesn't add them to the hive table, even when the feature is activated ``` + spark3.2-comments.py 08_pyspark

[GitHub] [hudi] nsivabalan commented on issue #5455: [SUPPORT] Read Hudi Table from Hive/Glue Catalog without specifying the S3 Path

2022-05-03 Thread GitBox
nsivabalan commented on issue #5455: URL: https://github.com/apache/hudi/issues/5455#issuecomment-1116086600 @bhasudha : Do we need to add any faq on this end? will let you take a call. -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [hudi] parisni commented on issue #5482: [SUPPORT] metadata index fail with MOR tables

2022-05-03 Thread GitBox
parisni commented on issue #5482: URL: https://github.com/apache/hudi/issues/5482#issuecomment-1116057329 I cannot really share the whole code, but parts of it. > Also, do the timeouts prevent the ingestion from proceeding? yes : I only get 5 commit done but I am trying 6 operation
