[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1525545607 ## CI report: * cf4e7358763e10aab951d16d7270f7592f7c62b0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1525535120 ## CI report: * cf4e7358763e10aab951d16d7270f7592f7c62b0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8493: [HUDI-6098] Use bulk insert prepped for the initial write into MDT.

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8493: URL: https://github.com/apache/hudi/pull/8493#issuecomment-1525524709 ## CI report: * 6d9d24f2e0ab70b97fb912505f2d0da60dfea86f Azure:

[GitHub] [hudi] lokeshj1703 commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
lokeshj1703 commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1525524372 I have added some changes in the UT. It is in progress. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
lokeshj1703 commented on code in PR #8574: URL: https://github.com/apache/hudi/pull/8574#discussion_r1179003066 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/ChainedTransformer.java: ## @@ -46,9 +110,40 @@ public List getTransformersNames() { @Override

[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
lokeshj1703 commented on code in PR #8574: URL: https://github.com/apache/hudi/pull/8574#discussion_r1178997841 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/ChainedTransformer.java: ## @@ -46,9 +110,40 @@ public List getTransformersNames() { @Override

[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
lokeshj1703 commented on code in PR #8574: URL: https://github.com/apache/hudi/pull/8574#discussion_r1178997565 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/Transformer.java: ## @@ -45,4 +46,8 @@ public interface Transformer { */

[GitHub] [hudi] lokeshj1703 commented on a diff in pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-04-27 Thread via GitHub
lokeshj1703 commented on code in PR #8574: URL: https://github.com/apache/hudi/pull/8574#discussion_r1178997392 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/transform/Transformer.java: ## @@ -45,4 +46,8 @@ public interface Transformer { */

[GitHub] [hudi] hudi-bot commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8478: URL: https://github.com/apache/hudi/pull/8478#issuecomment-1525407039 ## CI report: * 6605f4759c25ff79eb43928cdbc97b086a905534 Azure:

[jira] [Updated] (HUDI-93) Enforce semantics on HoodieRecordPayload to allow for a consistent instantiation of custom payloads via reflection

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-93: Fix Version/s: 1.0.0 > Enforce semantics on HoodieRecordPayload to allow for a consistent > instantiation of

[jira] [Updated] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-309: - Fix Version/s: 1.0.0 > General Redesign of Archived Timeline for efficient scan and management >

[GitHub] [hudi] danny0405 commented on pull request #7173: [HUDI-5189] Make HiveAvroSerializer compatible with hive3

2023-04-27 Thread via GitHub
danny0405 commented on PR #7173: URL: https://github.com/apache/hudi/pull/7173#issuecomment-1525375730 @xicm Thanks for the contribution. Can we squash the commits into one? The merge commits make the code hard to review, so let's use `git rebase ` instead of `git merge`, the `git

[jira] [Updated] (HUDI-5996) We should verify the consistency of bucket num at job startup.

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5996: -- Status: In Progress (was: Open) > We should verify the consistency of bucket num at job startup. >

[jira] [Updated] (HUDI-5996) We should verify the consistency of bucket num at job startup.

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5996: -- Fix Version/s: 0.14.0 > We should verify the consistency of bucket num at job startup. >

[jira] [Updated] (HUDI-5996) We should verify the consistency of bucket num at job startup.

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5996: -- Status: Patch Available (was: In Progress) > We should verify the consistency of bucket num at job

[jira] [Updated] (HUDI-6047) Clustering operation on consistent hashing resulting in duplicate data

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6047: -- Status: Patch Available (was: In Progress) > Clustering operation on consistent hashing resulting in

[jira] [Updated] (HUDI-6047) Clustering operation on consistent hashing resulting in duplicate data

2023-04-27 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6047: -- Status: In Progress (was: Open) > Clustering operation on consistent hashing resulting in duplicate

[GitHub] [hudi] danny0405 commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

2023-04-27 Thread via GitHub
danny0405 commented on issue #7602: URL: https://github.com/apache/hudi/issues/7602#issuecomment-1525312103 Yeah. Are you interested in contributing? I can help with the code review.

[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2023-04-27 Thread via GitHub
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1525289166 ## CI report: * 744a515f20bf5611d649a2a502662799929779a3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8587: URL: https://github.com/apache/hudi/pull/8587#issuecomment-1525275586 ## CI report: * 8db0bfcd2ce5aee94771f35ceb8c0eeb905d2003 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7355: [HUDI-5308] Hive query returns null when the where clause has a partition field

2023-04-27 Thread via GitHub
hudi-bot commented on PR #7355: URL: https://github.com/apache/hudi/pull/7355#issuecomment-1525270232 ## CI report: * 744a515f20bf5611d649a2a502662799929779a3 Azure:

[GitHub] [hudi] ad1happy2go commented on issue #6591: [SUPPORT]Duplicate records in MOR

2023-04-27 Thread via GitHub
ad1happy2go commented on issue #6591: URL: https://github.com/apache/hudi/issues/6591#issuecomment-1525265398 JIRA created to fix the issue - https://issues.apache.org/jira/browse/HUDI-6146

[jira] [Created] (HUDI-6146) Data Duplication in MOR tables when upsetting the same keys twice.

2023-04-27 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-6146: --- Summary: Data Duplication in MOR tables when upsetting the same keys twice. Key: HUDI-6146 URL: https://issues.apache.org/jira/browse/HUDI-6146 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8587: URL: https://github.com/apache/hudi/pull/8587#issuecomment-1525256821 ## CI report: * 8db0bfcd2ce5aee94771f35ceb8c0eeb905d2003 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] ad1happy2go commented on issue #6591: [SUPPORT]Duplicate records in MOR

2023-04-27 Thread via GitHub
ad1happy2go commented on issue #6591: URL: https://github.com/apache/hudi/issues/6591#issuecomment-1525255395 Issue still exists in master. Reproducible script - ``` //action1: spark-dataframe write import org.apache.spark.sql.SaveMode._ import

[jira] [Closed] (HUDI-5517) HoodieTimeline support filter instants by state transition time

2023-04-27 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-5517. Fix Version/s: 0.13.1 0.14.0 Resolution: Fixed Fixed via master branch:

[hudi] branch master updated: [HUDI-5517] HoodieTimeline support filter instants by state transition time (#7627)

2023-04-27 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 77039ae734a [HUDI-5517] HoodieTimeline support

[GitHub] [hudi] danny0405 merged pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-27 Thread via GitHub
danny0405 merged PR #7627: URL: https://github.com/apache/hudi/pull/7627

[GitHub] [hudi] danny0405 commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-04-27 Thread via GitHub
danny0405 commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1525213139 The test has passed: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=16697&view=results

[GitHub] [hudi] danny0405 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

2023-04-27 Thread via GitHub
danny0405 commented on issue #8500: URL: https://github.com/apache/hudi/issues/8500#issuecomment-1525210337 We plan to introduce completion time on the timeline soon (https://github.com/apache/hudi/pull/7627); after that, we can use the completion time to filter the timeline to

[GitHub] [hudi] danny0405 commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub
danny0405 commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1525159081 > @danny0405 integrated your patch. Now I need to: > > * confirm the test breaks without the change > * assert not throw npe Yeah, that makes sense, I run the 2 tests

[jira] [Updated] (HUDI-6145) Fix the flink table create schema to be compatible with Spark

2023-04-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6145: - Labels: pull-request-available (was: ) > Fix the flink table create schema to be compatible with

[GitHub] [hudi] danny0405 opened a new pull request, #8587: [HUDI-6145] Fix the flink table create schema to be compatible with S…

2023-04-27 Thread via GitHub
danny0405 opened a new pull request, #8587: URL: https://github.com/apache/hudi/pull/8587 …park ### Change Logs Fix the table create schema namespace and record name. ### Impact none ### Risk level (write none, low medium or high below) none

[jira] [Created] (HUDI-6145) Fix the flink table create schema to be compatible with Spark

2023-04-27 Thread Danny Chen (Jira)
Danny Chen created HUDI-6145: Summary: Fix the flink table create schema to be compatible with Spark Key: HUDI-6145 URL: https://issues.apache.org/jira/browse/HUDI-6145 Project: Apache Hudi

[GitHub] [hudi] parisni commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub
parisni commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1525111734 @danny0405 integrated your patch. Now I need to: - confirm the test breaks without the change - assert not throw npe

[GitHub] [hudi] newsbreak-tonglin opened a new issue, #8586: [SUPPORT] Hudi MOR with Flink SQL, sync ro table success, but sync rt table failed

2023-04-27 Thread via GitHub
newsbreak-tonglin opened a new issue, #8586: URL: https://github.com/apache/hudi/issues/8586 Using Flink Mongo CDC to fetch data from MongoDB into a Hudi MOR table, syncing the ro table succeeds, but syncing the rt table fails with the error message: 2023-04-27 07:46:27,967 INFO

[GitHub] [hudi] vinothchandar commented on a diff in pull request #6661: [HUDI-4853] Speeding up reading S3 files in S3EventsIncrSource

2023-04-27 Thread via GitHub
vinothchandar commented on code in PR #6661: URL: https://github.com/apache/hudi/pull/6661#discussion_r1178763050 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java: ## @@ -213,15 +216,27 @@ public Pair>, String>

[GitHub] [hudi] codope commented on issue #4863: [SUPPORT] Compaction and rollback with Flink cause data loss

2023-04-27 Thread via GitHub
codope commented on issue #4863: URL: https://github.com/apache/hudi/issues/4863#issuecomment-1525027596 Reopening to validate against master. Please close if fixed. cc @danny0405

[GitHub] [hudi] ghost opened a new issue, #4863: [SUPPORT] Compaction and rollback with Flink cause data loss

2023-04-27 Thread via GitHub
ghost opened a new issue, #4863: URL: https://github.com/apache/hudi/issues/4863 **Describe the problem you faced** * At instant time `20220221085407453`, Flink sent a compaction request to merge the delta log files into the base parquet files. ``` 2022-02-21 08:58:50,410 INFO

[GitHub] [hudi] stream2000 commented on issue #8500: [DISCUSS] Hive Sync will lose some partitions in multi writer scenario

2023-04-27 Thread via GitHub
stream2000 commented on issue #8500: URL: https://github.com/apache/hudi/issues/8500#issuecomment-1525023072 > since we have a tracking ticket, can we go ahead and close the github issue. Hi, I don't really understand how I can go ahead with this issue. Do you mean that I should

[GitHub] [hudi] Aload opened a new issue, #6102: [SUPPORT]Missing data problem,exigency!!!

2023-04-27 Thread via GitHub
Aload opened a new issue, #6102: URL: https://github.com/apache/hudi/issues/6102 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

[GitHub] [hudi] codope commented on issue #6102: [SUPPORT]Missing data problem,exigency!!!

2023-04-27 Thread via GitHub
codope commented on issue #6102: URL: https://github.com/apache/hudi/issues/6102#issuecomment-1525011926 Reopening to validate against master. cc @danny0405

[GitHub] [hudi] hudi-bot commented on pull request #8582: [HUDI-6142] Refactor the code related to creating user-defined index

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8582: URL: https://github.com/apache/hudi/pull/8582#issuecomment-1525007205 ## CI report: * f77423caf9e0bc624ffbf3b5848b143acb6b3357 Azure:

[GitHub] [hudi] codope commented on issue #7352: [SUPPORT]When writing to a hudi table synchronized with hive via Flink, the amount of data in the hudi table does not match the amount of data in the s

2023-04-27 Thread via GitHub
codope commented on issue #7352: URL: https://github.com/apache/hudi/issues/7352#issuecomment-1525004837 Reopening to validate against master. cc @danny0405

[GitHub] [hudi] lklhdu opened a new issue, #7352: [SUPPORT]When writing to a hudi table synchronized with hive via Flink, the amount of data in the hudi table does not match the amount of data in the

2023-04-27 Thread via GitHub
lklhdu opened a new issue, #7352: URL: https://github.com/apache/hudi/issues/7352 **Describe the problem you faced** I am writing to a Hudi table with Hive sync enabled through Flink. When the sync is done, I find a very large difference between the record count obtained from the query and the

[GitHub] [hudi] ad1happy2go commented on issue #6869: [SUPPORT] Incremental upsert or merge is not working

2023-04-27 Thread via GitHub
ad1happy2go commented on issue #6869: URL: https://github.com/apache/hudi/issues/6869#issuecomment-1524975366 Not able to reproduce the issue. Code is working as expected. ```import org.apache.hudi.QuickstartUtils._ import scala.collection.JavaConversions._ import

[GitHub] [hudi] codope commented on issue #7191: [SUPPORT] Missing Data with Amazon Athena in Glue Table with Hudi 0.10.1

2023-04-27 Thread via GitHub
codope commented on issue #7191: URL: https://github.com/apache/hudi/issues/7191#issuecomment-1524969555 > a. Few records from a table are unpartitioned and are stored in S3 with partition name "default". i.e. year/month/day --> default/default/default b. These records are not reflected

[GitHub] [hudi] alexone95 closed issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

2023-04-27 Thread via GitHub
alexone95 closed issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9 URL: https://github.com/apache/hudi/issues/8436

[GitHub] [hudi] alexone95 commented on issue #8436: [SUPPORT] run hoodie cleaner process as a spark submit request on EMR 6.9

2023-04-27 Thread via GitHub
alexone95 commented on issue #8436: URL: https://github.com/apache/hudi/issues/8436#issuecomment-1524958582 Hi, I solved it by using a Lambda to delete files in the archive rather than the .commit files, thanks to hoodie.keep.min.commits and hoodie.keep.max.commits. Thanks.

[GitHub] [hudi] ChestnutQiang commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

2023-04-27 Thread via GitHub
ChestnutQiang commented on issue #7602: URL: https://github.com/apache/hudi/issues/7602#issuecomment-1524936763 @danny0405 I have filed a JIRA for this issue; could you give me some ideas?

[jira] [Updated] (HUDI-6144) [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created

2023-04-27 Thread lizhiqiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lizhiqiang updated HUDI-6144: - Affects Version/s: 0.14.0 > [Spark][Flink]bucket index and then insert data in bulk, the correct file >

[jira] [Updated] (HUDI-6144) [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created

2023-04-27 Thread lizhiqiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lizhiqiang updated HUDI-6144: - Component/s: flink-sql spark-sql > [Spark][Flink]bucket index and then insert data in

[GitHub] [hudi] rohan-uptycs commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub
rohan-uptycs commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1178710798 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[jira] [Updated] (HUDI-6144) [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created

2023-04-27 Thread lizhiqiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lizhiqiang updated HUDI-6144: - Description: When I use bucket index and then insert data in bulk, the correct file cannot be created,

[jira] [Commented] (HUDI-6144) [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created

2023-04-27 Thread lizhiqiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717016#comment-17717016 ] lizhiqiang commented on HUDI-6144: -- [~danny0405] > [Spark][Flink]bucket index and then insert data in

[GitHub] [hudi] codope commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2023-04-27 Thread via GitHub
codope commented on issue #4618: URL: https://github.com/apache/hudi/issues/4618#issuecomment-1524893530 Reopening to validate against master.

[GitHub] [hudi] ChangbingChen opened a new issue, #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2023-04-27 Thread via GitHub
ChangbingChen opened a new issue, #4618: URL: https://github.com/apache/hudi/issues/4618 **Describe the problem you faced** When querying a Hudi table in Hive, there are duplicate records. This Hudi table was created by Flink. **To Reproduce** Steps to reproduce

[jira] [Created] (HUDI-6144) [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created

2023-04-27 Thread lizhiqiang (Jira)
lizhiqiang created HUDI-6144: Summary: [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created Key: HUDI-6144 URL: https://issues.apache.org/jira/browse/HUDI-6144

[GitHub] [hudi] hudi-bot commented on pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8523: URL: https://github.com/apache/hudi/pull/8523#issuecomment-1524876144 ## CI report: * 8b2fbfcf5edcb39d36ccbc41bc2b30c2e8aa4212 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1524875964 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * b061571b7738370a1bdb64d2dd9cf5220b309d6d Azure:

[GitHub] [hudi] stream2000 closed issue #8498: [DISCUSS] [Flink] Should we support start a new instant when there is no data in the last batch to support multi writer?

2023-04-27 Thread via GitHub
stream2000 closed issue #8498: [DISCUSS] [Flink] Should we support start a new instant when there is no data in the last batch to support multi writer? URL: https://github.com/apache/hudi/issues/8498

[GitHub] [hudi] hudi-bot commented on pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8523: URL: https://github.com/apache/hudi/pull/8523#issuecomment-1524858950 ## CI report: * 8b2fbfcf5edcb39d36ccbc41bc2b30c2e8aa4212 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1524858691 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * b061571b7738370a1bdb64d2dd9cf5220b309d6d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8585: [HUDI-8585]Improve documentation of org.apache.hudi.common.table.view.Abstr…

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8585: URL: https://github.com/apache/hudi/pull/8585#issuecomment-1524843651 ## CI report: * ef0b65c6471448ba86899c587618e60a6377d3c8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8432: [HUDI-6072] Fix NPE when upsert merger and null map or array

2023-04-27 Thread via GitHub
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1524841754 ## CI report: * 3e9388ee9a6edaa6caab4f738b093f82744bc7dc Azure:

[GitHub] [hudi] prashantwason commented on a diff in pull request #8523: [HUDI-6114] Fixed rollback of blocks in scanInternalV1

2023-04-27 Thread via GitHub
prashantwason commented on code in PR #8523: URL: https://github.com/apache/hudi/pull/8523#discussion_r1178663188 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -2718,4 +2595,40 @@ private HoodieLogFormat.Reader

[GitHub] [hudi] uvplearn opened a new issue, #5869: [SUPPORT] There are duplicate values in HUDI MOR table for different partition and not updating values in same partition for GLOBAL_BLOOM

2023-04-27 Thread via GitHub
uvplearn opened a new issue, #5869: URL: https://github.com/apache/hudi/issues/5869 **Description** There are duplicate values in the Hudi MOR table for different partitions, and values are not updated within the same partition, when using GLOBAL_BLOOM. **Steps To Reproduce this behavior** **STEP

[GitHub] [hudi] codope commented on issue #5869: [SUPPORT] There are duplicate values in HUDI MOR table for different partition and not updating values in same partition for GLOBAL_BLOOM

2023-04-27 Thread via GitHub
codope commented on issue #5869: URL: https://github.com/apache/hudi/issues/5869#issuecomment-1524786956 Reopening. Fix is in progress - https://github.com/apache/hudi/pull/8490
