[GitHub] [hudi] boundarymate commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …

2023-03-23 Thread via GitHub
boundarymate commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482341654 Hello, does this PR solve the problem mentioned in https://www2.jianshu.com/p/385483e3d58f? -- This is an automated message from the Apache Git Service. To respond to the message, plea

[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized readers are used

2023-03-23 Thread voon (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon updated HUDI-5977: --- Description: When a Date -> String type conversion is performed and when the non-vectorized reader is used, the table

[GitHub] [hudi] voonhous commented on pull request #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …

2023-03-23 Thread via GitHub
voonhous commented on PR #8280: URL: https://github.com/apache/hudi/pull/8280#issuecomment-1482329274 @xiarixiaoyao Hello, can you please help to review the fixes here? A test case is included to reproduce this issue + verify the fix. Thank you.

[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized readers are used

2023-03-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5977: - Labels: pull-request-available (was: ) > Fix Date to String casts when non-vectorized readers are

[GitHub] [hudi] voonhous opened a new pull request, #8280: [HUDI-5977] Fix Date to String column schema evolution causing table …

2023-03-23 Thread via GitHub
voonhous opened a new pull request, #8280: URL: https://github.com/apache/hudi/pull/8280 …to become unreadable when non-vectorized readers are used ### Change Logs Fix date -> string schema evolution when vectorized readers are not used. ### Impact Tables can be wh

[GitHub] [hudi] chenbodeng719 commented on issue #6367: [SUPPORT] Failed Job - doing partition and writing data - in Hudi 0.11.0

2023-03-23 Thread via GitHub
chenbodeng719 commented on issue #6367: URL: https://github.com/apache/hudi/issues/6367#issuecomment-1482326297 I have the same problem. @Limess It seems that Spark can't insert overwrite with bucket index if you have large data.

[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized readers are used

2023-03-23 Thread voon (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon updated HUDI-5977: --- Summary: Fix Date to String casts when non-vectorized readers are used (was: Fix Date to String casts when non-vector

[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8184: URL: https://github.com/apache/hudi/pull/8184#issuecomment-1482314099 ## CI report: * 7674fe343bb2209f6a969f4c77e2f784ee8efc29 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1482309351 ## CI report: * e68bfec947db2d785523def84d5e8e1cae9814ba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8184: URL: https://github.com/apache/hudi/pull/8184#issuecomment-1482309145 ## CI report: * 7674fe343bb2209f6a969f4c77e2f784ee8efc29 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized reader is used

2023-03-23 Thread voon (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon updated HUDI-5977: --- Description: When a Date -> String type conversion is performed and when the non-vectorized reader is used, the table

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1482304270 ## CI report: * e68bfec947db2d785523def84d5e8e1cae9814ba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] hudi-bot commented on pull request #8269: [HUDI-5967] Add partition ordering for full table scans

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8269: URL: https://github.com/apache/hudi/pull/8269#issuecomment-1482304218 ## CI report: * a6080a65d937c2f2ca30fa99d5859d01509b5a16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1585

[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized reader is used

2023-03-23 Thread voon (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon updated HUDI-5977: --- Description: When a Date -> String type conversion is performed and when the non-vectorized reader is used, the table

[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-03-23 Thread via GitHub
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1482302935 ## CI report: * d5d4bb29aeae470e187606f02a9ea65546cc4ab5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[jira] [Updated] (HUDI-5977) Fix Date to String casts when non-vectorized reader is used

2023-03-23 Thread voon (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon updated HUDI-5977: --- Summary: Fix Date to String casts when non-vectorized reader is used (was: Fix Date -> String casts when non-vectoriz

[jira] [Assigned] (HUDI-5977) Fix Date to String casts when non-vectorized reader is used

2023-03-23 Thread voon (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] voon reassigned HUDI-5977: -- Assignee: voon > Fix Date to String casts when non-vectorized reader is used >

[jira] [Created] (HUDI-5977) Fix Date -> String casts when non-vectorized reader is used

2023-03-23 Thread voon (Jira)
voon created HUDI-5977: -- Summary: Fix Date -> String casts when non-vectorized reader is used Key: HUDI-5977 URL: https://issues.apache.org/jira/browse/HUDI-5977 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] jiangxinqi1995 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
jiangxinqi1995 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1482297437 Every time a Flink task is restarted, every two checkpoints are performed, and the next checkpoint will fail. This phenomenon is very strange.

[GitHub] [hudi] jiangxinqi1995 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
jiangxinqi1995 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1482293819 I don't know why. At first, it is possible to synthesize Parquet files, but after synthesizing two Parquet files, subsequent files will not be merged, and there will be errors su

[GitHub] [hudi] jiangxinqi1995 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
jiangxinqi1995 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1482283168 Partitions by day, only triggers the current day's partition each time

[GitHub] [hudi] nfarah86 commented on a diff in pull request #8093: [HUDI-5886][DOCS] Improve File Sizing, Timeline, and Flink docs

2023-03-23 Thread via GitHub
nfarah86 commented on code in PR #8093: URL: https://github.com/apache/hudi/pull/8093#discussion_r1147130157 ## website/docs/timeline.md: ## @@ -3,40 +3,386 @@ title: Timeline toc: true --- -## Timeline -At its core, Hudi maintains a `timeline` of all actions performed on th

[GitHub] [hudi] nfarah86 commented on a diff in pull request #8093: [HUDI-5886][DOCS] Improve File Sizing, Timeline, and Flink docs

2023-03-23 Thread via GitHub
nfarah86 commented on code in PR #8093: URL: https://github.com/apache/hudi/pull/8093#discussion_r1147128617 ## website/docs/timeline.md: ## @@ -3,40 +3,386 @@ title: Timeline toc: true --- -## Timeline -At its core, Hudi maintains a `timeline` of all actions performed on th

[GitHub] [hudi] nfarah86 commented on a diff in pull request #8093: [HUDI-5886][DOCS] Improve File Sizing, Timeline, and Flink docs

2023-03-23 Thread via GitHub
nfarah86 commented on code in PR #8093: URL: https://github.com/apache/hudi/pull/8093#discussion_r1147123314 ## website/docs/flink_configuration.md: ## @@ -3,115 +3,177 @@ title: Flink Setup toc: true --- -## Global Configurations -When using Flink, you can set some global c

[GitHub] [hudi] nfarah86 commented on a diff in pull request #8093: [HUDI-5886][DOCS] Improve File Sizing, Timeline, and Flink docs

2023-03-23 Thread via GitHub
nfarah86 commented on code in PR #8093: URL: https://github.com/apache/hudi/pull/8093#discussion_r1147122606 ## website/docs/flink_configuration.md: ## @@ -3,115 +3,177 @@ title: Flink Setup toc: true --- -## Global Configurations -When using Flink, you can set some global c

[GitHub] [hudi] nfarah86 commented on a diff in pull request #8093: [HUDI-5886][DOCS] Improve File Sizing, Timeline, and Flink docs

2023-03-23 Thread via GitHub
nfarah86 commented on code in PR #8093: URL: https://github.com/apache/hudi/pull/8093#discussion_r1147122491 ## website/docs/flink_configuration.md: ## @@ -3,115 +3,177 @@ title: Flink Setup toc: true --- -## Global Configurations -When using Flink, you can set some global c

[GitHub] [hudi] danny0405 commented on issue #8087: [SUPPORT] split_reader don't checkpoint before consuming all splits

2023-03-23 Thread via GitHub
danny0405 commented on issue #8087: URL: https://github.com/apache/hudi/issues/8087#issuecomment-1482226450 > The first checkpoint barrier is behind 1200 splits. So you are talking about that the barrier is queued up behind of these input splits, in `#processElement`, we just put the

[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-03-23 Thread via GitHub
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1482224287 ## CI report: * 4ace11977a64bc8bee549351ac815a9cdb00aa33 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1521

[GitHub] [hudi] danny0405 commented on issue #8259: [SUPPORT] Clustering created files with modified schema resulting in corrupted table

2023-03-23 Thread via GitHub
danny0405 commented on issue #8259: URL: https://github.com/apache/hudi/issues/8259#issuecomment-1482220312 > I'm using in-process lock In-process lock can not work correctly for MDT with async table services.

[GitHub] [hudi] hudi-bot commented on pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-03-23 Thread via GitHub
hudi-bot commented on PR #7362: URL: https://github.com/apache/hudi/pull/7362#issuecomment-1482219381 ## CI report: * 4ace11977a64bc8bee549351ac815a9cdb00aa33 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1521

[GitHub] [hudi] danny0405 commented on issue #8273: [SUPPORT] How to connect Hudi cli to MinIO

2023-03-23 Thread via GitHub
danny0405 commented on issue #8273: URL: https://github.com/apache/hudi/issues/8273#issuecomment-1482218848 Yeah, maybe the AWS fellows can give some help here. @umehrot2, do you have any idea how these AWS env variables can be handed over to the Hudi CLI correctly?

[GitHub] [hudi] danny0405 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
danny0405 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1482217420 Your ckp options seem good, what is your partition path field then? How many partitions do you estimate that can be touched for one ckp write operation?

[GitHub] [hudi] hudi-bot commented on pull request #8269: [HUDI-5967] Add partition ordering for full table scans

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8269: URL: https://github.com/apache/hudi/pull/8269#issuecomment-1482215101 ## CI report: * a6080a65d937c2f2ca30fa99d5859d01509b5a16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1585

[GitHub] [hudi] danny0405 commented on issue #8279: [SUPPORT]I use flink to bulk insert a mor table with bucket index. But it seems that there is only log file without parquet?

2023-03-23 Thread via GitHub
danny0405 commented on issue #8279: URL: https://github.com/apache/hudi/issues/8279#issuecomment-1482213810 For `BULK_INSERT` operation with bucket index, the writer always generates parquets directly, can you share your job configurations? It seems the `BULK_INSERT` does not really take ef

[GitHub] [hudi] danny0405 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-03-23 Thread via GitHub
danny0405 commented on issue #8267: URL: https://github.com/apache/hudi/issues/8267#issuecomment-1482211054 Yeah, the compaction lags a little bit, we can increase some resource for the compaction or use the offline compaction job.

[GitHub] [hudi] danny0405 commented on pull request #8269: [HUDI-5967] Add partition ordering for full table scans

2023-03-23 Thread via GitHub
danny0405 commented on PR #8269: URL: https://github.com/apache/hudi/pull/8269#issuecomment-1482209783 @hudi-bot run azure

[GitHub] [hudi] danny0405 commented on pull request #8269: [HUDI-5967] Add partition ordering for full table scans

2023-03-23 Thread via GitHub
danny0405 commented on PR #8269: URL: https://github.com/apache/hudi/pull/8269#issuecomment-1482209671 There is a timeout: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=15859&view=logs&j=3b6e910d-b98f-5de6-b9cb-1e5ff571f5de&t=30b5aae4-0ea0-5566-42d0-febf71a70

[GitHub] [hudi] weimingdiit commented on a diff in pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-03-23 Thread via GitHub
weimingdiit commented on code in PR #7362: URL: https://github.com/apache/hudi/pull/7362#discussion_r1147087855 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java: ## @@ -391,6 +392,11 @@ protected static boolean hasMetaFields(Schema schema) { retur

[GitHub] [hudi] duc-dn commented on issue #8273: [SUPPORT] How to connect Hudi cli to MinIO

2023-03-23 Thread via GitHub
duc-dn commented on issue #8273: URL: https://github.com/apache/hudi/issues/8273#issuecomment-1482204522 Hi @danny0405 - yes, I configured the MinIO env correctly but it doesn't connect. I find that the Hudi CLI doesn't pick up these environment variables ``` export AWS_ENDPOINT=http://localhost:9000

[GitHub] [hudi] wolf8334 commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

2023-03-23 Thread via GitHub
wolf8334 commented on issue #8268: URL: https://github.com/apache/hudi/issues/8268#issuecomment-1482198270 But the fact is, in the first message I received, there is one key with the key op and value d. So in my opinion, the first message tells Hudi to delete one row, but the second makes no s

[GitHub] [hudi] jiangxinqi1995 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
jiangxinqi1995 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1482196417 This is my Flink checkpoint configuration. I don't see any other useful log information ![image](https://user-images.githubusercontent.com/86709333/227415762-167a20c3-4d1b-49f

[GitHub] [hudi] danny0405 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
danny0405 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1482178977 Did you see any error stack trace in the logs? And at what time interval does the ckp trigger?

[GitHub] [hudi] danny0405 commented on a diff in pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
danny0405 commented on code in PR #8277: URL: https://github.com/apache/hudi/pull/8277#discussion_r1147066368 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java: ## @@ -168,8 +170,14 @@ protected ClosableIterator> deserializeRecords(b

[jira] [Closed] (HUDI-5972) Fix the flink 1.13 bundle jar version on website

2023-03-23 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-5972. Resolution: Fixed Fixed via asf-site: 51369b8ac9a8e984d3e5d740f9c9ae6ed11b84ee > Fix the flink 1.13 bundle

[hudi] branch asf-site updated: [HUDI-5972] Fix the flink 1.13 bundle jar version on website (#8266)

2023-03-23 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 51369b8ac9a [HUDI-5972] Fix the flink 1.13

[GitHub] [hudi] danny0405 merged pull request #8266: [HUDI-5972] Fix the flink 1.13 bundle jar version on website

2023-03-23 Thread via GitHub
danny0405 merged PR #8266: URL: https://github.com/apache/hudi/pull/8266

[GitHub] [hudi] danny0405 commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

2023-03-23 Thread via GitHub
danny0405 commented on issue #8268: URL: https://github.com/apache/hudi/issues/8268#issuecomment-1482170915 A message with a null primary key means nothing; the storage engine cannot handle the message if the key is missing. Because usually a delete message indicates a retraction action f

[GitHub] [hudi] danny0405 closed issue #8195: [SUPPORT] Clustering is not happening on Flink Hudi

2023-03-23 Thread via GitHub
danny0405 closed issue #8195: [SUPPORT] Clustering is not happening on Flink Hudi URL: https://github.com/apache/hudi/issues/8195

[GitHub] [hudi] danny0405 commented on issue #8273: [SUPPORT] How to connect Hudi cli to MinIO

2023-03-23 Thread via GitHub
danny0405 commented on issue #8273: URL: https://github.com/apache/hudi/issues/8273#issuecomment-1482158363 > Caused by: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider

[GitHub] [hudi] danny0405 commented on pull request #8029: [HUDI-5832] add relocated prefix for hbase classes in hbase-site.xml

2023-03-23 Thread via GitHub
danny0405 commented on PR #8029: URL: https://github.com/apache/hudi/pull/8029#issuecomment-1482156056 There is a failure due to a timeout: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=15364&view=logs&j=7601efb9-4019-552e-11ba-eb31b66593b2&t=9688f101-287d-53f4-2a8

[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

2023-03-23 Thread via GitHub
danny0405 commented on code in PR #8219: URL: https://github.com/apache/hudi/pull/8219#discussion_r1147049916 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -150,20 +150,7 @@ object HoodieSparkSqlWriter { case _

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
Zouxxyy commented on code in PR #8277: URL: https://github.com/apache/hudi/pull/8277#discussion_r1147036589 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java: ## @@ -168,8 +170,14 @@ protected ClosableIterator> deserializeRecords(byt

[GitHub] [hudi] DavidZ1 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-03-23 Thread via GitHub
DavidZ1 commented on issue #8267: URL: https://github.com/apache/hudi/issues/8267#issuecomment-1482134241 Thank you for your answer. I use the `HUDI CLI` tool to view the compact execution process of the table, and found that many compacts are `inflight`. Does it mean that the async

[GitHub] [hudi] weimingdiit commented on a diff in pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2023-03-23 Thread via GitHub
weimingdiit commented on code in PR #7362: URL: https://github.com/apache/hudi/pull/7362#discussion_r1147026350 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java: ## @@ -391,6 +392,11 @@ protected static boolean hasMetaFields(Schema schema) { retur

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1482105979 ## CI report: * e68bfec947db2d785523def84d5e8e1cae9814ba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] hudi-bot commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8275: URL: https://github.com/apache/hudi/pull/8275#issuecomment-1482078677 ## CI report: * 8b4506ef3a48e1c44e20153e73f374f83df22095 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8272: URL: https://github.com/apache/hudi/pull/8272#issuecomment-1482023828 ## CI report: * 757ff2448d8e19b19be316803fd29a9c89a747bb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] hudi-bot commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8231: URL: https://github.com/apache/hudi/pull/8231#issuecomment-1481992628 ## CI report: * 5766f144d33eb48e7ec09bcaaf3a3767c5029b17 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1481977803 ## CI report: * e7dbc8406459a6b1eddea58123c8324a5ef370c0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1481969546 ## CI report: * e7dbc8406459a6b1eddea58123c8324a5ef370c0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[GitHub] [hudi] nsivabalan closed issue #8234: [ISSUE] Clustering always rewrites the file even when there is nothing to cluster it with

2023-03-23 Thread via GitHub
nsivabalan closed issue #8234: [ISSUE] Clustering always rewrites the file even when there is nothing to cluster it with URL: https://github.com/apache/hudi/issues/8234

[GitHub] [hudi] nsivabalan commented on issue #8234: [ISSUE] Clustering always rewrites the file even when there is nothing to cluster it with

2023-03-23 Thread via GitHub
nsivabalan commented on issue #8234: URL: https://github.com/apache/hudi/issues/8234#issuecomment-1481926716 thanks!

[GitHub] [hudi] hudi-bot commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8275: URL: https://github.com/apache/hudi/pull/8275#issuecomment-1481924620 ## CI report: * 00f671c8c0ee475c527bd28cf1622199d831072c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[GitHub] [hudi] nsivabalan commented on issue #8236: [SUPPORT]Duplicate data in MOR table Hudi

2023-03-23 Thread via GitHub
nsivabalan commented on issue #8236: URL: https://github.com/apache/hudi/issues/8236#issuecomment-1481924602 There could be two reasons: 1. you may need to set hoodie.datasource.write.streaming.ignore.failed.batch = false. (default value is false). 2. we found a corner case w/ metadat

[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8272: URL: https://github.com/apache/hudi/pull/8272#issuecomment-1481924563 ## CI report: * e9169777d9f375e370ce44e0545eebd7d984ce22 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1586

[GitHub] [hudi] hudi-bot commented on pull request #8275: [HUDI-5289] Avoiding repeated trigger of clustering dag

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8275: URL: https://github.com/apache/hudi/pull/8275#issuecomment-1481914255 ## CI report: * 00f671c8c0ee475c527bd28cf1622199d831072c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[GitHub] [hudi] hudi-bot commented on pull request #8272: use path similar to base file when config is true

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8272: URL: https://github.com/apache/hudi/pull/8272#issuecomment-1481914159 ## CI report: * e9169777d9f375e370ce44e0545eebd7d984ce22 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1586

[GitHub] [hudi] nsivabalan commented on issue #8261: [SUPPORT] How to reduce hoodie commit latency

2023-03-23 Thread via GitHub
nsivabalan commented on issue #8261: URL: https://github.com/apache/hudi/issues/8261#issuecomment-1481913996 here is the fix: https://github.com/apache/hudi/pull/7561 that went into 0.13.0.

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1481903527 ## CI report: * e7dbc8406459a6b1eddea58123c8324a5ef370c0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1588

[hudi] branch asf-site updated: [DOCS] Adding faq on GCS issue w/ writes (#8090)

2023-03-23 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 1927d78a944 [DOCS] Adding faq on GCS issue

[GitHub] [hudi] nsivabalan merged pull request #8090: [DOCS] Adding faq on GCS issue w/ writes

2023-03-23 Thread via GitHub
nsivabalan merged PR #8090: URL: https://github.com/apache/hudi/pull/8090

[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1481819938 ## CI report: * 7f48a5ab4c17103cbd01f655d343adfab56c7655 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[hudi] branch asf-site updated: [DOCS][MINOR] Fix community sync schedule (#8262)

2023-03-23 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 584a5b1b1df [DOCS][MINOR] Fix community syn

[GitHub] [hudi] nsivabalan merged pull request #8262: [DOCS][MINOR] Fix community sync schedule

2023-03-23 Thread via GitHub
nsivabalan merged PR #8262: URL: https://github.com/apache/hudi/pull/8262

[GitHub] [hudi] stathismar commented on issue #8278: [SUPPORT] Deltastreamer Fails with AWSDmsAvroPayload

2023-03-23 Thread via GitHub
stathismar commented on issue #8278: URL: https://github.com/apache/hudi/issues/8278#issuecomment-1481713351 I was about to report exactly the same issue. The problem has to do with the deletes. By the time `DeltaStreamer` parses a `DELETE` record in DMS `.parquet` files, I face the error

[GitHub] [hudi] hudi-bot commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8231: URL: https://github.com/apache/hudi/pull/8231#issuecomment-1481674659 ## CI report: * Unknown: [CANCELED](TBD) * 5766f144d33eb48e7ec09bcaaf3a3767c5029b17 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039

[GitHub] [hudi] jdattani opened a new issue, #5451: [SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

2023-03-23 Thread via GitHub
jdattani opened a new issue, #5451: URL: https://github.com/apache/hudi/issues/5451 **Describe the problem you faced** Using DynamoDB as the lock provider for concurrent writes results in an error stating java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNot

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1481674879 ## CI report: * Unknown: [CANCELED](TBD) * e7dbc8406459a6b1eddea58123c8324a5ef370c0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039

[GitHub] [hudi] hudi-bot commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1481653616 ## CI report: * Unknown: [CANCELED](TBD) * e7dbc8406459a6b1eddea58123c8324a5ef370c0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `

[GitHub] [hudi] hudi-bot commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8231: URL: https://github.com/apache/hudi/pull/8231#issuecomment-1481652924 ## CI report: * Unknown: [CANCELED](TBD) * 5766f144d33eb48e7ec09bcaaf3a3767c5029b17 UNKNOWN Bot commands @hudi-bot supports the following commands: - `

[GitHub] [hudi] chenbodeng719 opened a new issue, #8279: [SUPPORT]I use flink to bulk insert a mor table with bucket index. But it seems that there is only log file without parquet?

2023-03-23 Thread via GitHub
chenbodeng719 opened a new issue, #8279: URL: https://github.com/apache/hudi/issues/8279 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? It seems that the page is 404. - Join the mailing list to engage in conversations a

[GitHub] [hudi] jtmzheng commented on issue #5451: [SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

2023-03-23 Thread via GitHub
jtmzheng commented on issue #5451: URL: https://github.com/apache/hudi/issues/5451#issuecomment-1481615398 > @kazdy @yihua I tried including hudi-aws jar to Glue Dependent JARs path. But still getting the exact same error. Is there anything else I can try? This seems currently bro

[GitHub] [hudi] nsivabalan commented on pull request #8270: [HUDI-5975] Release 0.12.3 prep

2023-03-23 Thread via GitHub
nsivabalan commented on PR #8270: URL: https://github.com/apache/hudi/pull/8270#issuecomment-1481601731 @hudi-bot run azure

[GitHub] [hudi] nsivabalan commented on pull request #8231: [HUDI-5963] Release 0.13.1 prep

2023-03-23 Thread via GitHub
nsivabalan commented on PR #8231: URL: https://github.com/apache/hudi/pull/8231#issuecomment-1481601513 @hudi-bot run azure

[GitHub] [hudi] Hans-Raintree opened a new issue, #8278: [SUPPORT] Deltastreamer Fails with AWSDmsAvroPayload

2023-03-23 Thread via GitHub
Hans-Raintree opened a new issue, #8278: URL: https://github.com/apache/hudi/issues/8278 **Describe the problem you faced** Deltastreamer ingest fails with AWSDmsAvroPayload, but works without it under an identical configuration. **To Reproduce** Steps to reproduce the behavior:

[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1481586400 ## CI report: * 7f48a5ab4c17103cbd01f655d343adfab56c7655 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[GitHub] [hudi] hudi-bot commented on pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8277: URL: https://github.com/apache/hudi/pull/8277#issuecomment-1481573275 ## CI report: * 7f48a5ab4c17103cbd01f655d343adfab56c7655 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-5976) Add fs in the constructor of HoodieAvroHFileReader to avoid potential NPE

2023-03-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5976: - Labels: pull-request-available (was: ) > Add fs in the constructor of HoodieAvroHFileReader to av

[GitHub] [hudi] codope commented on a diff in pull request #8277: [HUDI-5976] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
codope commented on code in PR #8277: URL: https://github.com/apache/hudi/pull/8277#discussion_r1146494242 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieHFileDataBlock.java: ## @@ -168,8 +170,14 @@ protected ClosableIterator> deserializeRecords(byte

[jira] [Created] (HUDI-5976) Add fs in the constructor of HoodieAvroHFileReader to avoid potential NPE

2023-03-23 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-5976: - Summary: Add fs in the constructor of HoodieAvroHFileReader to avoid potential NPE Key: HUDI-5976 URL: https://issues.apache.org/jira/browse/HUDI-5976 Project: Apache Hudi

[GitHub] [hudi] codope commented on issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException

2023-03-23 Thread via GitHub
codope commented on issue #8257: URL: https://github.com/apache/hudi/issues/8257#issuecomment-1481542479 @Zouxxyy thanks for working on it. Will review.

[GitHub] [hudi] Zouxxyy opened a new pull request, #8277: [MINOR] Add fs in the constructor of HoodieAvroHFileReader

2023-03-23 Thread via GitHub
Zouxxyy opened a new pull request, #8277: URL: https://github.com/apache/hudi/pull/8277 ### Change Logs for https://github.com/apache/hudi/issues/8257 After hbase 2.4.13, the constructor of `ReaderContext` has changed, and a configuration needs to be fetched through hfs, B

[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

2023-03-23 Thread via GitHub
wecharyu commented on code in PR #8219: URL: https://github.com/apache/hudi/pull/8219#discussion_r1146470877 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -150,20 +150,7 @@ object HoodieSparkSqlWriter { case _

[GitHub] [hudi] Zouxxyy commented on issue #8257: [SUPPORT]HoodieDeltaStreamer (0.13.0 ),FileSystem is null,resulting in a NullPointerException

2023-03-23 Thread via GitHub
Zouxxyy commented on issue #8257: URL: https://github.com/apache/hudi/issues/8257#issuecomment-1481481964 @LiJie20190102 The reason why your task will fail is that the hbase you are using is 2.4.13, the constructor of ReaderContext has changed, and a configuration needs to be fetche

[GitHub] [hudi] WarFox commented on issue #8258: [SUPPORT] Deltastreamer errors ingesting kafka-streams topics (transactional / exactly_once producers)

2023-03-23 Thread via GitHub
WarFox commented on issue #8258: URL: https://github.com/apache/hudi/issues/8258#issuecomment-1481350315 More observations on this - It is observed that the problem occurs when transaction marker is at the last offset for consumption - The ingestion is successful when we have non-t

[GitHub] [hudi] stream2000 commented on a diff in pull request #7826: [HUDI-5675] fix lazy clean schedule rollback on completed instant

2023-03-23 Thread via GitHub
stream2000 commented on code in PR #7826: URL: https://github.com/apache/hudi/pull/7826#discussion_r1146292616 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java: ## @@ -707,20 +709,37 @@ protected List getInstantsToRollback

[GitHub] [hudi] hudi-bot commented on pull request #8184: [HUDI-5780] Refactor Deltastreamer source configs to use HoodieConfig/ConfigProperty

2023-03-23 Thread via GitHub
hudi-bot commented on PR #8184: URL: https://github.com/apache/hudi/pull/8184#issuecomment-1481152011 ## CI report: * 7674fe343bb2209f6a969f4c77e2f784ee8efc29 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1587

[GitHub] [hudi] stayrascal commented on pull request #8029: [HUDI-5832] add relocated prefix for hbase classes in hbase-site.xml

2023-03-23 Thread via GitHub
stayrascal commented on PR #8029: URL: https://github.com/apache/hudi/pull/8029#issuecomment-1481071392 > @stayrascal could you check the CI failure? @yihua Thanks a lot for reviewing this PR. May I check which CI failed? It seems that all CI passed.

[GitHub] [hudi] pramodbiligiri commented on pull request #7963: [HUDI-5804] Enable the flags in CommitsCommand that were suppressed by mistake

2023-03-23 Thread via GitHub
pramodbiligiri commented on PR #7963: URL: https://github.com/apache/hudi/pull/7963#issuecomment-1481025840 @bvaradar Have merged latest master into this branch and build is passing.

[GitHub] [hudi] jiangxinqi1995 opened a new issue, #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-23 Thread via GitHub
jiangxinqi1995 opened a new issue, #8276: URL: https://github.com/apache/hudi/issues/8276 **Describe the problem you faced** A clear and concise description of the problem. "I use Flink cdc to read MySQL data, and then write it to S3 through hudi. I often encounter checkpoint org
