[GitHub] [hudi] yihua commented on a diff in pull request #5454: [HUDI-3680][HUDI-3926] Update docs for Spark, utilities, and utilities-slim bundles

2022-04-28 Thread GitBox
yihua commented on code in PR #5454: URL: https://github.com/apache/hudi/pull/5454#discussion_r861476232 ## website/docs/quick-start-guide.md: ## @@ -41,24 +51,24 @@ values={[ From the extracted directory run spark-shell with Hudi as: ```scala -// spark-shell for spark 3.1

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-28 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1112883931 @xiarixiaoyao We did another test, we used this JSON string

[jira] [Created] (HUDI-3997) Release 0.11.0 docs update

2022-04-28 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-3997: Summary: Release 0.11.0 docs update Key: HUDI-3997 URL: https://issues.apache.org/jira/browse/HUDI-3997 Project: Apache Hudi Issue Type: Task Components:

[GitHub] [hudi] hudi-bot commented on pull request #5185: [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index

2022-04-28 Thread GitBox
hudi-bot commented on PR #5185: URL: https://github.com/apache/hudi/pull/5185#issuecomment-1112864896 ## CI report: * 6a052dfd7cef5c935b6093c381a15bc4b83d43ab Azure:

[GitHub] [hudi] santoshsb commented on issue #5452: Schema Evolution: Missing column for previous records when new entry does not have the same while upsert.

2022-04-28 Thread GitBox
santoshsb commented on issue #5452: URL: https://github.com/apache/hudi/issues/5452#issuecomment-1112864560 Hi @xiarixiaoyao, thanks for the code. It worked like a charm for the reduced json as provided above. After successfully testing it with the reduced schema, we used the complete

[GitHub] [hudi] hudi-bot commented on pull request #5185: [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index

2022-04-28 Thread GitBox
hudi-bot commented on PR #5185: URL: https://github.com/apache/hudi/pull/5185#issuecomment-1112863556 ## CI report: * 6a052dfd7cef5c935b6093c381a15bc4b83d43ab Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox
hudi-bot commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112862396 ## CI report: * 44bcae973cc953d48d1cee358c19cf7857504a3b Azure:

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5462: URL: https://github.com/apache/hudi/pull/5462#discussion_r861390575 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/RowKeyGeneratorHelper.java: ## @@ -234,13 +237,14 @@ public static Object getNestedFieldVal(Row

[GitHub] [hudi] JerryYue-M commented on pull request #5445: [HUDI-3953]Flink Hudi module should support low-level source and sink…

2022-04-28 Thread GitBox
JerryYue-M commented on PR #5445: URL: https://github.com/apache/hudi/pull/5445#issuecomment-1112838551 > @danny0405 there are some question for the API patch 1. DDL needs a type for each column. So users must provide a string type to make > > Can you also apply

[GitHub] [hudi] xicm commented on pull request #5286: HUDI-3836 Improve the way of fetching metadata partitions from table

2022-04-28 Thread GitBox
xicm commented on PR #5286: URL: https://github.com/apache/hudi/pull/5286#issuecomment-1112838118 It seems that ITTestHoodieDataSource.testWriteAndReadDebeziumJson is flaky. Should I fix it here or open another issue? -- This is an automated message from the Apache Git Service. To

[GitHub] [hudi] wxplovecc commented on a diff in pull request #5185: [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index

2022-04-28 Thread GitBox
wxplovecc commented on code in PR #5185: URL: https://github.com/apache/hudi/pull/5185#discussion_r861434321 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java: ## @@ -185,5 +209,30 @@ private void

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112833638 ## CI report: * 581efe2dd0903441afdda8e1c57cce51365e2a96 Azure:

[GitHub] [hudi] YannByron commented on pull request #5436: [RFC-51] [HUDI-3478] Change Data Capture RFC

2022-04-28 Thread GitBox
YannByron commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1112830924 > I learned that delta lake also extracted the CDF for processing. I think that the CDF can be extracted to better control the data and specifications in the CDC. I also read iceberg's

[GitHub] [hudi] zhilinli123 commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
zhilinli123 commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112830605 > @zhilinli123 by "compression failure", do you mean the Hudi compaction fails and `20220428191757755` is a scheduled compaction? Have you tried to restart and retry the Flink job

[GitHub] [hudi] zhilinli123 commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
zhilinli123 commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112830030 > @zhilinli123 by "compression failure", do you mean the Hudi compaction fails and `20220428191757755` is a scheduled compaction? Have you tried to restart and retry the Flink job

[GitHub] [hudi] zhilinli123 commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
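zhilinli123 commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112829289 > @ > @zhilinli123 To help you identify the root cause, could you provide the steps/commands to reproduce the issue, including how multiple tables are written? Could you also show the full timeline in the `.hoodie` folder, including the one for `20220428191757755` that raised the error? Did you run any concurrent writers or async table services on the same table? I am currently taking the Mysql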

svn commit: r54162 - in /release/hudi/0.11.0: ./ hudi-0.11.0.src.tgz hudi-0.11.0.src.tgz.asc hudi-0.11.0.src.tgz.sha512

2022-04-28 Thread xushiyan
Author: xushiyan Date: Fri Apr 29 02:17:10 2022 New Revision: 54162 Log: Add hudi 0.11.0 Added: release/hudi/0.11.0/ release/hudi/0.11.0/hudi-0.11.0.src.tgz (with props) release/hudi/0.11.0/hudi-0.11.0.src.tgz.asc release/hudi/0.11.0/hudi-0.11.0.src.tgz.sha512 Added:

[GitHub] [hudi] todd5167 commented on issue #5395: [SUPPORT] Failed to archive commits, thow error 'Directory is not empty'

2022-04-28 Thread GitBox
todd5167 commented on issue #5395: URL: https://github.com/apache/hudi/issues/5395#issuecomment-1112815298 > So you do not write into same table from two separate Flink jobs. yes

[GitHub] [hudi] hudi-bot commented on pull request #5434: [HUDI-3978] Fix use of partition path field as hive partition field in flink

2022-04-28 Thread GitBox
hudi-bot commented on PR #5434: URL: https://github.com/apache/hudi/pull/5434#issuecomment-1112814736 ## CI report: * Unknown: [CANCELED](TBD) * 3d16bea801086257ae640e5b8cef7a669dc51ad9 Azure:

[GitHub] [hudi] YuangZhang commented on issue #5457: [SUPPORT] org.apache.hudi.common.fs.HoodieWrapperFileSystem cannot be cast to org.apache.hudi.common.fs.HoodieWrapperFileSystem

2022-04-28 Thread GitBox
YuangZhang commented on issue #5457: URL: https://github.com/apache/hudi/issues/5457#issuecomment-1112814459 I have encountered such errors. This happens because different classloaders load the same class; you need to check your libraries.

svn commit: r54161 - in /dev/hudi/hudi-0.11.0: ./ hudi-0.11.0.src.tgz hudi-0.11.0.src.tgz.asc hudi-0.11.0.src.tgz.sha512

2022-04-28 Thread xushiyan
Author: xushiyan Date: Fri Apr 29 02:05:39 2022 New Revision: 54161 Log: Add hudi release 0.11.0 Added: dev/hudi/hudi-0.11.0/ dev/hudi/hudi-0.11.0/hudi-0.11.0.src.tgz (with props) dev/hudi/hudi-0.11.0/hudi-0.11.0.src.tgz.asc dev/hudi/hudi-0.11.0/hudi-0.11.0.src.tgz.sha512

[GitHub] [hudi] hudi-bot commented on pull request #5434: [HUDI-3978] Fix use of partition path field as hive partition field in flink

2022-04-28 Thread GitBox
hudi-bot commented on PR #5434: URL: https://github.com/apache/hudi/pull/5434#issuecomment-1112813546 ## CI report: * Unknown: [CANCELED](TBD) * 3d16bea801086257ae640e5b8cef7a669dc51ad9 UNKNOWN Bot commands @hudi-bot supports the following commands: -

[hudi] annotated tag release-0.11.0 updated (8d6d9b98ac -> a0d25e5da9)

2022-04-28 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a change to annotated tag release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git *** WARNING: tag release-0.11.0 was modified! *** from 8d6d9b98ac (commit) to a0d25e5da9 (tag)

[GitHub] [hudi] onlywangyh commented on pull request #5434: [HUDI-3978] Fix use of partition path field as hive partition field in flink

2022-04-28 Thread GitBox
onlywangyh commented on PR #5434: URL: https://github.com/apache/hudi/pull/5434#issuecomment-1112812588 @hudi-bot run azur

[GitHub] [hudi] hudi-bot commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox
hudi-bot commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112812272 ## CI report: * 44bcae973cc953d48d1cee358c19cf7857504a3b Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #5185: [HUDI-3758] Fix duplicate fileId error in MOR table type with flink bucket hash Index

2022-04-28 Thread GitBox
danny0405 commented on code in PR #5185: URL: https://github.com/apache/hudi/pull/5185#discussion_r857474363 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java: ## @@ -185,5 +209,30 @@ private void

[hudi] branch master updated (4e928a6fe1 -> b27e8b51d8)

2022-04-28 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 4e928a6fe1 [HUDI-3943] Some description fixes for 0.10.1 docs (#5447) add b27e8b51d8 [MINOR] support different

[GitHub] [hudi] danny0405 merged pull request #5459: [MINOR] support different cleaning policy for flink

2022-04-28 Thread GitBox
danny0405 merged PR #5459: URL: https://github.com/apache/hudi/pull/5459

[GitHub] [hudi] danny0405 commented on pull request #5434: [HUDI-3978] Fix use of partition path field as hive partition field in flink

2022-04-28 Thread GitBox
danny0405 commented on PR #5434: URL: https://github.com/apache/hudi/pull/5434#issuecomment-1112805876 ![image](https://user-images.githubusercontent.com/7644508/165873069-0dce683c-8885-4d8f-adba-d45c5a193657.png) Please fix the build failure, you can rebase the latest master and

[hudi] branch release-0.11.0 updated: [MINOR] Update release version to reflect published version 0.11.0

2022-04-28 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/release-0.11.0 by this push: new 8d6d9b98ac [MINOR] Update

[GitHub] [hudi] dongkelun commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox
dongkelun commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112799961 @hudi-bot run azure

[GitHub] [hudi] danny0405 commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
danny0405 commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112799467 > Based on the current information, it's likely that the same marker is created twice and the second attempt fails. @danny0405 do you know if Flink has special handling around

[hudi] branch asf-site updated (82db2d55e6 -> 7abadad795)

2022-04-28 Thread github-bot
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git from 82db2d55e6 [HUDI-3928][HUDI-3932] Adding docs for 0.11 release (savepoint restore to CLI, pulsar commit

[GitHub] [hudi] nsivabalan commented on pull request #5449: [HUDI-3911][DOCS][WIP] Add async indexing doc

2022-04-28 Thread GitBox
nsivabalan commented on PR #5449: URL: https://github.com/apache/hudi/pull/5449#issuecomment-1112798919 I like the motivation section

[GitHub] [hudi] nsivabalan commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

2022-04-28 Thread GitBox
nsivabalan commented on PR #5341: URL: https://github.com/apache/hudi/pull/5341#issuecomment-1112798014 @alexeykudinkin : can you review this

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5440: [HUDI-3930][Docs] Adding documentation around Data Skipping

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5440: URL: https://github.com/apache/hudi/pull/5440#discussion_r861410189 ## website/docs/performance.md: ## @@ -60,25 +62,48 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event **~7X (2880 secs vs 440 secs)

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5440: [HUDI-3930][Docs] Adding documentation around Data Skipping

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5440: URL: https://github.com/apache/hudi/pull/5440#discussion_r861409940 ## website/docs/performance.md: ## @@ -60,25 +62,48 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event **~7X (2880 secs vs 440 secs)

[hudi] branch asf-site updated: [HUDI-3928][HUDI-3932] Adding docs for 0.11 release (savepoint restore to CLI, pulsar commit callback, hive schema provider) (#5429)

2022-04-28 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 82db2d55e6 [HUDI-3928][HUDI-3932] Adding

[GitHub] [hudi] nsivabalan merged pull request #5429: [HUDI-3928][HUDI-3932] Adding docs for 0.11 release (savepoint restore to CLI, pulsar commit callback, hive schema provider)

2022-04-28 Thread GitBox
nsivabalan merged PR #5429: URL: https://github.com/apache/hudi/pull/5429

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5440: [HUDI-3930][Docs] Adding documentation around Data Skipping

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5440: URL: https://github.com/apache/hudi/pull/5440#discussion_r861409547 ## website/docs/performance.md: ## @@ -60,25 +62,48 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event **~7X (2880 secs vs 440 secs)

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5440: [HUDI-3930][Docs] Adding documentation around Data Skipping

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5440: URL: https://github.com/apache/hudi/pull/5440#discussion_r861409033 ## website/docs/performance.md: ## @@ -60,25 +62,48 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event **~7X (2880 secs vs 440 secs)

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112792458 ## CI report: * 245cb3f075741bd6af58919df1554b82e888985b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112790786 ## CI report: * 245cb3f075741bd6af58919df1554b82e888985b Azure:

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
alexeykudinkin commented on code in PR #5462: URL: https://github.com/apache/hudi/pull/5462#discussion_r861404133 ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java: ## @@ -57,18 +61,18 @@ public class

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
alexeykudinkin commented on code in PR #5462: URL: https://github.com/apache/hudi/pull/5462#discussion_r861404133 ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java: ## @@ -57,18 +61,18 @@ public class

[hudi] branch asf-site updated: [DOCS] Add faq for async/offline compaction options (#5304)

2022-04-28 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 6587cad67c [DOCS] Add faq for

[GitHub] [hudi] nsivabalan merged pull request #5304: [DOCS] Add faq for async compaction options

2022-04-28 Thread GitBox
nsivabalan merged PR #5304: URL: https://github.com/apache/hudi/pull/5304

[GitHub] [hudi] nsivabalan commented on pull request #5429: [HUDI-3928][HUDI-3932] Adding docs for 0.11 release (savepoint restore to CLI, pulsar commit callback, hive schema provider)

2022-04-28 Thread GitBox
nsivabalan commented on PR #5429: URL: https://github.com/apache/hudi/pull/5429#issuecomment-1112779608 addressed comments

[jira] [Updated] (HUDI-3996) Revisit default key gen logic with spark-sql

2022-04-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3996: -- Fix Version/s: 0.12.0 > Revisit default key gen logic with spark-sql >

[jira] [Created] (HUDI-3996) Revisit default key gen logic with spark-sql

2022-04-28 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3996: - Summary: Revisit default key gen logic with spark-sql Key: HUDI-3996 URL: https://issues.apache.org/jira/browse/HUDI-3996 Project: Apache Hudi

[jira] [Assigned] (HUDI-3996) Revisit default key gen logic with spark-sql

2022-04-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3996: - Assignee: Raymond Xu > Revisit default key gen logic with spark-sql >

[GitHub] [hudi] yihua commented on issue #5451: [SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

2022-04-28 Thread GitBox
yihua commented on issue #5451: URL: https://github.com/apache/hudi/issues/5451#issuecomment-1112757926 @jdattani Could you add the Hudi config below, bumping the retries, and try again? It is likely due to a transient error. ``` hoodie.write.lock.client.num_retries=10 ```

[GitHub] [hudi] yihua commented on issue #5451: [SUPPORT] Hudi 0.10.1 raises exception java.lang.NoClassDefFoundError: com/amazonaws/services/dynamodbv2/model/LockNotGrantedException

2022-04-28 Thread GitBox
yihua commented on issue #5451: URL: https://github.com/apache/hudi/issues/5451#issuecomment-1112747391 @umehrot2 Do you know the right setup for using DynamoDB as the lock provider?

[GitHub] [hudi] bhasudha commented on pull request #5440: [HUDI-3930][Docs] Adding documentation around Data Skipping

2022-04-28 Thread GitBox
bhasudha commented on PR #5440: URL: https://github.com/apache/hudi/pull/5440#issuecomment-1112740346 @alexeykudinkin I have a few thoughts. So far we haven't had much going on on the queries side, so we don't have an explicit section on that per se. Should we move/copy this content to

[jira] [Updated] (HUDI-3930) Add guide page for data-skipping

2022-04-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3930: - Labels: pull-request-available (was: ) > Add guide page for data-skipping >

[GitHub] [hudi] bhasudha commented on a diff in pull request #5440: [HUDI-3930][Docs] Adding documentation around Data Skipping

2022-04-28 Thread GitBox
bhasudha commented on code in PR #5440: URL: https://github.com/apache/hudi/pull/5440#discussion_r861374330 ## website/docs/performance.md: ## @@ -60,25 +62,48 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event **~7X (2880 secs vs 440 secs)

[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3994: Fix Version/s: 0.12.0 (was: 0.11.0) > HoodieDeltaStreamer - Spark master shouldn't

[GitHub] [hudi] yihua closed issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

2022-04-28 Thread GitBox
yihua closed issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue) URL: https://github.com/apache/hudi/issues/5456

[GitHub] [hudi] yihua commented on issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

2022-04-28 Thread GitBox
yihua commented on issue #5456: URL: https://github.com/apache/hudi/issues/5456#issuecomment-1112730279 Got you. Sg. Let's track the progress there. Closing this issue.

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112730028 ## CI report: * 245cb3f075741bd6af58919df1554b82e888985b Azure:

[jira] [Assigned] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-3994: --- Assignee: (was: Ethan Guo) > HoodieDeltaStreamer - Spark master shouldn't have a default >

[jira] [Assigned] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-3994: --- Assignee: Ethan Guo > HoodieDeltaStreamer - Spark master shouldn't have a default >

[GitHub] [hudi] yihua commented on issue #5280: [SUPPORT] Docker Demo: Failed to Connect to namenode

2022-04-28 Thread GitBox
yihua commented on issue #5280: URL: https://github.com/apache/hudi/issues/5280#issuecomment-1112728730 @arunb2w If you don't need Hadoop or Hive specifically, you can compile Hudi jars and use Spark to write Hudi table to the local file system. Spark should run on any platform that runs

[GitHub] [hudi] bhasudha commented on pull request #5449: [HUDI-3911][DOCS][WIP] Add async indexing doc

2022-04-28 Thread GitBox
bhasudha commented on PR #5449: URL: https://github.com/apache/hudi/pull/5449#issuecomment-1112726886 @codope I read through the blog. A couple of thoughts/questions from an external perspective. - Should we rename this blog to explicitly mention Asynchronous metadata indexing using Hudi.

[GitHub] [hudi] yihua commented on issue #5455: [SUPPORT] Read Hudi Table from Hive/Glue Catalog without specifying the S3 Path

2022-04-28 Thread GitBox
yihua commented on issue #5455: URL: https://github.com/apache/hudi/issues/5455#issuecomment-1112725379 @umehrot2 could you shed light on this as well?

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5462: URL: https://github.com/apache/hudi/pull/5462#discussion_r861359728 ## hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/HoodieDatasetBulkInsertHelper.java: ## @@ -57,18 +61,18 @@ public class

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
nsivabalan commented on code in PR #5462: URL: https://github.com/apache/hudi/pull/5462#discussion_r861359423 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ## @@ -386,12 +386,14 @@ public void validateTableProperties(Properties

[GitHub] [hudi] bhasudha commented on a diff in pull request #5449: [HUDI-3911][DOCS][WIP] Add async indexing doc

2022-04-28 Thread GitBox
bhasudha commented on code in PR #5449: URL: https://github.com/apache/hudi/pull/5449#discussion_r861350782 ## website/blog/2022-04-27-async-indexing.md: ## @@ -0,0 +1,213 @@ +--- +title: "Asynchronous Indexing using Hudi" +excerpt: "How to setup Hudi for asynchronous indexing"

[hudi] branch master updated (52953c8f5e -> 4e928a6fe1)

2022-04-28 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 52953c8f5e [HUDI-3815] Fix docs description of metadata.compaction.delta_commits default value error (#5368)

[GitHub] [hudi] bhasudha merged pull request #5447: [HUDI-3943] Some description fixes for 0.10.1 docs

2022-04-28 Thread GitBox
bhasudha merged PR #5447: URL: https://github.com/apache/hudi/pull/5447

[GitHub] [hudi] bhasudha commented on a diff in pull request #5447: [HUDI-3943] Some description fixes for 0.10.1 docs

2022-04-28 Thread GitBox
bhasudha commented on code in PR #5447: URL: https://github.com/apache/hudi/pull/5447#discussion_r861350177 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java: ## @@ -87,7 +87,7 @@ public class HoodieClusteringConfig extends

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
alexeykudinkin commented on code in PR #5462: URL: https://github.com/apache/hudi/pull/5462#discussion_r861339922 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java: ## @@ -97,88 +98,69 @@ public String getPartitionPath(Row row) {

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112654672 ## CI report: * 0297707900e1f59d7becbd095f7b7aff8854b703 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112652237 ## CI report: * 0297707900e1f59d7becbd095f7b7aff8854b703 Azure:

[GitHub] [hudi] yihua commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
yihua commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112635290 @zhilinli123 by "compression failure", do you mean the Hudi compaction fails and `20220428191757755` is a scheduled compaction? Have you tried to restart and retry the Flink job and see

[GitHub] [hudi] yihua commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
yihua commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112633524 Based on the current information, it's likely that the same marker is created twice and the second attempt fails. @danny0405 do you know if Flink has special handling around creating

[GitHub] [hudi] yihua commented on issue #5460: org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error

2022-04-28 Thread GitBox
yihua commented on issue #5460: URL: https://github.com/apache/hudi/issues/5460#issuecomment-1112631217 @zhilinli123 To help you identify the root cause, could you provide the steps/commands to reproduce the issue, including how multiple tables are written? Could you also show the full

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112608124 ## CI report: * 0297707900e1f59d7becbd095f7b7aff8854b703 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
hudi-bot commented on PR #5462: URL: https://github.com/apache/hudi/pull/5462#issuecomment-1112605469 ## CI report: * 0297707900e1f59d7becbd095f7b7aff8854b703 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-3995) Improve bulk insert row writer performance for simple key gen and non partitioned key gen

2022-04-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3995: - Labels: pull-request-available (was: ) > Improve bulk insert row writer performance for simple

[GitHub] [hudi] nsivabalan opened a new pull request, #5462: [HUDI-3995] Making pref optimizations for bulk insert row writer path

2022-04-28 Thread GitBox
nsivabalan opened a new pull request, #5462: URL: https://github.com/apache/hudi/pull/5462 ## What is the purpose of the pull request - Adding a few perf optimizations for bulk insert row writer. ## Brief change log - Avoid using udf for key generator for SimpleKeyGen and

[jira] [Assigned] (HUDI-3995) Improve bulk insert row writer performance for simple key gen and non partitioned key gen

2022-04-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3995: - Assignee: sivabalan narayanan > Improve bulk insert row writer performance for

[jira] [Updated] (HUDI-3995) Improve bulk insert row writer performance for simple key gen and non partitioned key gen

2022-04-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3995: -- Fix Version/s: 0.12.0 > Improve bulk insert row writer performance for simple key gen

[jira] [Updated] (HUDI-3995) Improve bulk insert row writer performance for simple key gen and non partitioned key gen

2022-04-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3995: -- Priority: Critical (was: Major) > Improve bulk insert row writer performance for

[jira] [Updated] (HUDI-3995) Improve bulk insert row writer performance for simple key gen and non partitioned key gen

2022-04-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3995: -- Description: there are some optimizations we could do on the row writer path for bulk

[jira] [Created] (HUDI-3995) Improve bulk insert row writer performance for simple key gen and non partitioned key gen

2022-04-28 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3995: - Summary: Improve bulk insert row writer performance for simple key gen and non partitioned key gen Key: HUDI-3995 URL: https://issues.apache.org/jira/browse/HUDI-3995

[GitHub] [hudi] parisni commented on issue #5363: [SUPPORT] Hudi don't propagate column comments into hive metastore / parquet files

2022-04-28 Thread GitBox
parisni commented on issue #5363: URL: https://github.com/apache/hudi/issues/5363#issuecomment-1112522324 Indeed, this is exactly what I am looking for ! thanks On Tue, 2022-04-26 at 19:33 -0700, Sivabalan Narayanan wrote: > would this work for you

[GitHub] [hudi] rahil-c commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-28 Thread GitBox
rahil-c commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1112477307 @kasured I'm currently looking into this and was able to reproduce the issue after running your

[GitHub] [hudi] hudi-bot commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox
hudi-bot commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112467801 ## CI report: * 44bcae973cc953d48d1cee358c19cf7857504a3b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5434: [HUDI-3978] Fix use of partition path field as hive partition field in flink

2022-04-28 Thread GitBox
hudi-bot commented on PR #5434: URL: https://github.com/apache/hudi/pull/5434#issuecomment-1112446915 ## CI report: * f4084a5734d579888aa29a72df25de3e9521b67e Azure:

[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Angel Conde updated HUDI-3994: --- Component/s: spark > HoodieDeltaStreamer - Spark master shouldn't have a default >

[jira] [Commented] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529535#comment-17529535 ] Angel Conde commented on HUDI-3994: Will provide a pull request of this.  > HoodieDeltaStreamer -

[jira] [Updated] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Angel Conde updated HUDI-3994: --- Status: In Progress (was: Open) > HoodieDeltaStreamer - Spark master shouldn't have a default >

[jira] [Created] (HUDI-3994) HoodieDeltaStreamer - Spark master shouldn't have a default

2022-04-28 Thread Angel Conde (Jira)
Angel Conde created HUDI-3994: -- Summary: HoodieDeltaStreamer - Spark master shouldn't have a default Key: HUDI-3994 URL: https://issues.apache.org/jira/browse/HUDI-3994 Project: Apache Hudi

[GitHub] [hudi] vingov commented on issue #5367: Hudi with DBT

2022-04-28 Thread GitBox
vingov commented on issue #5367: URL: https://github.com/apache/hudi/issues/5367#issuecomment-1112440532 @nsivabalan - It's almost done, I can send the PR tonight for the blog then I will follow up with extensive docs. -- This is an automated message from the Apache Git Service. To

[GitHub] [hudi] Neuw84 commented on issue #5456: [SUPPORT] HoodieDeltaStreamer - Spark master shouldn't have a default option (AWS Glue)

2022-04-28 Thread GitBox
Neuw84 commented on issue #5456: URL: https://github.com/apache/hudi/issues/5456#issuecomment-1112433258 Hi @yihua, The thing is that no default value should be in that class in order to be able to inherit to whatever the master is in serverless Spark engines such as AWS Glue where

[jira] [Created] (HUDI-3993) Avoid calling into Spark UDF in Bulk Insert

2022-04-28 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-3993: - Summary: Avoid calling into Spark UDF in Bulk Insert Key: HUDI-3993 URL: https://issues.apache.org/jira/browse/HUDI-3993 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-3993) Avoid calling into Spark UDF in Bulk Insert

2022-04-28 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3993: -- Labels: performance (was: ) > Avoid calling into Spark UDF in Bulk Insert >

[GitHub] [hudi] hudi-bot commented on pull request #5461: [MINOR] Fix a NPE for Option

2022-04-28 Thread GitBox
hudi-bot commented on PR #5461: URL: https://github.com/apache/hudi/pull/5461#issuecomment-1112384609 ## CI report: * 52507c1050969f26f7f70e1cf1c563c6b8573e70 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5406: [HUDI-3954] Don't keep the last commit before the earliest commit to retain

2022-04-28 Thread GitBox
hudi-bot commented on PR #5406: URL: https://github.com/apache/hudi/pull/5406#issuecomment-1112384436 ## CI report: * 44bcae973cc953d48d1cee358c19cf7857504a3b Azure:
