[GitHub] [hudi] hudi-bot commented on pull request #5761: [HUDI-4165] Support Create/Drop/Show/Refresh Index Syntax for Spark SQL

2022-06-06 Thread GitBox
hudi-bot commented on PR #5761: URL: https://github.com/apache/hudi/pull/5761#issuecomment-1148271102 ## CI report: * bb5c112e057214807cb4bf4979ef2084f54b19e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9096

[jira] [Updated] (HUDI-4165) Support Create/Drop/Show/Refresh Index Syntax for Spark SQL

2022-06-06 Thread shibei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shibei updated HUDI-4165: - Summary: Support Create/Drop/Show/Refresh Index Syntax for Spark SQL (was: Support Create/Show/Drop Index for Spa

[jira] [Closed] (HUDI-4176) TableSchemaResolver fetches/parses HoodieCommitMetadata multiple times while extracting Schema

2022-06-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4176. --- Resolution: Fixed > TableSchemaResolver fetches/parses HoodieCommitMetadata multiple times while > extracting

[jira] [Closed] (HUDI-4195) Bulk insert should not register UDFs for non-partitioned table

2022-06-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4195. --- Resolution: Fixed > Bulk insert should not register UDFs for non-partitioned table > -

[jira] [Closed] (HUDI-4190) AuthenticationProtos not found when using kerberos authentication with hudi-spark-bundle

2022-06-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-4190. --- Resolution: Fixed > AuthenticationProtos not found when using kerberos authentication with > hudi-spark-bundl

[jira] [Updated] (HUDI-4183) Fix using HoodieCatalog to create non-hudi tables

2022-06-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4183: Fix Version/s: 0.11.1 (was: 0.12.0) > Fix using HoodieCatalog to create non-hudi tabl

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148200206 ## CI report: * 323dee0d57cfcbf5c4129ca2a680732326138e3d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9122

[GitHub] [hudi] hudi-bot commented on pull request #5768: [HUDI-4198] Fix hive config for AWSGlueClientFactory

2022-06-06 Thread GitBox
hudi-bot commented on PR #5768: URL: https://github.com/apache/hudi/pull/5768#issuecomment-1148191788 ## CI report: * 69894b3d188f230828b9da71b4a80eab42fc9c68 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9123

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148191745 ## CI report: * b20712eebaf0feaf0f26be361f4ceab86c4d86c1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9121

[GitHub] [hudi] XuQianJin-Stars commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-06-06 Thread GitBox
XuQianJin-Stars commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1148185043 I also encountered this problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [hudi] YuweiXiao commented on issue #5777: [SUPPORT] Hudi table has duplicate data.

2022-06-06 Thread GitBox
YuweiXiao commented on issue #5777: URL: https://github.com/apache/hudi/issues/5777#issuecomment-1148174762 Could you share your write config, e.g., operation type and index type?

[GitHub] [hudi] leesf commented on a diff in pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
leesf commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r890746379 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/catalog/HoodieCatalog.scala: ## @@ -105,12 +106,30 @@ class HoodieCatalog extends DelegatingCata

[GitHub] [hudi] hudi-bot commented on pull request #5768: [HUDI-4198] Fix hive config for AWSGlueClientFactory

2022-06-06 Thread GitBox
hudi-bot commented on PR #5768: URL: https://github.com/apache/hudi/pull/5768#issuecomment-1148165632 ## CI report: * 2f04c150d02b887f88e7c5e5e85d49653dd51731 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9103

[GitHub] [hudi] hudi-bot commented on pull request #5760: [HUDI-4196]support index alignment

2022-06-06 Thread GitBox
hudi-bot commented on PR #5760: URL: https://github.com/apache/hudi/pull/5760#issuecomment-1148165615 ## CI report: * 855b1c02bf95865add938fd682f47f9e8b4c18bf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9120

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148165587 ## CI report: * 986d6339fd138ec738226cd5e53ffc741578a1e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9116

[GitHub] [hudi] hudi-bot commented on pull request #5768: [HUDI-4198] Fix hive config for AWSGlueClientFactory

2022-06-06 Thread GitBox
hudi-bot commented on PR #5768: URL: https://github.com/apache/hudi/pull/5768#issuecomment-1148163813 ## CI report: * 2f04c150d02b887f88e7c5e5e85d49653dd51731 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9103

[GitHub] [hudi] alexeykudinkin commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
alexeykudinkin commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148163515 LGTM @nsivabalan maybe in a follow-up let's add a comment to the corresponding java-docs for getKeyPrefixIterator and getKeysIterator that keys are expected to be sorted (by th

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
alexeykudinkin commented on code in PR #5773: URL: https://github.com/apache/hudi/pull/5773#discussion_r890736607 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/io/storage/TestHoodieHFileReaderWriter.java: ## @@ -316,15 +317,20 @@ public void testReaderGetRecord

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
xiarixiaoyao commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r890725117 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala: ## @@ -122,29 +122,39 @@ abstract class HoodieBaseRelation(val sqlC

[GitHub] [hudi] jjtjiang opened a new issue, #5777: [SUPPORT] Hudi table has duplicate data.

2022-06-06 Thread GitBox
jjtjiang opened a new issue, #5777: URL: https://github.com/apache/hudi/issues/5777 The Hudi table has duplicate data. The following records all share the same primary key, so there should be only one. Can you help me see what went wrong? The duplicate records are shown below: https

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
xiarixiaoyao commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r890722348 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala: ## @@ -122,29 +122,39 @@ abstract class HoodieBaseRelation(val sqlC

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148138068 ## CI report: * 986d6339fd138ec738226cd5e53ffc741578a1e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9116

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148136491 ## CI report: * 986d6339fd138ec738226cd5e53ffc741578a1e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9116

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148134838 ## CI report: * 986d6339fd138ec738226cd5e53ffc741578a1e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9116

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-114819 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * 77be04f93ecee8d2b8fbd46aaf4ec50b2c990bae Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] BuddyJack commented on issue #5728: [SUPPORT] Flink support Timeline-server-based marker

2022-06-06 Thread GitBox
BuddyJack commented on issue #5728: URL: https://github.com/apache/hudi/issues/5728#issuecomment-1148119096 Thanks, all.

[GitHub] [hudi] RoderickAdriance commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-06-06 Thread GitBox
RoderickAdriance commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1148112582 22/06/06 22:15:27 ERROR Javalin: Exception occurred while servicing http-request java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadS

[GitHub] [hudi] hudi-bot commented on pull request #5760: [HUDI-4196]support index alignment

2022-06-06 Thread GitBox
hudi-bot commented on PR #5760: URL: https://github.com/apache/hudi/pull/5760#issuecomment-1148108801 ## CI report: * 9bdf04b0f75ab655c0972cfeaa38480a7ece2e1d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9097

[GitHub] [hudi] galain1994 commented on issue #5690: [SUPPORT] Flink stream write hudi, failed to checkpoint

2022-06-06 Thread GitBox
galain1994 commented on issue #5690: URL: https://github.com/apache/hudi/issues/5690#issuecomment-1148107966 > @galain1994 Can you see the commit file in your archived dirs? These days, I have been trying to change the Hudi arguments to: `compaction.trigger.strategy = 'num_commits'` `'comp

[GitHub] [hudi] hudi-bot commented on pull request #5760: [HUDI-4196]support index alignment

2022-06-06 Thread GitBox
hudi-bot commented on PR #5760: URL: https://github.com/apache/hudi/pull/5760#issuecomment-1148107021 ## CI report: * 9bdf04b0f75ab655c0972cfeaa38480a7ece2e1d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9097

[GitHub] [hudi] galain1994 commented on issue #5690: [SUPPORT] Flink stream write hudi, failed to checkpoint

2022-06-06 Thread GitBox
galain1994 commented on issue #5690: URL: https://github.com/apache/hudi/issues/5690#issuecomment-1148103916 > @galain1994 Can you see the commit file in your archived dirs? ![image](https://user-images.githubusercontent.com/10805116/172277755-59b56e9c-cd5f-4e8d-937b-f0d8ed959176.png

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148103783 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * de800857471161785bebeb96adcbdcd4c6b09a96 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] galain1994 commented on issue #5690: [SUPPORT] Flink stream write hudi, failed to checkpoint

2022-06-06 Thread GitBox
galain1994 commented on issue #5690: URL: https://github.com/apache/hudi/issues/5690#issuecomment-1148103760 ![image](https://user-images.githubusercontent.com/10805116/172277755-59b56e9c-cd5f-4e8d-937b-f0d8ed959176.png) There are commit files in the archived dir.

[GitHub] [hudi] LinMingQiang commented on issue #5774: [SUPPORT]Using flink sql to read kafka data to hudi fails

2022-06-06 Thread GitBox
LinMingQiang commented on issue #5774: URL: https://github.com/apache/hudi/issues/5774#issuecomment-1148100504 Did you enable checkpointing?

[jira] [Updated] (HUDI-4186) Support Hudi with Spark 3.3

2022-06-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4186: Fix Version/s: 0.12.0 > Support Hudi with Spark 3.3 > --- > > Key: H

[GitHub] [hudi] namuny opened a new issue, #5776: [SUPPORT] Performance degradation for listing partitions

2022-06-06 Thread GitBox
namuny opened a new issue, #5776: URL: https://github.com/apache/hudi/issues/5776 I'm noticing a steep increase in duration for listing partitions during clustering, specifically after [this PR](https://github.com/apache/hudi/pull/4643) was merged. I'm yet to get to the bottom of exactly w

[GitHub] [hudi] RoderickAdriance commented on issue #5765: [SUPPORT] throw "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()"

2022-06-06 Thread GitBox
RoderickAdriance commented on issue #5765: URL: https://github.com/apache/hudi/issues/5765#issuecomment-1148087948 I have this problem too, when I use the Hudi DeltaStreamer tool to extract data from MySQL to HDFS.

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148082530 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * 752a5403aae52b361919cbf98639972e0fbeb5b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148076978 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * 752a5403aae52b361919cbf98639972e0fbeb5b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] Gatsby-Lee commented on pull request #5768: [HUDI-4198] Fix hive config for AWSGlueClientFactory

2022-06-06 Thread GitBox
Gatsby-Lee commented on PR #5768: URL: https://github.com/apache/hudi/pull/5768#issuecomment-1148073037 @xushiyan I tested 0.11.0 with this patch on Glue 3, and it works. Can this be shipped with 0.11.1?

[hudi] branch master updated: [MINOR][RFC-53] Fix typos (#5764)

2022-06-06 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 4f5cad8029 [MINOR][RFC-53] Fix typos (#5764) 4f5

[GitHub] [hudi] xushiyan merged pull request #5764: [MINOR][RFC-53] Fix typos

2022-06-06 Thread GitBox
xushiyan merged PR #5764: URL: https://github.com/apache/hudi/pull/5764

[GitHub] [hudi] Gatsby-Lee commented on issue #5484: [SUPPORT] Hive Sync + AWS Data Catalog failling with Hudi 0.11.0

2022-06-06 Thread GitBox
Gatsby-Lee commented on issue #5484: URL: https://github.com/apache/hudi/issues/5484#issuecomment-1148065126 @xushiyan I tested 0.11.0 + https://github.com/apache/hudi/pull/5768 It works!!

[hudi] branch master updated (7da97c8096 -> e5710a8e7c)

2022-06-06 Thread xushiyan
xushiyan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 7da97c8096 [HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747) add e5710a8e7c [MINOR] Mark

[GitHub] [hudi] xushiyan closed issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs

2022-06-06 Thread GitBox
xushiyan closed issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs URL: https://github.com/apache/hudi/issues/5736

[GitHub] [hudi] xushiyan merged pull request #5775: [MINOR] Mark AWSGlueCatalogSyncClient experimental

2022-06-06 Thread GitBox
xushiyan merged PR #5775: URL: https://github.com/apache/hudi/pull/5775

[GitHub] [hudi] xushiyan commented on issue #5736: [SUPPORT] Hudi 0.11.0 on AWS Glue: Metastore URIs

2022-06-06 Thread GitBox
xushiyan commented on issue #5736: URL: https://github.com/apache/hudi/issues/5736#issuecomment-1148058495 @eshu glad that you got it resolved. `AWSGlueCatalogSyncClient` is also experimental. I can make a note there.

[GitHub] [hudi] xushiyan commented on issue #5698: [SUPPORT] dependency missing when using run_sync_tool.sh

2022-06-06 Thread GitBox
xushiyan commented on issue #5698: URL: https://github.com/apache/hudi/issues/5698#issuecomment-1148056708 @omlomloml please see comments below. As for the original issue, I'll update the script then close this, since jar issue was resolved. > can't build with -Dspark3.2, this will no

[GitHub] [hudi] ykPrograWorld opened a new issue, #5774: Using flink sql to read kafka data to hudi fails

2022-06-06 Thread GitBox
ykPrograWorld opened a new issue, #5774: URL: https://github.com/apache/hudi/issues/5774 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-su

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148048397 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * 752a5403aae52b361919cbf98639972e0fbeb5b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148043369 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * 752a5403aae52b361919cbf98639972e0fbeb5b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1148041366 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * 752a5403aae52b361919cbf98639972e0fbeb5b6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148041310 ## CI report: * 986d6339fd138ec738226cd5e53ffc741578a1e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9116

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
alexeykudinkin commented on code in PR #5773: URL: https://github.com/apache/hudi/pull/5773#discussion_r890640414 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/io/storage/TestHoodieHFileReaderWriter.java: ## @@ -316,15 +317,20 @@ public void testReaderGetRecord

[jira] [Created] (HUDI-4202) Make sure Column Stats partition is cached after first time being read

2022-06-06 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-4202: - Summary: Make sure Column Stats partition is cached after first time being read Key: HUDI-4202 URL: https://issues.apache.org/jira/browse/HUDI-4202 Project: Apache

[jira] [Created] (HUDI-4201) Add tooling to delete empty non-completed instants from timeline

2022-06-06 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-4201: - Summary: Add tooling to delete empty non-completed instants from timeline Key: HUDI-4201 URL: https://issues.apache.org/jira/browse/HUDI-4201 Project: Apach

[GitHub] [hudi] xushiyan commented on issue #5753: [SUPPORT]Custom AbstractHiveSyncHoodieClient impl

2022-06-06 Thread GitBox
xushiyan commented on issue #5753: URL: https://github.com/apache/hudi/issues/5753#issuecomment-1148013482 Already mentioned in https://github.com/apache/hudi/pull/5695#issuecomment-1140601173.

[GitHub] [hudi] xushiyan closed issue #5753: [SUPPORT]Custom AbstractHiveSyncHoodieClient impl

2022-06-06 Thread GitBox
xushiyan closed issue #5753: [SUPPORT]Custom AbstractHiveSyncHoodieClient impl URL: https://github.com/apache/hudi/issues/5753

[GitHub] [hudi] nsivabalan commented on a diff in pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
nsivabalan commented on code in PR #5773: URL: https://github.com/apache/hudi/pull/5773#discussion_r890630049 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/io/storage/TestHoodieHFileReaderWriter.java: ## @@ -316,15 +317,20 @@ public void testReaderGetRecordIter

[GitHub] [hudi] xushiyan closed issue #5769: [SUPPORT]Spark SQL committed, Send commit event, such as SparkHudiCommittedEvent

2022-06-06 Thread GitBox
xushiyan closed issue #5769: [SUPPORT]Spark SQL committed, Send commit event, such as SparkHudiCommittedEvent URL: https://github.com/apache/hudi/issues/5769

[GitHub] [hudi] xushiyan commented on issue #5769: [SUPPORT]Spark SQL committed, Send commit event, such as SparkHudiCommittedEvent

2022-06-06 Thread GitBox
xushiyan commented on issue #5769: URL: https://github.com/apache/hudi/issues/5769#issuecomment-1148012311 This is already achievable with `org.apache.hudi.callback.HoodieWriteCommitCallback`.

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148009242 ## CI report: * dad16d1d712576b3b92389ab6ab045dc16bdafbf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9071

[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default

2022-06-06 Thread GitBox
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1148008944 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * 4b

[GitHub] [hudi] hudi-bot commented on pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
hudi-bot commented on PR #5737: URL: https://github.com/apache/hudi/pull/5737#issuecomment-1148007337 ## CI report: * dad16d1d712576b3b92389ab6ab045dc16bdafbf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9071

[GitHub] [hudi] hudi-bot commented on pull request #5402: [WIP] Support Hadoop 3.x Hive 3.x and Spark 3.2.x default

2022-06-06 Thread GitBox
hudi-bot commented on PR #5402: URL: https://github.com/apache/hudi/pull/5402#issuecomment-1148007099 ## CI report: * 8c6f6e19940ce7ac04dfcfce52da3ccdaf3a8b0f UNKNOWN * c4799803cff8adffef56e889a5cd4d52599fcf73 UNKNOWN * c5616888bb267cb505a12b88cad3e99f9dd18d9b UNKNOWN * 53

[jira] [Updated] (HUDI-4200) Keys not sorted for records to be looked up with Metadata table base file

2022-06-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4200: - Labels: pull-request-available (was: ) > Keys not sorted for records to be looked up with Metadat

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5773: [HUDI-4200] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
alexeykudinkin commented on code in PR #5773: URL: https://github.com/apache/hudi/pull/5773#discussion_r890611911 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/io/storage/TestHoodieHFileReaderWriter.java: ## @@ -316,15 +317,20 @@ public void testReaderGetRecord

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5737: [HUDI-4178] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
alexeykudinkin commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r890608055 ## hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/hudi/ResolveHudiAlterTableCommandSpark32.scala: ## @@ -33,33 +32,37 @@ import org.apache.spa

[GitHub] [hudi] nsivabalan commented on pull request #5755: [HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw…

2022-06-06 Thread GitBox
nsivabalan commented on PR #5755: URL: https://github.com/apache/hudi/pull/5755#issuecomment-1147982961 Never mind. I have put up a patch to fix it holistically (https://github.com/apache/hudi/pull/5773). Thanks for bringing it up.

[jira] [Created] (HUDI-4200) Keys not sorted for records to be looked up with Metadata table base file

2022-06-06 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-4200: - Summary: Keys not sorted for records to be looked up with Metadata table base file Key: HUDI-4200 URL: https://issues.apache.org/jira/browse/HUDI-4200 Proje

[jira] [Updated] (HUDI-4192) HoodieHFileReader scan top cells after bottom cells throw NullPointerException

2022-06-06 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4192: -- Priority: Blocker (was: Critical) > HoodieHFileReader scan top cells after bottom cells throw N

[jira] [Updated] (HUDI-4192) HoodieHFileReader scan top cells after bottom cells throw NullPointerException

2022-06-06 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4192: -- Priority: Critical (was: Minor) > HoodieHFileReader scan top cells after bottom cells throw Nul

[jira] [Closed] (HUDI-4192) HoodieHFileReader scan top cells after bottom cells throw NullPointerException

2022-06-06 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-4192. - Resolution: Fixed > HoodieHFileReader scan top cells after bottom cells throw NullPointerException

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147975892 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * Unknown: [CANCELED](TBD) * 752a5403aae52b361919cbf98639972e0fbeb5b6 Azure: [PENDING](https://dev.azure.c

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147973103 ## CI report: * f85071f92ef89f220112a36914572a10a940563d UNKNOWN * Unknown: [CANCELED](TBD) * 752a5403aae52b361919cbf98639972e0fbeb5b6 UNKNOWN Bot commands

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147969811 ## CI report: * Unknown: [CANCELED](TBD) * f85071f92ef89f220112a36914572a10a940563d UNKNOWN Bot commands @hudi-bot supports the following commands: - `

[GitHub] [hudi] nsivabalan commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
nsivabalan commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147967892 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147960534 ## CI report: * 597733efd5d2c1b25b5e0b3f6a628752eff9e52c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9112

[GitHub] [hudi] vinothchandar commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
vinothchandar commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r890321822 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -142,6 +142,9 @@ object DataSourceReadOptions { .key("ho

[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] support partial update on mor table

2022-06-06 Thread GitBox
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1147855180 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 59eacbed10467905643880e951b9f969a86747b9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147845696 ## CI report: * 597733efd5d2c1b25b5e0b3f6a628752eff9e52c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9112

[jira] [Closed] (HUDI-4140) Fix default partition and hive style partitioning w/ bulk insert row writer with virtual keys

2022-06-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4140. - Resolution: Fixed > Fix default partition and hive style partitioning w/ bulk insert row w

[jira] [Assigned] (HUDI-4140) Fix default partition and hive style partitioning w/ bulk insert row writer with virtual keys

2022-06-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4140: - Assignee: sivabalan narayanan > Fix default partition and hive style partitioning

[jira] [Closed] (HUDI-4197) Building out just FILES partition with Async indexer fails

2022-06-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4197. - Resolution: Fixed > Building out just FILES partition with Async indexer fails > -

[jira] [Closed] (HUDI-4171) NonPartitioned Key gen w/ virtual keys fails to be read w/ presto

2022-06-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4171. - Fix Version/s: 0.11.1 (was: 0.11.0) Resolution: Fixed > NonP

[jira] [Assigned] (HUDI-4171) NonPartitioned Key gen w/ virtual keys fails to be read w/ presto

2022-06-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-4171: - Assignee: sivabalan narayanan > NonPartitioned Key gen w/ virtual keys fails to b

[hudi] branch master updated: [HUDI-4171] Fixing Non partitioned with virtual keys in read path (#5747)

2022-06-06 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 7da97c8096 [HUDI-4171] Fixing Non partitioned w

[GitHub] [hudi] nsivabalan merged pull request #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path

2022-06-06 Thread GitBox
nsivabalan merged PR #5747: URL: https://github.com/apache/hudi/pull/5747

[GitHub] [hudi] nsivabalan commented on pull request #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path

2022-06-06 Thread GitBox
nsivabalan commented on PR #5747: URL: https://github.com/apache/hudi/pull/5747#issuecomment-1147843345 Got an approval from Sagar verbally. Going ahead w/ merging to get it into 0.11.1.

[hudi] branch master updated: [HUDI-4197] Fix Async indexer to support building FILES partition (#5766)

2022-06-06 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 21b903fddb [HUDI-4197] Fix Async indexer to sup

[GitHub] [hudi] nsivabalan merged pull request #5766: [HUDI-4197] Fix Async indexer to support building FILES partition

2022-06-06 Thread GitBox
nsivabalan merged PR #5766: URL: https://github.com/apache/hudi/pull/5766

[GitHub] [hudi] hudi-bot commented on pull request #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
hudi-bot commented on PR #5773: URL: https://github.com/apache/hudi/pull/5773#issuecomment-1147842194 ## CI report: * 597733efd5d2c1b25b5e0b3f6a628752eff9e52c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] nsivabalan opened a new pull request, #5773: [HUDI-4192] Fixing sorting of keys fetched from metadata table

2022-06-06 Thread GitBox
nsivabalan opened a new pull request, #5773: URL: https://github.com/apache/hudi/pull/5773 ## What is the purpose of the pull request - Key prefixes fetched from the metadata table in the col stats index are not sorted, and hence may result in entries being missed or unnecessary seeks to start

[GitHub] [hudi] hudi-bot commented on pull request #5766: [HUDI-4197] Fix Async indexer to support building FILES partition

2022-06-06 Thread GitBox
hudi-bot commented on PR #5766: URL: https://github.com/apache/hudi/pull/5766#issuecomment-1147832859 ## CI report: * 81e551b139a2fe8b9ced954d045dabfd6643142f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9107

[GitHub] [hudi] nsivabalan commented on pull request #5755: [HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw…

2022-06-06 Thread GitBox
nsivabalan commented on PR #5755: URL: https://github.com/apache/hudi/pull/5755#issuecomment-1147818277 Did you query the metadata table directly, or did you try to query the data table by enabling data skipping?

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5737: [HUDI-4178][Stacked on 5733] Addressing performance regressions in Spark DataSourceV2 Integration

2022-06-06 Thread GitBox
alexeykudinkin commented on code in PR #5737: URL: https://github.com/apache/hudi/pull/5737#discussion_r890475622 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala: ## @@ -122,29 +122,39 @@ abstract class HoodieBaseRelation(val sq

[GitHub] [hudi] hudi-bot commented on pull request #5747: [HUDI-4171] Fixing Non partitioned with virtual keys in read path

2022-06-06 Thread GitBox
hudi-bot commented on PR #5747: URL: https://github.com/apache/hudi/pull/5747#issuecomment-1147761879 ## CI report: * de1cd7d3aca3ad9b70dbf03790d270866e6de052 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9105

[GitHub] [hudi] marchpure commented on pull request #5755: [HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw…

2022-06-06 Thread GitBox
marchpure commented on PR #5755: URL: https://github.com/apache/hudi/pull/5755#issuecomment-1147756414 > I chased down the issue. We should sort the keys in a higher layer before calling keyPrefixIterator. I don't think the fix in this patch makes sense. In most of the code path, we let calle

[GitHub] [hudi] nsivabalan commented on pull request #5755: [HUDI-4192] HoodieHFileReader scan top cells after bottom cells throw…

2022-06-06 Thread GitBox
nsivabalan commented on PR #5755: URL: https://github.com/apache/hudi/pull/5755#issuecomment-1147748099 I chased down the issue. We should sort the keys in a higher layer before calling keyPrefixIterator. I don't think the fix in this patch makes sense. In most of the code path, we let caller

[GitHub] [hudi] bhasudha opened a new pull request, #5772: [DOCS] Add yahoo japan tech blog on data lake using Apache Hudi

2022-06-06 Thread GitBox
bhasudha opened a new pull request, #5772: URL: https://github.com/apache/hudi/pull/5772 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpos
