[GitHub] [hudi] hudi-bot commented on pull request #6365: [HUDI-4601] read error from MOR table after compaction

2022-08-10 Thread GitBox
hudi-bot commented on PR #6365: URL: https://github.com/apache/hudi/pull/6365#issuecomment-1211618953 ## CI report: * 86f98b82f17b041148a237e09ba59e378de83b81 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false

2022-08-10 Thread GitBox
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1211618894 ## CI report: * 386a9eb87a073a4c956fc5f5329701feeb012227 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1070

[GitHub] [hudi] hudi-bot commented on pull request #6141: [HUDI-3189] Fallback to full table scan with incremental query when files are cleaned up or achived for MOR table

2022-08-10 Thread GitBox
hudi-bot commented on PR #6141: URL: https://github.com/apache/hudi/pull/6141#issuecomment-1211618489 ## CI report: * 2de4df9e16f88a4813d404ba2111a9b4db19c03b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1066

[GitHub] [hudi] hudi-bot commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false

2022-08-10 Thread GitBox
hudi-bot commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1211614653 ## CI report: * 386a9eb87a073a4c956fc5f5329701feeb012227 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1070

[GitHub] [hudi] hudi-bot commented on pull request #6228: [HUDI-4488] Improve S3EventsHoodieIncrSource efficiency

2022-08-10 Thread GitBox
hudi-bot commented on PR #6228: URL: https://github.com/apache/hudi/pull/6228#issuecomment-1211614463 ## CI report: * 0f7c3d5002dd2d5bd65bba2768766769ef0f5466 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1073

[GitHub] [hudi] wuwenchi commented on pull request #6365: [HUDI-4601] read error from MOR table after compaction

2022-08-10 Thread GitBox
wuwenchi commented on PR #6365: URL: https://github.com/apache/hudi/pull/6365#issuecomment-1211612875 @danny0405 can you help review? thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[jira] [Updated] (HUDI-4601) read error from MOR table after compaction

2022-08-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4601: - Labels: pull-request-available (was: ) > read error from MOR table after compaction > ---

[GitHub] [hudi] wuwenchi opened a new pull request, #6365: [HUDI-4601] read error from MOR table after compaction

2022-08-10 Thread GitBox
wuwenchi opened a new pull request, #6365: URL: https://github.com/apache/hudi/pull/6365 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[jira] [Created] (HUDI-4601) read error from MOR table after compaction

2022-08-10 Thread wuwenchi (Jira)
wuwenchi created HUDI-4601: -- Summary: read error from MOR table after compaction Key: HUDI-4601 URL: https://issues.apache.org/jira/browse/HUDI-4601 Project: Apache Hudi Issue Type: Bug Co

[GitHub] [hudi] honeyaya commented on pull request #6347: [HUDI-4582] Support batch synchronization of partition to hive metastore to avoid timeout with --sync-mode="hms" and use-jdbc=false

2022-08-10 Thread GitBox
honeyaya commented on PR #6347: URL: https://github.com/apache/hudi/pull/6347#issuecomment-1211605774 @hudi-bot run azure `@hudi-bot run azure` `run azure` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [hudi] minihippo commented on a diff in pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-08-10 Thread GitBox
minihippo commented on code in PR #5064: URL: https://github.com/apache/hudi/pull/5064#discussion_r943112733 ## hudi-metaserver/src/main/thrift/gen-java/org/apache/hudi/metaserver/thrift/AlreadyExistException.java: ## @@ -0,0 +1,377 @@ +/** + * Autogenerated by Thrift Compiler (

[GitHub] [hudi] hudi-bot commented on pull request #6363: [MINOR] fix potential NPE in spark writer

2022-08-10 Thread GitBox
hudi-bot commented on PR #6363: URL: https://github.com/apache/hudi/pull/6363#issuecomment-1211573163 ## CI report: * e0305b9c23bffa362f9e6d1e7e90264533e8a687 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1072

[GitHub] [hudi] minihippo commented on a diff in pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-08-10 Thread GitBox
minihippo commented on code in PR #5064: URL: https://github.com/apache/hudi/pull/5064#discussion_r943110690 ## hudi-metaserver/src/main/java/org/apache/hudi/metaserver/client/HoodieMetaServerClientImp.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] minihippo commented on a diff in pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-08-10 Thread GitBox
minihippo commented on code in PR #5064: URL: https://github.com/apache/hudi/pull/5064#discussion_r943109815 ## hudi-metaserver/src/main/java/org/apache/hudi/metaserver/client/RetryingHoodieMetaServerClient.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] minihippo commented on a diff in pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-08-10 Thread GitBox
minihippo commented on code in PR #5064: URL: https://github.com/apache/hudi/pull/5064#discussion_r943109351 ## hudi-metaserver/src/main/java/org/apache/hudi/metaserver/service/SnapshotService.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [hudi] minihippo commented on a diff in pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-08-10 Thread GitBox
minihippo commented on code in PR #5064: URL: https://github.com/apache/hudi/pull/5064#discussion_r943108897 ## hudi-metaserver/src/test/java/org/apache/hudi/metaserver/store/TestRelationDBBasedStore.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (

[jira] [Assigned] (HUDI-4579) [DOCS] Add docs on manually upgrading and downgrading table through CLI

2022-08-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-4579: --- Assignee: Ethan Guo (was: Sagar Sumit) > [DOCS] Add docs on manually upgrading and downgrading table

[GitHub] [hudi] yihua commented on issue #6335: [SUPPORT] Deltastreamer updates not supporting the addition of new columns

2022-08-10 Thread GitBox
yihua commented on issue #6335: URL: https://github.com/apache/hudi/issues/6335#issuecomment-1211565003 @rohit-m-99 Is this from Spark SQL? Do you see any exceptions or stacktrace? A few things to try out: (1) restart spark-shell or spark-sql to see if this goes away; (2) set `hoodie.metad

[jira] [Resolved] (HUDI-4600) Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2022-08-10 Thread HunterHunter (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HunterHunter resolved HUDI-4600. > Hive synchronization failure : Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHive

[GitHub] [hudi] vinothchandar merged pull request #6360: [DOCS] Add Presto Tech Talk June 2022 to talks page

2022-08-10 Thread GitBox
vinothchandar merged PR #6360: URL: https://github.com/apache/hudi/pull/6360 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.ap

[hudi] branch asf-site updated: [DOCS] Add Presto Tech Talk June 2022 to talks page (#6360)

2022-08-10 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new dbdffd8528 [DOCS] Add Presto Tech Talk June 20

[GitHub] [hudi] LinMingQiang closed issue #6364: Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2022-08-10 Thread GitBox
LinMingQiang closed issue #6364: Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient URL: https://github.com/apache/hudi/issues/6364 -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [hudi] hudi-bot commented on pull request #6228: [HUDI-4488] Improve S3EventsHoodieIncrSource efficiency

2022-08-10 Thread GitBox
hudi-bot commented on PR #6228: URL: https://github.com/apache/hudi/pull/6228#issuecomment-1211547379 ## CI report: * Unknown: [CANCELED](TBD) * 0f7c3d5002dd2d5bd65bba2768766769ef0f5466 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039

[GitHub] [hudi] hudi-bot commented on pull request #6228: [HUDI-4488] Improve S3EventsHoodieIncrSource efficiency

2022-08-10 Thread GitBox
hudi-bot commented on PR #6228: URL: https://github.com/apache/hudi/pull/6228#issuecomment-1211545378 ## CI report: * Unknown: [CANCELED](TBD) * 0f7c3d5002dd2d5bd65bba2768766769ef0f5466 UNKNOWN Bot commands @hudi-bot supports the following commands: - `

[GitHub] [hudi] vamshigv commented on pull request #6228: [HUDI-4488] Improve S3EventsHoodieIncrSource efficiency

2022-08-10 Thread GitBox
vamshigv commented on PR #6228: URL: https://github.com/apache/hudi/pull/6228#issuecomment-1211543968 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[hudi] branch asf-site updated: [HUDI-4576][DOCS] Fix schema evolution docs (#6334)

2022-08-10 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 669d79208f [HUDI-4576][DOCS] Fix schema evol

[GitHub] [hudi] xushiyan merged pull request #6334: [HUDI-4576][DOCS] Fix schema evolution docs

2022-08-10 Thread GitBox
xushiyan merged PR #6334: URL: https://github.com/apache/hudi/pull/6334 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.

[GitHub] [hudi] hudi-bot commented on pull request #6363: [MINOR] fix potential NPE in spark writer

2022-08-10 Thread GitBox
hudi-bot commented on PR #6363: URL: https://github.com/apache/hudi/pull/6363#issuecomment-1211524596 ## CI report: * e0305b9c23bffa362f9e6d1e7e90264533e8a687 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1072

[GitHub] [hudi] vinothchandar commented on a diff in pull request #6345: [HUDI-4552]: RFC-58: Integrate column stats index with all query engines

2022-08-10 Thread GitBox
vinothchandar commented on code in PR #6345: URL: https://github.com/apache/hudi/pull/6345#discussion_r943073339 ## rfc/rfc-58/rfc-58.md: ## @@ -0,0 +1,69 @@ + +# RFC-58: Integrate column stats index with all query engines + + + +## Proposers + +- @pratyakshsharma + +## Approver

[GitHub] [hudi] hudi-bot commented on pull request #6363: [MINOR] fix potential NPE in spark writer

2022-08-10 Thread GitBox
hudi-bot commented on PR #6363: URL: https://github.com/apache/hudi/pull/6363#issuecomment-1211521283 ## CI report: * e0305b9c23bffa362f9e6d1e7e90264533e8a687 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] LinMingQiang opened a new issue, #6364: Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2022-08-10 Thread GitBox
LinMingQiang opened a new issue, #6364: URL: https://github.com/apache/hudi/issues/6364 `10:32:28.039 [pool-9-thread-1] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOFatalInternalException: Unexpecte

[jira] [Created] (HUDI-4600) Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2022-08-10 Thread HunterHunter (Jira)
HunterHunter created HUDI-4600: -- Summary: Hive synchronization failure : Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient Key: HUDI-4600 URL: https://issues.apache.org/jira/browse/H

[GitHub] [hudi] microbearz opened a new pull request, #6363: [MINOR] fix potential NPE in spark writer

2022-08-10 Thread GitBox
microbearz opened a new pull request, #6363: URL: https://github.com/apache/hudi/pull/6363 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performan

[GitHub] [hudi] KnightChess commented on pull request #6362: [MINOR][DOCS] add tip to schema evolution

2022-08-10 Thread GitBox
KnightChess commented on PR #6362: URL: https://github.com/apache/hudi/pull/6362#issuecomment-1211512801 https://user-images.githubusercontent.com/20125927/184059475-63dff155-b99e-4cfc-a8b4-7db38e6abe2d.png -- This is an automated message from the Apache Git Service. To respond to t

[GitHub] [hudi] KnightChess opened a new pull request, #6362: [MINOR][DOCS] add tip to schema evolution

2022-08-10 Thread GitBox
KnightChess opened a new pull request, #6362: URL: https://github.com/apache/hudi/pull/6362 Replenish: #6344 ### Change Logs ### Impact none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-cont

[GitHub] [hudi] fengjian428 commented on issue #6069: [SUPPORT] /hoodie/temp Folder and contents not getting deleted

2022-08-10 Thread GitBox
fengjian428 commented on issue #6069: URL: https://github.com/apache/hudi/issues/6069#issuecomment-1211510782 > yeah, it should be safe to remove the marker files if no relevant inflight instant in the timeline -- This is an automated message from the Apache Git Service. To respon

[GitHub] [hudi] wzx140 commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-10 Thread GitBox
wzx140 commented on code in PR #5629: URL: https://github.com/apache/hudi/pull/5629#discussion_r943064220 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java: ## @@ -20,62 +20,33 @@ package org.apache.hudi.common.table.log

[GitHub] [hudi] YannByron commented on a diff in pull request #6264: [HUDI-4503] support for parsing identifier with catalog

2022-08-10 Thread GitBox
YannByron commented on code in PR #6264: URL: https://github.com/apache/hudi/pull/6264#discussion_r943051969 ## hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala: ## @@ -52,8 +57,56 @@ abstract class HoodieSpark3Cata

[GitHub] [hudi] gudladona commented on pull request #6179: [HUDI-4448] Remove the latest commit refresh for timeline server

2022-08-10 Thread GitBox
gudladona commented on PR #6179: URL: https://github.com/apache/hudi/pull/6179#issuecomment-1211491636 > > @danny0405 Few questions. > > > > * what table services could cause this? > > * Does this impact inserts too or only upserts? > > * The fix in this PR https://github.com/ap

[GitHub] [hudi] YannByron commented on a diff in pull request #6264: [HUDI-4503] support for parsing identifier with catalog

2022-08-10 Thread GitBox
YannByron commented on code in PR #6264: URL: https://github.com/apache/hudi/pull/6264#discussion_r943050236 ## hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala: ## @@ -52,8 +57,56 @@ abstract class HoodieSpark3Cata

[GitHub] [hudi] eric9204 commented on issue #6308: [SUPPORT] Spark multi writer failed ! ! !

2022-08-10 Thread GitBox
eric9204 commented on issue #6308: URL: https://github.com/apache/hudi/issues/6308#issuecomment-1211486157 > @eric9204 For the first failure, can you share the full write configs? Was there any pending/inflight compaction and clustering? Can you share the `.hoodie` directory under the base

[GitHub] [hudi] novisfff closed issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

2022-08-10 Thread GitBox
novisfff closed issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data? URL: https://github.com/apache/hudi/issues/6095 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [hudi] novisfff commented on issue #6095: [SUPPORT] May TimelineServerBasedWriteMarkers lost data?

2022-08-10 Thread GitBox
novisfff commented on issue #6095: URL: https://github.com/apache/hudi/issues/6095#issuecomment-1211483503 > Nope. It will never lose data. If process is crashed mid-way, the commit has also failed mid-way. So, next time when you restart your pipeline, the rollback of that partially failed

[jira] [Updated] (HUDI-3777) Optimize column stats storage

2022-08-10 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3777: - Sprint: 2022/08/22 > Optimize column stats storage > - > > Key

[GitHub] [hudi] flashJd commented on pull request #6291: [MINOR] operator priority problem

2022-08-10 Thread GitBox
flashJd commented on PR #6291: URL: https://github.com/apache/hudi/pull/6291#issuecomment-1211468174 > > > obivously, operator priority problem > > > > > > @danny0405 can you help review it > > (numLogFilesSeen - 1) is the number of logFiles we have processed, logFilePaths.

[GitHub] [hudi] flashJd commented on pull request #6324: [HUDI-4561] Improve incremental query using the fileSlice adjacent to read.end-commit

2022-08-10 Thread GitBox
flashJd commented on PR #6324: URL: https://github.com/apache/hudi/pull/6324#issuecomment-1211467410 @danny0405 looking forward to your reply -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [hudi] danny0405 commented on pull request #6179: [HUDI-4448] Remove the latest commit refresh for timeline server

2022-08-10 Thread GitBox
danny0405 commented on PR #6179: URL: https://github.com/apache/hudi/pull/6179#issuecomment-1211459665 > @danny0405 Few questions. > > * what table services could cause this? > * Does this impact inserts too or only upserts? > * The fix in this PR https://github.com/apache/hudi/

[GitHub] [hudi] kk17 commented on issue #5861: [SUPPORT] Hudi spark datasource error after migrate from 0.8 to 0.11

2022-08-10 Thread GitBox
kk17 commented on issue #5861: URL: https://github.com/apache/hudi/issues/5861#issuecomment-1211458228 The pull request is waiting for merge. I need some time and guide to write a proper unit test. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [hudi] desaismi commented on issue #6069: [SUPPORT] /hoodie/temp Folder and contents not getting deleted

2022-08-10 Thread GitBox
desaismi commented on issue #6069: URL: https://github.com/apache/hudi/issues/6069#issuecomment-1211442775 Hello, @fengjian428 we have some dependencies on our data pipeline that makes upgrading to the latest version non-trivial. Is this a known issue for 0.8.0? If it's a rare interm

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5632: [HUDI-4122] Fix NPE caused by adding kafka nodes

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #5632: URL: https://github.com/apache/hudi/pull/5632#discussion_r943021599 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java: ## @@ -287,6 +292,32 @@ public OffsetRange[] getNextOffsetRanges(Optio

[GitHub] [hudi] alexeykudinkin opened a new pull request, #6361: [WIP] Cleaning up Hudi custom Spark `Rule`s

2022-08-10 Thread GitBox
alexeykudinkin opened a new pull request, #6361: URL: https://github.com/apache/hudi/pull/6361 ### Change Logs TBA ### Impact None ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)

[GitHub] [hudi] yihua commented on a diff in pull request #6359: [HUDI-4564] Update docs for Spark 3.3 support

2022-08-10 Thread GitBox
yihua commented on code in PR #6359: URL: https://github.com/apache/hudi/pull/6359#discussion_r942998774 ## website/docs/quick-start-guide.md: ## @@ -20,6 +20,7 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [ | Hudi| Supported

[GitHub] [hudi] CTTY commented on a diff in pull request #6359: [HUDI-4564] Update docs for Spark 3.3 support

2022-08-10 Thread GitBox
CTTY commented on code in PR #6359: URL: https://github.com/apache/hudi/pull/6359#discussion_r942996034 ## website/docs/quick-start-guide.md: ## @@ -20,6 +20,7 @@ Hudi works with Spark-2.4.3+ & Spark 3.x versions. You can follow instructions [ | Hudi| Supported S

[GitHub] [hudi] yihua opened a new pull request, #6360: [DOCS] Add Presto Tech Talk June 2022 to talks page

2022-08-10 Thread GitBox
yihua opened a new pull request, #6360: URL: https://github.com/apache/hudi/pull/6360 ### Change Logs As above. ### Impact Only docs update. **Risk level: none** The website can be properly built and visualized locally. ### Contributor's checklist

[jira] [Updated] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support

2022-08-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4564: - Labels: pull-request-available (was: ) > Docs writing for 0.12.0: spark 3.3 support > ---

[GitHub] [hudi] yihua opened a new pull request, #6359: [HUDI-4564] Update docs for Spark 3.3 support

2022-08-10 Thread GitBox
yihua opened a new pull request, #6359: URL: https://github.com/apache/hudi/pull/6359 ### Change Logs This PR updates the Spark Guide website page with Spark 3.3 support. ### Impact Only docs update. **Risk level: none** The website can be properly built and vi

[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-08-10 Thread GitBox
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1211379741 ## CI report: * 17ca1f72569520fecd9eeb509b9925da9134f898 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1072

[GitHub] [hudi] nsivabalan commented on issue #5291: [SUPPORT] How to use hudi-defaults.conf with Glue

2022-08-10 Thread GitBox
nsivabalan commented on issue #5291: URL: https://github.com/apache/hudi/issues/5291#issuecomment-1211359128 Can we close this one out, or is there any pending work item? I see the linked PR is already landed. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] nsivabalan commented on issue #5861: [SUPPORT] Hudi spark datasource error after migrate from 0.8 to 0.11

2022-08-10 Thread GitBox
nsivabalan commented on issue #5861: URL: https://github.com/apache/hudi/issues/5861#issuecomment-1211356850 Is there any follow up here. or is something waiting for any assistance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [hudi] nsivabalan commented on issue #5938: Why Hudi publish data size much more than the input file size when publish to hive

2022-08-10 Thread GitBox
nsivabalan commented on issue #5938: URL: https://github.com/apache/hudi/issues/5938#issuecomment-1211356103 @developerwxl : any update on this. if the issue is resolved, feel free to close out the issue. -- This is an automated message from the Apache Git Service. To respond to the mess

[jira] [Closed] (HUDI-4589) "show fsview all" hudi-cli fails for a hudi table written via flink

2022-08-10 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-4589. - Resolution: Duplicate > "show fsview all" hudi-cli fails for a hudi table written via flin

[jira] [Commented] (HUDI-4589) "show fsview all" hudi-cli fails for a hudi table written via flink

2022-08-10 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578165#comment-17578165 ] sivabalan narayanan commented on HUDI-4589: --- cool, got it. lets use HUDI-4485 to

[jira] [Updated] (HUDI-4599) Add single-writer validation if concurrency control is not enabled

2022-08-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4599: Description: In case of misconfiguration or unintended deployment model, where multiple writers write to th

[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-08-10 Thread GitBox
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1211316104 ## CI report: * 17ca1f72569520fecd9eeb509b9925da9134f898 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1072

[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-08-10 Thread GitBox
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1211311523 ## CI report: * 17ca1f72569520fecd9eeb509b9925da9134f898 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-4599) Add single-writer validation if concurrency control is not enabled

2022-08-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4599: Summary: Add single-writer validation if concurrency control is not enabled (was: Add validation in writer

[jira] [Updated] (HUDI-4599) Add single-writer validation if concurrency control is not enabled

2022-08-10 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4599: Fix Version/s: 0.13.0 > Add single-writer validation if concurrency control is not enabled > ---

[jira] [Created] (HUDI-4599) Add validation in writer if concurrency control is not enabled

2022-08-10 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4599: --- Summary: Add validation in writer if concurrency control is not enabled Key: HUDI-4599 URL: https://issues.apache.org/jira/browse/HUDI-4599 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-4598) GCP Support

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4598: -- Priority: Critical (was: Major) > GCP Support > --- > > Key: HUDI-4598

[jira] [Updated] (HUDI-4597) [GCP] 0 byte files appearing on GCS

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4597: -- Epic Link: HUDI-4598 > [GCP] 0 byte files appearing on GCS > ---

[jira] [Updated] (HUDI-4598) GCP Support

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4598: -- Description: Epic for all tasks related to Hudi's GCP support > GCP Support > --- > >

[jira] [Updated] (HUDI-4598) GCP Support

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4598: -- Component/s: reader-core writer-core > GCP Support > --- > >

[jira] [Updated] (HUDI-4564) Docs writing for 0.12.0: spark 3.3 support

2022-08-10 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4564: - Summary: Docs writing for 0.12.0: spark 3.3 support (was: Docs writing for 0.12.0: spark 3.3 support and

[jira] [Created] (HUDI-4598) GCP Support

2022-08-10 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-4598: - Summary: GCP Support Key: HUDI-4598 URL: https://issues.apache.org/jira/browse/HUDI-4598 Project: Apache Hudi Issue Type: Epic Reporter: Alexey

[jira] [Updated] (HUDI-4597) [GCP] 0 byte files appearing on GCS

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4597: -- Description: During recent troubleshooting session w/ Walmart folks we've identified an issue w

[jira] [Created] (HUDI-4597) [GCP] 0 byte files appearing on GCS

2022-08-10 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-4597: - Summary: [GCP] 0 byte files appearing on GCS Key: HUDI-4597 URL: https://issues.apache.org/jira/browse/HUDI-4597 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] nochimow commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

2022-08-10 Thread GitBox
nochimow commented on issue #4622: URL: https://github.com/apache/hudi/issues/4622#issuecomment-1211283255 @pomaster Yes i did, Their reply was that the product team is aware of this issue, but there is no ETA to fix this yet. -- This is an automated message from the Apache Git Service. T

[GitHub] [hudi] pomaster commented on issue #4622: [SUPPORT] Can't query Redshift rows even after downgrade from 0.10

2022-08-10 Thread GitBox
pomaster commented on issue #4622: URL: https://github.com/apache/hudi/issues/4622#issuecomment-1211279428 @nochimow Thought you had opened an AWS Support ticket. If yes, any update from AWS Support? Curious to know if Amazon Redshift Spectrum team is planning to do something on this issue

[jira] [Updated] (HUDI-4588) Ingestion failing if source column is dropped

2022-08-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4588: - Labels: pull-request-available schema schema-evolution (was: schema schema-evolution) > Ingestio

[GitHub] [hudi] alexeykudinkin opened a new pull request, #6358: [HUDI-4588] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-08-10 Thread GitBox
alexeykudinkin opened a new pull request, #6358: URL: https://github.com/apache/hudi/pull/6358 ### Change Logs Currently, `HoodieParquetReader` is not specifying projected schema properly when reading Parquet files which ends up failing in cases when the provided schema is not equal

[jira] [Updated] (HUDI-4588) Ingestion failing if source column is dropped

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4588: -- Fix Version/s: 0.13.0 > Ingestion failing if source column is dropped >

[jira] [Assigned] (HUDI-4588) Ingestion failing if source column is dropped

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-4588: - Assignee: Alexey Kudinkin > Ingestion failing if source column is dropped > -

[jira] [Updated] (HUDI-4588) Ingestion failing if source column is dropped

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4588: -- Priority: Blocker (was: Major) > Ingestion failing if source column is dropped > --

[jira] [Closed] (HUDI-4446) Call out breaking change in KeyGenerator interface

2022-08-10 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-4446. - Resolution: Fixed > Call out breaking change in KeyGenerator interface > -

[GitHub] [hudi] gudladona commented on pull request #6179: [HUDI-4448] Remove the latest commit refresh for timeline server

2022-08-10 Thread GitBox
gudladona commented on PR #6179: URL: https://github.com/apache/hudi/pull/6179#issuecomment-1211210497 @danny0405 A few questions. - What table services could cause this? - Does this impact inserts too, or only upserts? - The fix in this PR https://github.com/apache/hudi/pull/4

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6264: [HUDI-4503] support for parsing identifier with catalog

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #6264: URL: https://github.com/apache/hudi/pull/6264#discussion_r942012496 ## hudi-spark-datasource/hudi-spark3-common/src/main/scala/org/apache/spark/sql/HoodieSpark3CatalystPlanUtils.scala: ## @@ -52,8 +57,56 @@ abstract class HoodieSpark

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #5629: [HUDI-3384][HUDI-3385] Spark specific file reader/writer.

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #5629: URL: https://github.com/apache/hudi/pull/5629#discussion_r942820533 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java: ## @@ -20,62 +20,33 @@ package org.apache.hudi.common.t

[GitHub] [hudi] crutis commented on issue #6281: [SUPPORT] AwsGlueCatalogSyncTool -The number of partition keys do not match the number of partition values

2022-08-10 Thread GitBox
crutis commented on issue #6281: URL: https://github.com/apache/hudi/issues/6281#issuecomment-1210963464 No support ticket with AWS yet, I'll check this out and let you know what I see, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #6170: URL: https://github.com/apache/hudi/pull/6170#discussion_r942643055 ## .github/workflows/bot.yml: ## @@ -9,6 +9,8 @@ on: branches: - master - 'release-*' +env: + MVN_ARGS: -ntp -B -V -Pwarn-log -Dorg.slf4j.simple

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #6170: URL: https://github.com/apache/hudi/pull/6170#discussion_r942635319 ## hudi-client/hudi-client-common/pom.xml: ## @@ -193,6 +193,12 @@ + + org.apache.hudi + hudi-tests-common + ${project.version} +

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #6170: URL: https://github.com/apache/hudi/pull/6170#discussion_r942634341 ## hudi-examples/hudi-examples-spark/pom.xml: ## @@ -230,6 +230,27 @@ + + +org.apache.logging.l

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6170: [HUDI-4441] Log4j2 configuration fixes and removal of log4j1 dependencies

2022-08-10 Thread GitBox
alexeykudinkin commented on code in PR #6170: URL: https://github.com/apache/hudi/pull/6170#discussion_r942631573 ## hudi-spark-datasource/hudi-spark/pom.xml: ## @@ -267,6 +252,14 @@ org.apache.logging.log4j log4j-1.2-api + Review Comment: As discusse

[GitHub] [hudi] xushiyan commented on issue #6167: [SUPPORT] No results are returned from incremental queries within the archived range

2022-08-10 Thread GitBox
xushiyan commented on issue #6167: URL: https://github.com/apache/hudi/issues/6167#issuecomment-1210784160 > In this case, why not merge archived instants before return? @1032851561 I don't think it's expected to return incremental results for archived commits. A design consideration

[jira] [Created] (HUDI-4596) Add totalRecordsDeleted in hudi metrics

2022-08-10 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-4596: Summary: Add totalRecordsDeleted in hudi metrics Key: HUDI-4596 URL: https://issues.apache.org/jira/browse/HUDI-4596 Project: Apache Hudi Issue Type: New Feature

[jira] [Updated] (HUDI-4596) Add totalRecordsDeleted in hudi metrics

2022-08-10 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4596: - Priority: Minor (was: Major) > Add totalRecordsDeleted in hudi metrics >

[GitHub] [hudi] xushiyan commented on issue #6148: [SUPPORT] Can't get numDeletes in HoodieMetrics

2022-08-10 Thread GitBox
xushiyan commented on issue #6148: URL: https://github.com/apache/hudi/issues/6148#issuecomment-1210771966 > Question: Is this a bug? If so, which version is expected to be repaired? If not, is there any plan to add it to metrics in a later version? Or is there any other way to get

[GitHub] [hudi] xushiyan closed issue #6148: [SUPPORT] Can't get numDeletes in HoodieMetrics

2022-08-10 Thread GitBox
xushiyan closed issue #6148: [SUPPORT] Can't get numDeletes in HoodieMetrics URL: https://github.com/apache/hudi/issues/6148 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] xushiyan commented on issue #6281: [SUPPORT] AwsGlueCatalogSyncTool -The number of partition keys do not match the number of partition values

2022-08-10 Thread GitBox
xushiyan commented on issue #6281: URL: https://github.com/apache/hudi/issues/6281#issuecomment-1210751958 @crutis you can actually troubleshoot this by writing a program with the AWS SDK to mimic `org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient#addPartitionsToTable`. The list of partition v
