Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103914108 ## CI report: * bd02c1777af885032b1826525be77d36ee530c18 Azure:

Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11152: URL: https://github.com/apache/hudi/pull/11152#issuecomment-2103913932 ## CI report: * 0f3da94a7e1f0c4a9366ce51f8cad6bdf582e5b9 Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103907870 ## CI report: * 24a58cfb51313e06350ef72f1de50702f5a8e6b3 Azure:

Re: [PR] [HUDI-7704] Unify test client storage classes with duplicate code [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11152: URL: https://github.com/apache/hudi/pull/11152#issuecomment-2103907682 ## CI report: * 0f3da94a7e1f0c4a9366ce51f8cad6bdf582e5b9 Azure:

[jira] [Closed] (HUDI-7725) Restructure HFileBootstrapIndex to separate Hadoop-dependent logic

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7725. --- Resolution: Fixed > Restructure HFileBootstrapIndex to separate Hadoop-dependent logic >

[jira] [Closed] (HUDI-7729) Move ParquetUtils to hudi-hadoop-common

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7729. --- Resolution: Fixed > Move ParquetUtils to hudi-hadoop-common > --- > >

(hudi) branch master updated: [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module (#11186)

2024-05-09 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 8f5f5470b61 [HUDI-7729] Move ParquetUtils to

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
yihua merged PR #11186: URL: https://github.com/apache/hudi/pull/11186 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
yihua commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103880293 Azure CI is green. https://github.com/apache/hudi/assets/2497195/82bd9865-f5be-4ad6-bea3-f68e4c965f90 -- This is an automated message from the Apache Git Service. To respond to

(hudi) branch master updated: [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic (#11171)

2024-05-09 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b1496068b3a [HUDI-7725] Restructure

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
yihua merged PR #11171: URL: https://github.com/apache/hudi/pull/11171 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103844265 ## CI report: * 24a58cfb51313e06350ef72f1de50702f5a8e6b3 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103844217 ## CI report: * 42da7a112173058531053c20f1d1bd043401d03c UNKNOWN * f5243e9ef80eee6fc179af27bdb4e56c336aa3cc Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103806351 ## CI report: * ffe9706831b4b92c270c888a6b8b3a6744f09c08 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103806291 ## CI report: * 7b819bdb3ecaa9d87a79363f8ae7b05e52190f0c Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103801175 ## CI report: * ffe9706831b4b92c270c888a6b8b3a6744f09c08 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103801089 ## CI report: * 7b819bdb3ecaa9d87a79363f8ae7b05e52190f0c Azure:

Re: [PR] [HUDI-7738] FileStreamReader need set Charset with UTF-8 [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11181: URL: https://github.com/apache/hudi/pull/11181#issuecomment-2103795394 ## CI report: * ffbdffa8d2f54fd6fd4261aed25d04d9c383eccf UNKNOWN * 438f2e1099315408332dce885fe511ec3ae6c9b6 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103795301 ## CI report: * 7b819bdb3ecaa9d87a79363f8ae7b05e52190f0c Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596206568 ## hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java: ## @@ -300,21 +273,6 @@ private Option getTableParquetSchemaFromDataFile() { }

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11171: URL: https://github.com/apache/hudi/pull/11171#discussion_r1596201861 ## hudi-hadoop-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java: ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11171: URL: https://github.com/apache/hudi/pull/11171#discussion_r1596200567 ## hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java: ## @@ -24,7 +24,7 @@ import org.apache.hudi.cli.commands.SparkMain.SparkCommand; import

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596195397 ## hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/HFileUtils.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[jira] [Created] (HUDI-7741) Implement methods in HFileUtils extends BaseFileUtils

2024-05-09 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7741: - Summary: Implement methods in HFileUtils extends BaseFileUtils Key: HUDI-7741 URL: https://issues.apache.org/jira/browse/HUDI-7741 Project: Apache Hudi

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11186: URL: https://github.com/apache/hudi/pull/11186#issuecomment-2103762379 ## CI report: * decf7f95d1ca24a39391b17194a8fe5a255f4e07 Azure:

Re: [PR] [HUDI-7738] FileStreamReader need set Charset with UTF-8 [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11181: URL: https://github.com/apache/hudi/pull/11181#issuecomment-2103762316 ## CI report: * ed8e2dd8ecd2a233d5241424b92f36e4ae21f85b Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103762269 ## CI report: * 7b819bdb3ecaa9d87a79363f8ae7b05e52190f0c Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103756429 ## CI report: * 7b819bdb3ecaa9d87a79363f8ae7b05e52190f0c Azure:

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11186: URL: https://github.com/apache/hudi/pull/11186#issuecomment-2103756503 ## CI report: * decf7f95d1ca24a39391b17194a8fe5a255f4e07 Azure:

Re: [PR] [HUDI-7738] FileStreamReader need set Charset with UTF-8 [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11181: URL: https://github.com/apache/hudi/pull/11181#issuecomment-2103756465 ## CI report: * ed8e2dd8ecd2a233d5241424b92f36e4ae21f85b Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11171: URL: https://github.com/apache/hudi/pull/11171#discussion_r1596186554 ## hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java: ## @@ -200,582 +166,4 @@ public void dropIndex() { public boolean

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11171: URL: https://github.com/apache/hudi/pull/11171#discussion_r1596185798 ## hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/HoodieHBaseAvroHFileReader.java: ## @@ -95,16 +96,29 @@ public

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11171: URL: https://github.com/apache/hudi/pull/11171#discussion_r1596185700 ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroFileReaderFactory.java: ## @@ -45,11 +44,16 @@ protected HoodieFileReader

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11186: URL: https://github.com/apache/hudi/pull/11186#issuecomment-2103751118 ## CI report: * decf7f95d1ca24a39391b17194a8fe5a255f4e07 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103751093 ## CI report: * ffe9706831b4b92c270c888a6b8b3a6744f09c08 Azure:

Re: [PR] [HUDI-7738] FileStreamReader need set Charset with UTF-8 [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11181: URL: https://github.com/apache/hudi/pull/11181#issuecomment-2103751051 ## CI report: * ed8e2dd8ecd2a233d5241424b92f36e4ae21f85b Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103751006 ## CI report: * 7b819bdb3ecaa9d87a79363f8ae7b05e52190f0c Azure:

Re: [PR] [HUDI-7739] Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy [hudi]

2024-05-09 Thread via GitHub
Zouxxyy commented on code in PR #11182: URL: https://github.com/apache/hudi/pull/11182#discussion_r1596179557 ## hudi-common/src/main/java/org/apache/hudi/common/conflict/detection/TimelineServerBasedDetectionStrategy.java: ## @@ -60,4 +60,6 @@ public abstract void

Re: [I] [SUPPORT]xxx.parquet is not a Parquet file [hudi]

2024-05-09 Thread via GitHub
MrAladdin commented on issue #11178: URL: https://github.com/apache/hudi/issues/11178#issuecomment-2103729103 > [@MrAladdin](https://github.com/MrAladdin) Can you please share the timeline and writer configurations. df .writeStream .format("hudi")

Re: [PR] [HUDI-7739] Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy [hudi]

2024-05-09 Thread via GitHub
Zouxxyy commented on code in PR #11182: URL: https://github.com/apache/hudi/pull/11182#discussion_r1596175336 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java: ## @@ -202,6 +204,9 @@ public void stop() { if (markerHandler != null)

Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-09 Thread via GitHub
weitianpei commented on issue #11090: URL: https://github.com/apache/hudi/issues/11090#issuecomment-2103721112 My downstream Flink job did not encounter the file-not-exists exception after the upstream increased the clean.commits parameter. I checked if my downstream program was re-reading old

[jira] [Assigned] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-09 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler reassigned HUDI-7731: - Assignee: Jonathan Vexler > Fix usage of new Configuration() in production code >

[jira] [Updated] (HUDI-7731) Fix usage of new Configuration() in production code

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7731: Sprint: Sprint 2023-04-26 > Fix usage of new Configuration() in production code >

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103702442 ## CI report: * 5fdc71489d8830b2a4966e8e3892cac080738c31 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103702398 ## CI report: * 692b8b06bbb7c194bdd107caba9858c07e0d0a1d Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103696459 ## CI report: * 5fdc71489d8830b2a4966e8e3892cac080738c31 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103696369 ## CI report: * 692b8b06bbb7c194bdd107caba9858c07e0d0a1d Azure:

Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #10922: URL: https://github.com/apache/hudi/pull/10922#issuecomment-2103695953 ## CI report: * 41e7049a782561d5f8f9a21af7ba4c1021b3fb14 Azure:

[jira] [Updated] (HUDI-7585) Avoid reading log files for resolving schema for _hoodie_operation field

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7585: Status: In Progress (was: Open) > Avoid reading log files for resolving schema for _hoodie_operation field

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11186: URL: https://github.com/apache/hudi/pull/11186#issuecomment-2103691109 ## CI report: * decf7f95d1ca24a39391b17194a8fe5a255f4e07 Azure:

Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11077: URL: https://github.com/apache/hudi/pull/11077#discussion_r1596140967 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMetadataMergedLogRecordScanner.java: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11077: URL: https://github.com/apache/hudi/pull/11077#discussion_r1596140626 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMetadataMergedLogRecordScanner.java: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11077: URL: https://github.com/apache/hudi/pull/11077#discussion_r1596140891 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMetadataMergedLogRecordScanner.java: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11077: URL: https://github.com/apache/hudi/pull/11077#discussion_r1596140101 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java: ## @@ -271,6 +266,16 @@ public void processNextRecord(HoodieRecord

Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11077: URL: https://github.com/apache/hudi/pull/11077#discussion_r1596139985 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java: ## @@ -91,7 +91,7 @@ public class HoodieMergedLogRecordScanner

[jira] [Closed] (HUDI-7350) Introduce HoodieIOFactory to abstract the reader and writer implementation

2024-05-09 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler closed HUDI-7350. - Resolution: Fixed > Introduce HoodieIOFactory to abstract the reader and writer implementation >

[jira] [Updated] (HUDI-7726) Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7726: Status: Patch Available (was: In Progress) > Restructure TableSchemaResolver to separate Hadoop logic and

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596136553 ## hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java: ## @@ -300,21 +273,6 @@ private Option getTableParquetSchemaFromDataFile() { }

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596135765 ## hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java: ## @@ -300,21 +273,6 @@ private Option getTableParquetSchemaFromDataFile() { }

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596133325 ## hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/HFileUtils.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596132381 ## hudi-hadoop-common/src/main/java/org/apache/hudi/common/util/HFileUtils.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11185: URL: https://github.com/apache/hudi/pull/11185#discussion_r1596130675 ## hudi-hadoop-common/src/main/java/org/apache/hudi/common/table/HadoopTableSchemaResolver.java: ## @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation

(hudi) branch master updated: [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent (#11163)

2024-05-09 Thread jonvex
This is an automated email from the ASF dual-hosted git repository. jonvex pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e42217d2368 [HUDI-7350] Make Hudi reader and

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
jonvex merged PR #11163: URL: https://github.com/apache/hudi/pull/11163 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
jonvex commented on PR #11163: URL: https://github.com/apache/hudi/pull/11163#issuecomment-2103656239 passed ci: https://github.com/apache/hudi/assets/26940621/bce86e31-19a1-4cb5-9b0e-0c061d32865a -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11186: URL: https://github.com/apache/hudi/pull/11186#issuecomment-2103653552 ## CI report: * decf7f95d1ca24a39391b17194a8fe5a255f4e07 Azure:

Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #10922: URL: https://github.com/apache/hudi/pull/10922#issuecomment-2103653230 ## CI report: * 7619816b75c4a7a9fce6a6b547736445d9b00f60 Azure:

Re: [PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11186: URL: https://github.com/apache/hudi/pull/11186#issuecomment-2103648041 ## CI report: * decf7f95d1ca24a39391b17194a8fe5a255f4e07 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7549] Reverting spurious log block deduction with LogRecordReader [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #10922: URL: https://github.com/apache/hudi/pull/10922#issuecomment-2103647669 ## CI report: * 7619816b75c4a7a9fce6a6b547736445d9b00f60 Azure:

Re: [I] [SUPPORT]Flink Streaming Read hudi table which is in clustering,encounterd file not exists. [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on issue #11090: URL: https://github.com/apache/hudi/issues/11090#issuecomment-2103645482 This is a replace commit, you can choose to skip it with option `read.skip_clustering` or `read.skip_insertoverride` enabled. -- This is an automated message from the Apache Git

Re: [PR] [HUDI-7739] Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11182: URL: https://github.com/apache/hudi/pull/11182#discussion_r1596123598 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java: ## @@ -202,6 +204,9 @@ public void stop() { if (markerHandler !=

Re: [PR] [HUDI-7739] Shudown asyncDetectorExecutor in AsyncTimelineServerBasedDetectionStrategy [hudi]

2024-05-09 Thread via GitHub
danny0405 commented on code in PR #11182: URL: https://github.com/apache/hudi/pull/11182#discussion_r1596123340 ## hudi-common/src/main/java/org/apache/hudi/common/conflict/detection/TimelineServerBasedDetectionStrategy.java: ## @@ -60,4 +60,6 @@ public abstract void

[jira] [Closed] (HUDI-7728) Use StorageConfiguration in LockProvider constructors

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7728. --- Resolution: Fixed > Use StorageConfiguration in LockProvider constructors >

[jira] [Closed] (HUDI-7727) Avoid constructAbsolutePathInHadoopPath in hudi-common module

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7727. --- Resolution: Fixed > Avoid constructAbsolutePathInHadoopPath in hudi-common module >

[jira] [Updated] (HUDI-7729) Move ParquetUtils to hudi-hadoop-common

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7729: Story Points: 2 (was: 1) > Move ParquetUtils to hudi-hadoop-common >

[jira] [Updated] (HUDI-7729) Move ParquetUtils to hudi-hadoop-common

2024-05-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7729: Status: Patch Available (was: In Progress) > Move ParquetUtils to hudi-hadoop-common >

[jira] [Updated] (HUDI-7729) Move ParquetUtils to hudi-hadoop-common

2024-05-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7729: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Move ParquetUtils to

[PR] [HUDI-7729] Move ParquetUtils to hudi-hadoop-common module [hudi]

2024-05-09 Thread via GitHub
yihua opened a new pull request, #11186: URL: https://github.com/apache/hudi/pull/11186 ### Change Logs This PR moves the `ParquetUtils` class that uses parquet library to `hudi-hadoop-common` module. To achieve this, a new API `BaseFileUtils#readColumnStatsFromMetadata` is

Re: [PR] [HUDI-7673] Fixing false positive validation failure for RLI with MDT validation tool [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11098: URL: https://github.com/apache/hudi/pull/11098#issuecomment-2103613903 ## CI report: * ef10f888b3c2fc4b4e24bdf7a5b9f9081eb9e54d Azure:

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11163: URL: https://github.com/apache/hudi/pull/11163#issuecomment-2103608787 ## CI report: * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN * 12faeb31ad97c03594bc241e2d39196163c5a133 Azure:

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11163: URL: https://github.com/apache/hudi/pull/11163#issuecomment-2103603619 ## CI report: * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN * 12faeb31ad97c03594bc241e2d39196163c5a133 Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103595938 ## CI report: * 5fdc71489d8830b2a4966e8e3892cac080738c31 Azure:

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11163: URL: https://github.com/apache/hudi/pull/11163#issuecomment-2103595868 ## CI report: * 7376b451044473ce16aad09a1d356a9140442f9c UNKNOWN * d491f7ed864af5c291d365dcfe9392a5bbc8dd2d UNKNOWN * 12faeb31ad97c03594bc241e2d39196163c5a133 Azure:

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11163: URL: https://github.com/apache/hudi/pull/11163#discussion_r1596087051 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieFileSliceTestUtils.java: ## @@ -247,36 +245,31 @@ public static HoodieBaseFile createBaseFile(

Re: [PR] [HUDI-7350] Make Hudi reader and writer factory APIs Hadoop-independent [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11163: URL: https://github.com/apache/hudi/pull/11163#discussion_r1596083869 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieFileSliceTestUtils.java: ## @@ -247,36 +245,31 @@ public static HoodieBaseFile createBaseFile(

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103559055 ## CI report: * b1ca2854a5d9c3bbb6ac63f94adbb7fe11e99a10 Azure:

Re: [PR] [HUDI-7673] Fixing false positive validation failure for RLI with MDT validation tool [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11098: URL: https://github.com/apache/hudi/pull/11098#issuecomment-2103558885 ## CI report: * 0a9fb992f0e3e84d2b39f654140e30846b086e7e Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103553022 ## CI report: * b1ca2854a5d9c3bbb6ac63f94adbb7fe11e99a10 Azure:

Re: [PR] [HUDI-7673] Fixing false positive validation failure for RLI with MDT validation tool [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11098: URL: https://github.com/apache/hudi/pull/11098#issuecomment-2103552776 ## CI report: * 0a9fb992f0e3e84d2b39f654140e30846b086e7e Azure:

Re: [PR] [HUDI-7429] Fixing average record size estimation for delta commits [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #10763: URL: https://github.com/apache/hudi/pull/10763#issuecomment-2103552339 ## CI report: * 34ffbbc913fab393871b866160ea2a7e1b38c53f Azure:

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103528518 ## CI report: * b1ca2854a5d9c3bbb6ac63f94adbb7fe11e99a10 Azure:

Re: [PR] [HUDI-7725] Restructure HFileBootstrapIndex to separate Hadoop-dependent logic [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11171: URL: https://github.com/apache/hudi/pull/11171#issuecomment-2103527546 ## CI report: * 692b8b06bbb7c194bdd107caba9858c07e0d0a1d Azure:

Re: [PR] [HUDI-7508] Avoid collecting records in HoodieStreamerUtils.createHoodieRecords and JsonKafkaSource mapPartitions [hudi]

2024-05-09 Thread via GitHub
nsivabalan commented on PR #10872: URL: https://github.com/apache/hudi/pull/10872#issuecomment-2103497489 Hey @vinishjail97: can you attach the memory profiling you did before and after this patch, and rebase with master? We are good to go. -- This is an automated message from the Apache

Re: [PR] [HUDI-7523] Add HOODIE_SPARK_DATASOURCE_OPTIONS to be used in HoodieIncrSource [hudi]

2024-05-09 Thread via GitHub
nsivabalan commented on PR #10900: URL: https://github.com/apache/hudi/pull/10900#issuecomment-2103496364 Hey @vinishjail97: can you address the reviews from Sagar? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-05-09 Thread via GitHub
nsivabalan closed pull request #10909: [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce URL: https://github.com/apache/hudi/pull/10909 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-09 Thread via GitHub
jonvex commented on code in PR #11163: URL: https://github.com/apache/hudi/pull/11163#discussion_r1596036250 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieFileSliceTestUtils.java: ## @@ -247,36 +245,31 @@ public static HoodieBaseFile createBaseFile(

Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11163: URL: https://github.com/apache/hudi/pull/11163#discussion_r1596025879 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieFileSliceTestUtils.java: ## @@ -247,36 +245,31 @@ public static HoodieBaseFile createBaseFile(

Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11163: URL: https://github.com/apache/hudi/pull/11163#discussion_r1596019932 ## hudi-hadoop-common/src/main/java/org/apache/hudi/io/storage/hadoop/HoodieAvroFileReaderFactory.java: ## @@ -7,19 +7,25 @@ * "License"); you may not use this file

Re: [PR] [HUDI-7350] Create hudi io factory [hudi]

2024-05-09 Thread via GitHub
yihua commented on code in PR #11163: URL: https://github.com/apache/hudi/pull/11163#discussion_r1595997546 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java: ## @@ -107,38 +106,35 @@ protected byte[] serializeRecords(List records)

Re: [PR] [HUDI-7726] Restructure TableSchemaResolver to separate Hadoop logic and use BaseFileUtils [hudi]

2024-05-09 Thread via GitHub
hudi-bot commented on PR #11185: URL: https://github.com/apache/hudi/pull/11185#issuecomment-2103462611 ## CI report: * b1ca2854a5d9c3bbb6ac63f94adbb7fe11e99a10 Azure:
