Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


wombatu-kun commented on code in PR #18481:
URL: https://github.com/apache/hudi/pull/18481#discussion_r3050636517


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath, 
CatalogBaseTable table,
 }
 
 HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr = 
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+  HoodieFileFormat configuredFormat = 
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+  if (configuredFormat == HoodieFileFormat.LANCE) {

Review Comment:
   replaced enum `valueOf` with 
`HoodieFileFormat.LANCE.name().equalsIgnoreCase(..)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


hudi-bot commented on PR #18481:
URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205500399

   
   ## CI report:
   
   * c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=13146)
 
   * c501b0e0e6d33cba190ddc1fc0bc63a7866b2ea9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


wombatu-kun commented on code in PR #18481:
URL: https://github.com/apache/hudi/pull/18481#discussion_r3050640638


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath, 
CatalogBaseTable table,
 }
 
 HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr = 
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+  HoodieFileFormat configuredFormat = 
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+  if (configuredFormat == HoodieFileFormat.LANCE) {
+throw new 
HoodieValidationException(HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG);
+  }
+  baseFileFormat = configuredFormat;

Review Comment:
   Not intentional, reverted change of `baseFileFormat`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


codecov-commenter commented on PR #18481:
URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205383789

   ## 
[Codecov](https://app.codecov.io/gh/apache/hudi/pull/18481?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   :x: Patch coverage is `0%` with `4 lines` in your changes missing coverage. 
Please review.
   :white_check_mark: Project coverage is 44.87%. Comparing base 
([`0591933`](https://app.codecov.io/gh/apache/hudi/commit/0591933bc237ab9adb616f6961bbd780deff25a2?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache))
 to head 
([`c760ea7`](https://app.codecov.io/gh/apache/hudi/commit/c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)).
   
   | [Files with missing 
lines](https://app.codecov.io/gh/apache/hudi/pull/18481?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[...rg/apache/hudi/hadoop/HiveHoodieReaderContext.java](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&filepath=hudi-hadoop-mr%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhudi%2Fhadoop%2FHiveHoodieReaderContext.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hpdmVIb29kaWVSZWFkZXJDb250ZXh0LmphdmE=)
 | 0.00% | [1 Missing and 1 partial :warning: 
](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   | 
[...pache/hudi/io/storage/HoodieFileReaderFactory.java](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&filepath=hudi-common%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhudi%2Fio%2Fstorage%2FHoodieFileReaderFactory.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlUmVhZGVyRmFjdG9yeS5qYXZh)
 | 0.00% | [1 Missing :warning: 
](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   | 
[...pache/hudi/io/storage/HoodieFileWriterFactory.java](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&filepath=hudi-common%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhudi%2Fio%2Fstorage%2FHoodieFileWriterFactory.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlV3JpdGVyRmFjdG9yeS5qYXZh)
 | 0.00% | [1 Missing :warning: 
](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   > :exclamation:  There is a different number of reports uploaded between 
BASE (0591933) and HEAD (c760ea7). Click for more details.
   > 
   > HEAD has 28 uploads less than BASE
   >
   >| Flag | BASE (0591933) | HEAD (c760ea7) |
   >|--|--|--|
   >|spark-client-hadoop-common|1|0|
   >|spark-scala-tests|10|0|
   >|spark-java-tests|15|0|
   >|common-and-other-modules|1|0|
   >|utilities|1|0|
   >
   
   Additional details and impacted files
   
   
   
   ```diff
   @@  Coverage Diff  @@
   ## master   #18481   +/-   ##
   =
   - Coverage 68.77%   44.87%   -23.91% 
   + Complexity28117 8476-19641 
   =
 Files  2452 1196 -1256 
 Lines13503361896-73137 
 Branches  16376 6660 -9716 
   =
   - Hits  9286927773-65096 
   + Misses3480131094 -3707 
   + Partials   7363 3029 -4334 
   ```
   
   | 
[Flag](https://app.codecov.io/gh/apache/hudi/pull/18481/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[common-and-other-modules](https://app.codecov.io/gh/apache/hudi/pull/18481/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `?` | |
   | 
[hadoop-mr-java-client](https://app.codecov.io/gh/apache/hudi/pull/18481/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | `44.87% <0.00%> (-0.01%)` | :arrow_down: |
   | 
[spark-client-had

Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


yihua commented on code in PR #18481:
URL: https://github.com/apache/hudi/pull/18481#discussion_r3050513758


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath, 
CatalogBaseTable table,
 }
 
 HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr = 
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+  HoodieFileFormat configuredFormat = 
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+  if (configuredFormat == HoodieFileFormat.LANCE) {

Review Comment:
   🤖 `HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase())` will throw an 
`IllegalArgumentException` if the property contains an unrecognized value. 
Could you wrap this in a try-catch or use a safer lookup, so the error message 
is user-friendly instead of a raw enum parse failure?
   
   🤖 *This review comment was generated by an AI agent and may contain 
mistakes. Please verify any suggestions before applying.*



##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath, 
CatalogBaseTable table,
 }
 
 HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr = 
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+  HoodieFileFormat configuredFormat = 
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+  if (configuredFormat == HoodieFileFormat.LANCE) {
+throw new 
HoodieValidationException(HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG);
+  }
+  baseFileFormat = configuredFormat;

Review Comment:
   🤖 This line `baseFileFormat = configuredFormat;` changes behavior beyond the 
Lance guard — for ORC or HFile tables, the `baseFileFormat` was previously 
always PARQUET (hardcoded on the line above), which was used to resolve 
InputFormat, OutputFormat, and SerDe class names. Now those tables will get 
their actual format's classes. This might be a desirable fix, but it's a silent 
behavior change that could break Hive catalog registration for non-Parquet 
tables. Was this intentional? If the goal here is only the Lance guard, I'd 
suggest keeping the existing default-to-PARQUET behavior and only adding the 
Lance rejection.
   
   🤖 *This review comment was generated by an AI agent and may contain 
mistakes. Please verify any suggestions before applying.*



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


hudi-bot commented on PR #18481:
URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205341843

   
   ## CI report:
   
   * c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=13146)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


hudi-bot commented on PR #18481:
URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205324460

   
   ## CI report:
   
   * c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



[PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]

2026-04-08 Thread via GitHub


wombatu-kun opened a new pull request, #18481:
URL: https://github.com/apache/hudi/pull/18481

   ### Describe the issue this Pull Request addresses
   
   Closes https://github.com/apache/hudi/issues/18451
   
   ### Summary and Changelog
   
   - Add guards to prevent non-Spark engines (Flink, Hive, Presto, Trino) from 
reading or writing Lance-format Hudi tables. Lance is Spark-only for 1.2.
   - Guards at multiple levels:
 - **Config-time (Flink)**: `HoodieTableFactory` rejects Lance on both 
source and sink; `HoodieHiveCatalog` rejects on table creation
 - **Read-time (Flink)**: `FlinkRowDataReaderContext` rejects Lance files
 - **Read-time (Hive)**: `HiveHoodieReaderContext` rejects Lance files early
 - **Factory-level (all engines)**: Base `HoodieFileReaderFactory` /
   `HoodieFileWriterFactory` throw for Lance; only Spark overrides to 
support it
   - Centralized error message `LANCE_SPARK_ONLY_ERROR_MSG` in 
`HoodieFileFormat`
   
   ### Impact
   
   Ensuring that for now users can only read/write lance base files with Spark 
until we add this support.
   
   ### Risk Level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]