Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
wombatu-kun commented on code in PR #18481:
URL: https://github.com/apache/hudi/pull/18481#discussion_r3050636517
##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath,
CatalogBaseTable table,
}
HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr =
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+ HoodieFileFormat configuredFormat =
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+ if (configuredFormat == HoodieFileFormat.LANCE) {
Review Comment:
replaced enum `valueOf` with
`HoodieFileFormat.LANCE.name().equalsIgnoreCase(..)`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
hudi-bot commented on PR #18481: URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205500399 ## CI report: * c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=13146) * c501b0e0e6d33cba190ddc1fc0bc63a7866b2ea9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
wombatu-kun commented on code in PR #18481:
URL: https://github.com/apache/hudi/pull/18481#discussion_r3050640638
##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath,
CatalogBaseTable table,
}
HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr =
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+ HoodieFileFormat configuredFormat =
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+ if (configuredFormat == HoodieFileFormat.LANCE) {
+throw new
HoodieValidationException(HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG);
+ }
+ baseFileFormat = configuredFormat;
Review Comment:
Not intentional, reverted change of `baseFileFormat`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
codecov-commenter commented on PR #18481: URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205383789 ## [Codecov](https://app.codecov.io/gh/apache/hudi/pull/18481?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report :x: Patch coverage is `0%` with `4 lines` in your changes missing coverage. Please review. :white_check_mark: Project coverage is 44.87%. Comparing base ([`0591933`](https://app.codecov.io/gh/apache/hudi/commit/0591933bc237ab9adb616f6961bbd780deff25a2?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)) to head ([`c760ea7`](https://app.codecov.io/gh/apache/hudi/commit/c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)). | [Files with missing lines](https://app.codecov.io/gh/apache/hudi/pull/18481?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines | |---|---|---| | [...rg/apache/hudi/hadoop/HiveHoodieReaderContext.java](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&filepath=hudi-hadoop-mr%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhudi%2Fhadoop%2FHiveHoodieReaderContext.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hpdmVIb29kaWVSZWFkZXJDb250ZXh0LmphdmE=) | 0.00% | [1 Missing and 1 partial :warning: ](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | | [...pache/hudi/io/storage/HoodieFileReaderFactory.java](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&filepath=hudi-common%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhudi%2Fio%2Fstorage%2FHoodieFileReaderFactory.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlUmVhZGVyRmFjdG9yeS5qYXZh) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | | [...pache/hudi/io/storage/HoodieFileWriterFactory.java](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&filepath=hudi-common%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhudi%2Fio%2Fstorage%2FHoodieFileWriterFactory.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVGaWxlV3JpdGVyRmFjdG9yeS5qYXZh) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/hudi/pull/18481?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | > :exclamation: There is a different number of reports uploaded between BASE (0591933) and HEAD (c760ea7). Click for more details. > > HEAD has 28 uploads less than BASE > >| Flag | BASE (0591933) | HEAD (c760ea7) | >|--|--|--| >|spark-client-hadoop-common|1|0| >|spark-scala-tests|10|0| >|spark-java-tests|15|0| >|common-and-other-modules|1|0| >|utilities|1|0| > Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #18481 +/- ## = - Coverage 68.77% 44.87% -23.91% + Complexity28117 8476-19641 = Files 2452 1196 -1256 Lines13503361896-73137 Branches 16376 6660 -9716 = - Hits 9286927773-65096 + Misses3480131094 -3707 + Partials 7363 3029 -4334 ``` | [Flag](https://app.codecov.io/gh/apache/hudi/pull/18481/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | | |---|---|---| | [common-and-other-modules](https://app.codecov.io/gh/apache/hudi/pull/18481/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | | | [hadoop-mr-java-client](https://app.codecov.io/gh/apache/hudi/pull/18481/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `44.87% <0.00%> (-0.01%)` | :arrow_down: | | [spark-client-had
Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
yihua commented on code in PR #18481:
URL: https://github.com/apache/hudi/pull/18481#discussion_r3050513758
##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath,
CatalogBaseTable table,
}
HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr =
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+ HoodieFileFormat configuredFormat =
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+ if (configuredFormat == HoodieFileFormat.LANCE) {
Review Comment:
🤖 `HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase())` will throw an
`IllegalArgumentException` if the property contains an unrecognized value.
Could you wrap this in a try-catch or use a safer lookup, so the error message
is user-friendly instead of a raw enum parse failure?
🤖 *This review comment was generated by an AI agent and may contain
mistakes. Please verify any suggestions before applying.*
##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##
@@ -637,6 +637,14 @@ private Table instantiateHiveTable(ObjectPath tablePath,
CatalogBaseTable table,
}
HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;
+String baseFileFormatStr =
properties.get(HoodieTableConfig.BASE_FILE_FORMAT.key());
+if (baseFileFormatStr != null) {
+ HoodieFileFormat configuredFormat =
HoodieFileFormat.valueOf(baseFileFormatStr.toUpperCase());
+ if (configuredFormat == HoodieFileFormat.LANCE) {
+throw new
HoodieValidationException(HoodieFileFormat.LANCE_SPARK_ONLY_ERROR_MSG);
+ }
+ baseFileFormat = configuredFormat;
Review Comment:
🤖 This line `baseFileFormat = configuredFormat;` changes behavior beyond the
Lance guard — for ORC or HFile tables, the `baseFileFormat` was previously
always PARQUET (hardcoded on the line above), which was used to resolve
InputFormat, OutputFormat, and SerDe class names. Now those tables will get
their actual format's classes. This might be a desirable fix, but it's a silent
behavior change that could break Hive catalog registration for non-Parquet
tables. Was this intentional? If the goal here is only the Lance guard, I'd
suggest keeping the existing default-to-PARQUET behavior and only adding the
Lance rejection.
🤖 *This review comment was generated by an AI agent and may contain
mistakes. Please verify any suggestions before applying.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
hudi-bot commented on PR #18481: URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205341843 ## CI report: * c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=13146) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
hudi-bot commented on PR #18481: URL: https://github.com/apache/hudi/pull/18481#issuecomment-4205324460 ## CI report: * c760ea70c3ba3b7ffc7ee7c2b2922d3e4402939f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
[PR] feat(lance): throwing exception/guard for users trying to read Lance from non-spark engnies [hudi]
wombatu-kun opened a new pull request, #18481: URL: https://github.com/apache/hudi/pull/18481 ### Describe the issue this Pull Request addresses Closes https://github.com/apache/hudi/issues/18451 ### Summary and Changelog - Add guards to prevent non-Spark engines (Flink, Hive, Presto, Trino) from reading or writing Lance-format Hudi tables. Lance is Spark-only for 1.2. - Guards at multiple levels: - **Config-time (Flink)**: `HoodieTableFactory` rejects Lance on both source and sink; `HoodieHiveCatalog` rejects on table creation - **Read-time (Flink)**: `FlinkRowDataReaderContext` rejects Lance files - **Read-time (Hive)**: `HiveHoodieReaderContext` rejects Lance files early - **Factory-level (all engines)**: Base `HoodieFileReaderFactory` / `HoodieFileWriterFactory` throw for Lance; only Spark overrides to support it - Centralized error message `LANCE_SPARK_ONLY_ERROR_MSG` in `HoodieFileFormat` ### Impact Ensuring that for now users can only read/write lance base files with Spark until we add this support. ### Risk Level none ### Documentation Update none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Enough context is provided in the sections above - [ ] Adequate tests were added if applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
