[DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

yuxia Sun, 28 May 2023 23:55:44 -0700

Hi, community.

I'd like to start a discussion: I think the Hive dialect shouldn't fall back
to Flink's default dialect.

Currently, when the HiveParser fails to parse a SQL statement in Hive dialect,
it falls back to Flink's default parser[1] to handle Flink-specific statements
like "CREATE CATALOG xx with (xx);".

As I've been working on the Hive dialect and have recently talked with
community users who use it, I'm thinking we should throw an exception directly
instead of falling back to Flink's default dialect when a SQL statement fails
to parse in Hive dialect.

Here are some reasons:

First of all, the fallback hides errors in the Hive dialect. For example,
during the release validation phase we found that the Hive dialect could no
longer be used with the Flink SQL client[2]. It turned out that a modification
in the Flink SQL client caused it, but our test cases couldn't catch it
earlier: although HiveParser failed to parse the statements, it fell back to
the default parser and the test cases still passed.
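
To make the masking effect concrete, here is a minimal, self-contained Java
sketch. It is not Flink code: `SqlParser`, `ParseException`, `withFallback`,
and the two toy parsers are hypothetical names made up for illustration. It
only shows how a fallback wrapper can let a check pass while the primary
(Hive-like) parser is completely broken.

```java
public class FallbackMaskingDemo {

    /** Hypothetical checked exception for a statement a parser cannot handle. */
    static class ParseException extends Exception {
        ParseException(String message) {
            super(message);
        }
    }

    /** Hypothetical parser interface; the result names which parser handled the statement. */
    interface SqlParser {
        String parse(String statement) throws ParseException;
    }

    /** Stand-in for a Hive-dialect parser broken by a regression: it rejects everything. */
    static final SqlParser BROKEN_HIVE_PARSER = statement -> {
        throw new ParseException("hive parser cannot parse: " + statement);
    };

    /** Stand-in for the default-dialect parser: it accepts everything. */
    static final SqlParser DEFAULT_PARSER = statement -> "default:" + statement;

    /** Fallback behaviour in this sketch: swallow the primary failure, retry with the fallback. */
    static SqlParser withFallback(SqlParser primary, SqlParser fallback) {
        return statement -> {
            try {
                return primary.parse(statement);
            } catch (ParseException e) {
                // The primary parser's error is silently dropped here.
                return fallback.parse(statement);
            }
        };
    }

    public static void main(String[] args) throws ParseException {
        String sql = "SELECT 1";

        // With the fallback, the statement still "parses", so a test would pass.
        System.out.println(withFallback(BROKEN_HIVE_PARSER, DEFAULT_PARSER).parse(sql));

        // Without the fallback, the regression surfaces immediately.
        try {
            BROKEN_HIVE_PARSER.parse(sql);
        } catch (ParseException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

A test that only checks "the statement was accepted" cannot tell these two
situations apart when the fallback is in place, which is roughly what happened
in the case above.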

Second, conceptually, the Hive dialect should have nothing to do with Flink's
default dialect. They are two totally different dialects. If we do need a
dialect that mixes the Hive dialect and the default dialect, maybe we need to
propose a new hybrid dialect and announce the hybrid behavior to users.
Also, the fallback behavior has confused some users; I know this because
community users have asked me about it. Throwing an exception directly when a
SQL statement fails to parse in Hive dialect would be more intuitive.

Last but not least, it's important to decouple Hive from the Flink planner[3]
before we can externalize the Hive connector[4]. If we still fall back to
Flink's default dialect, we will need to depend on `ParserImpl` in the Flink
planner, which will block us from removing the provided dependency of the Hive
dialect as well as from externalizing the Hive connector.

Although we have never announced the fallback behavior, some users may
implicitly depend on it in their SQL jobs. So I'm hereby opening the
discussion about abandoning the fallback behavior to make the Hive dialect
clear and isolated.
Please note that this won't break Hive syntax, but Flink-specific syntax may
fail afterwards. For a failed statement, you can use `SET
table.sql-dialect=default;` to switch to the Flink dialect.
If we find some Flink-specific statements that should be included in the Hive
dialect for ease of use, I think we can still add them to the Hive dialect as
special cases.

Looking forward to your feedback. I'd love to hear from the community before
taking the next steps.

[1]:https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParser.java#L348

[2]:https://issues.apache.org/jira/browse/FLINK-26681
[3]:https://issues.apache.org/jira/browse/FLINK-31413
[4]:https://issues.apache.org/jira/browse/FLINK-30064

Best regards,
Yuxia
