Hi Flink users,
Sharing something that solved a recurring problem for us in production,
in case others are hitting the same wall.
1. THE PROBLEM
In Flink 1.20, there is no supported way to deploy a .sql file containing both
DDL and DML statements via "flink run" in Application or Per-Job mode. SQL
Client's -f flag only works in Session mode; the Application mode equivalent
(FLIP-480) is implemented in Flink 2.x but not backported.
For teams not ready to migrate to Flink 2.x, which is a significant
breaking-change upgrade, this means either writing Java wrappers around each
SQL statement or living with Session mode's resource-sharing limitations.
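To make the problem concrete, this is the kind of script we mean. The snippet
below is only an illustrative sketch: the file name, table names, and columns
are made up for this example, and it uses Flink's built-in datagen and
blackhole connectors so that it has no external dependencies.

```shell
# Hypothetical example script: two DDL statements followed by one DML statement.
# File name, table names, and columns are placeholders, not from a real job.
cat > dwd_orders_example.sql <<'EOF'
-- DDL: source table backed by the built-in datagen connector
CREATE TABLE orders_src (
  order_id BIGINT,
  amount   DOUBLE
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5'
);

-- DDL: sink table backed by the built-in blackhole connector
CREATE TABLE orders_sink (
  order_id BIGINT,
  amount   DOUBLE
) WITH (
  'connector' = 'blackhole'
);

-- DML: the statement that actually starts the job
INSERT INTO orders_sink
SELECT order_id, amount FROM orders_src;
EOF
```

In Flink 1.20, a file like this can be run with SQL Client's -f flag in
Session mode, but it cannot be submitted as-is in Application mode.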
2. WHAT WE BUILT
A small launcher JAR that fills this gap:
    $FLINK_HOME/bin/flink run \
        --target yarn-application \
        flink-sql-bootstrap.jar \
        --script-file hdfs://warehouse/jobs/dwd_orders.sql
A single command, with DDL + DML in one file, running in Application mode. It
also supports:
- Catalog snapshots: pre-register tables, views, and UDFs from a JSON file so
  SQL scripts contain zero DDL
- Per-operator resource tuning: set parallelism, CPU, and memory per operator
  via a JSON config, filling the gap between Flink SQL and DataStream-level
  resource control
- Dry-run modes: --validate (syntax check, ~2s, no cluster needed) and
  --compile (outputs the optimized plan JSON), useful for CI/CD pipelines
Verified on Flink 1.20.4, 2.0.2, 2.1.1, and 2.2.0.
Repo: https://github.com/tonyabasy/flink-sql-bootstrap
I'm curious whether others have run into the same deployment challenges, and
what workarounds you've been using. I'm also happy to discuss whether this
approach could be useful in your setup.
Best,
Zhao Wang