Hello,

As far as I understand the discussions on this mailing list, there are 
currently almost no people maintaining the official Scala API in Apache Flink. 
Due to some technical complexities it will probably be stuck for a very long 
time on Scala 2.12 (which is not EOL yet, but quite close to it):
* The Traversable serializer relies heavily on CanBuildFrom (so it's read and 
compiled on restore), which is missing in Scala 2.13 and 3.x - migrating off 
this approach while maintaining savepoint compatibility can be quite a complex 
task.
* The Scala API uses implicitly generated TypeInformation, produced by a giant 
scary mkTypeInfo macro, which would have to be completely rewritten for Scala 
3.x.
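For readers less familiar with the mechanism: the Scala API asks the compiler 
to assemble a serializer descriptor per type via implicit resolution. Below is 
a toy sketch of that implicit type-class pattern - the simplified TypeInfo 
trait and instances here are stand-ins I made up for illustration, not Flink's 
real classes:

```scala
// Toy stand-in for Flink's TypeInformation (illustrative only).
trait TypeInfo[T] { def name: String }

object TypeInfo {
  // Base instance for Int.
  implicit val intInfo: TypeInfo[Int] =
    new TypeInfo[Int] { def name = "Int" }

  // Derived instance: the compiler builds a pair's TypeInfo from its parts,
  // which is roughly what Flink's macro does for case classes and tuples.
  implicit def tupleInfo[A, B](implicit a: TypeInfo[A], b: TypeInfo[B]): TypeInfo[(A, B)] =
    new TypeInfo[(A, B)] { def name = s"(${a.name}, ${b.name})" }
}

// Analogous to createTypeInformation[T]: summon the implicitly built instance.
def infoOf[T](implicit ti: TypeInfo[T]): String = ti.name

val pairName = infoOf[(Int, Int)] // "(Int, Int)" - assembled by the compiler
```

In real Flink the derivation is done by the mkTypeInfo macro rather than plain 
implicits, which is exactly the part that does not carry over to Scala 3.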

But even in its current state, Scala support in Flink has issues with ADTs 
(sealed traits, a popular data modelling pattern) not being natively 
supported, so if you use them, you have to fall back to Kryo, which is not 
that fast: we've seen 3x-4x throughput drops in performance tests.
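As a concrete example of the pattern in question (the Event name below is just 
an illustration), this is the kind of type stock Flink cannot derive 
TypeInformation for and therefore serializes with generic Kryo:

```scala
// An ADT: a sealed trait with case class variants. Pattern matching over it
// is exhaustive, which is why this modelling style is popular.
sealed trait Event
case class Click(url: String)       extends Event
case class Purchase(amount: Double) extends Event

def describe(e: Event): String = e match {
  case Click(url)       => s"click on $url"
  case Purchase(amount) => s"purchase of $amount"
}

val d = describe(Click("/home")) // "click on /home"
```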

At my current company we made a library (https://github.com/findify/flink-adt) 
which uses Magnolia (https://github.com/softwaremill/magnolia) to do all the 
TypeInformation generation at compile time, making Scala ADTs nice & fast in 
Flink. With a couple of community contributions it is now possible to 
cross-build it for Scala 3 as well.

As the Flink 1.15 core is Scala-free, we extracted the DataStream part of the 
Flink Scala API into a separate project, glued it together with flink-adt and 
the ClosureCleaner from Spark 3.2 (which supports Scala 2.13 and JVM 17), and 
cross-compiled it for 2.12/2.13/3.x. You can check out the result in this 
GitHub project: https://github.com/findify/flink-scala-api

So technically speaking, it's now possible to migrate a Scala Flink job from 
2.12 to 3.x by:
* replacing the flink-streaming-scala dependency with flink-scala-api 
(optional, both libs can co-exist on the classpath on 2.12)
* replacing all imports of org.apache.flink.streaming.api.scala._ with ones 
from the new library
* rebuilding the job for 3.x
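To make the first two steps concrete, the build change might look roughly like 
this in sbt - the artifact coordinates and version below are assumptions on my 
side, so please verify them against the project README before copying:

```scala
// build.sbt sketch (coordinates are illustrative, check the README).
// Before:
// libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.15.0"
// After (hypothetical coordinates for flink-scala-api):
libraryDependencies += "io.findify" %% "flink-scala-api" % "<latest>"

// In the job sources, the wildcard import then changes from:
//   import org.apache.flink.streaming.api.scala._
// to the new library's package (see its README for the exact path).
```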

The main drawback is that there is no savepoint compatibility, due to 
CanBuildFrom and the different way of handling ADTs. But if you can afford 
re-bootstrapping the state, the migration is quite straightforward.

The README on GitHub (https://github.com/findify/flink-scala-api#readme) has 
some more details on how and why this project was done this way. The project 
is a bit experimental, so if you're interested in Scala 3 on Flink, you're 
welcome to share your feedback and ideas.

with best regards,
Roman Grebennikov | g...@dfdx.me
