Many thanks!

On Wed, 15 Jul 2020 at 15:58, Aljoscha Krettek <aljos...@apache.org> wrote:
> On 11.07.20 10:31, Georg Heiler wrote:
> > 1) similarly to Spark the Table API works on some optimized binary
> > representation
> > 2) this is only available in the SQL way of interaction - there is no
> > programmatic API
>
> Yes, it's available from SQL, but also from the Table API, which is a
> programmatic declarative API, similar to Spark's Structured Streaming.
>
> > q1) I have read somewhere (I think in some Flink Forward presentations)
> > that the SQL API is not necessarily stable with regards to state - even
> > with small changes to the DAG (due to optimization). So does this
> > also/still apply to the Table API? (I assume yes)
>
> Yes, unfortunately this is correct. Because the Table API/SQL is
> declarative, users don't have control over the DAG and the state that
> the operators have. Some work will happen on at least making sure that
> the optimizer stays stable between Flink versions, or on letting users
> pin a certain physical graph of a query so that it can be re-used
> across versions.
>
> > q2) When I use the DataSet/Stream (classical scala/java) API it looks
> > like I must create a custom serializer if I want to handle one/all of:
> >
> > - side-output failing records and not simply crash the job
> > - as asked before, automatic serialization to a scala (case) class
>
> This is true, yes.
>
> > But I also read that creating the ObjectMapper (i.e. in Jackson terms)
> > inside the map function is not recommended. From Spark I know that
> > there is a map-partitions function, i.e. something where a database
> > connection can be created and then reused for the individual elements.
> > Is a similar construct available in Flink as well?
>
> Yes, for this you can use "rich functions", which have open()/close()
> methods that allow initializing and re-using resources across
> invocations:
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/user_defined_functions.html#rich-functions
>
> > Also, I have read a lot of articles and it looks like a lot of people
> > are using the String serializer and then manually parse the JSON,
> > which also seems inefficient.
> > Where would I find an example of a serializer with side outputs for
> > failed records as well as efficient initialization using a construct
> > similar to map-partitions?
>
> I'm not aware of such examples, unfortunately.
>
> I hope that at least some answers will be helpful!
>
> Best,
> Aljoscha
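For the archives: a minimal sketch of the rich-function pattern for the
ObjectMapper case, assuming the Flink 1.11-era Scala DataStream API and
the Jackson Scala module. The Event case class and its fields are made
up for illustration:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Hypothetical target type.
case class Event(id: Long, payload: String)

// open() runs once per parallel task instance, so the (non-serializable)
// ObjectMapper is created once per task rather than once per element,
// which is the analogue of mapPartitions-style initialization in Spark.
class ParseEvent extends RichMapFunction[String, Event] {

  // @transient: the function object itself is serialized and shipped to
  // the task managers; the mapper is created on the worker in open().
  @transient private var mapper: ObjectMapper = _

  override def open(parameters: Configuration): Unit = {
    mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)
  }

  override def map(json: String): Event =
    mapper.readValue(json, classOf[Event])
}

// Usage: stream.map(new ParseEvent)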
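In the same spirit, here is a hedged sketch of side-outputting records
that fail to parse instead of failing the job, using a ProcessFunction
(which is also a rich function, so open() is available) together with an
OutputTag. Again assuming Flink 1.11 Scala APIs; all names are
hypothetical and this is not an official example:

import scala.util.{Failure, Success, Try}

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object SafeParseJob {

  // Hypothetical target type, as in the previous sketch.
  case class Event(id: Long, payload: String)

  // Raw records that fail to parse are routed here instead of
  // crashing the job.
  val failedTag: OutputTag[String] = OutputTag[String]("failed-records")

  class SafeParse extends ProcessFunction[String, Event] {

    @transient private var mapper: ObjectMapper = _

    override def open(parameters: Configuration): Unit = {
      mapper = new ObjectMapper()
      mapper.registerModule(DefaultScalaModule)
    }

    override def processElement(
        json: String,
        ctx: ProcessFunction[String, Event]#Context,
        out: Collector[Event]): Unit =
      Try(mapper.readValue(json, classOf[Event])) match {
        case Success(event) => out.collect(event)          // main output
        case Failure(_)     => ctx.output(failedTag, json) // side output
      }
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val parsed = env
      .fromElements("""{"id": 1, "payload": "ok"}""", "not json")
      .process(new SafeParse)
    // The failed raw strings come out of the side output and can be
    // sent to a dead-letter sink, logged, etc.
    val failures: DataStream[String] = parsed.getSideOutput(failedTag)
    failures.print()
    parsed.print()
    env.execute("safe-parse sketch")
  }
}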