Unsubscribe
At 2024-02-26 20:55:19, "Salva Alcántara" <salcantara...@gmail.com> wrote:

Awesome Andrew, thanks a lot for the info!

On Sun, Feb 25, 2024 at 4:37 PM Andrew Otto <o...@wikimedia.org> wrote:

> the following code generator

Oh, and FWIW we avoid code generation and POJOs, and instead rely on Flink's Row or RowData abstractions.

On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto <o...@wikimedia.org> wrote:

Hi!

I'm not sure if this totally is relevant for you, but we use JSONSchema and JSON with Flink at the Wikimedia Foundation. We explicitly disallow the use of additionalProperties, unless it is to define Map type fields (where additionalProperties itself is a schema).

We have JSONSchema converters and JSON Serdes to be able to use our JSONSchemas and JSON records with both the DataStream API (as Row) and Table API (as RowData).

See:
- https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
- https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object

State schema evolution is supported via the EventRowTypeInfo wrapper.

Less directly about Flink: I gave a talk at Confluent's Current conf in 2022 about why we use JSONSchema. See also this blog post series if you are interested!

-Andrew Otto
 Wikimedia Foundation

On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <salcantara...@gmail.com> wrote:

I'm facing some issues related to schema evolution in combination with the usage of Json Schemas and I was just wondering whether there are any recommended best practices.

In particular, I'm using the following code generator:
- https://github.com/joelittlejohn/jsonschema2pojo

Main gotchas so far relate to the `additionalProperties` field.
When setting that to `true`, the resulting POJO is not valid according to Flink rules because the generated getter/setter methods don't follow the Java Beans naming conventions, e.g., see here:
- https://github.com/joelittlejohn/jsonschema2pojo/issues/1589

This means that the Kryo fallback is used for serialization purposes, which is not only bad for performance but also breaks state schema evolution.

So, because of that, setting `additionalProperties` to `false` looks like a good idea, but then your job will break if an upstream/producer service adds a property to the messages you are reading. To solve this problem, the POJOs for your job (as a reader) can be generated to ignore the `additionalProperties` field (via the `@JsonIgnore` Jackson annotation). This seems to be a good overall solution to the problem, but it looks a bit convoluted to me and didn't come without some trial & error (= pain & frustration).

Is there anyone here facing similar issues? It would be good to hear your thoughts on this!

BTW, this is a very interesting article that touches on the above-mentioned difficulties:
- https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
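
For anyone who wants to see the naming problem in isolation, here is a minimal sketch in plain Java (no Flink or Jackson dependency). It is a simplified approximation of the POJO rules mentioned above (public class, public no-arg constructor, non-public fields reachable through `getX`/`setX`), not Flink's actual type extraction logic, which is more lenient in places. `PojoRuleCheck`, `StrictEvent`, `OpenEvent`, and `violations` are hypothetical names made up for illustration, and `OpenEvent` only mimics the accessor shape I understand the linked issue to describe (a per-entry `setAdditionalProperty(name, value)` with no `setAdditionalProperties(Map)` counterpart).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PojoRuleCheck {

    /** Hypothetical event class whose accessors follow the Java Beans conventions. */
    public static class StrictEvent {
        private String id;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
    }

    /** Hypothetical class mimicking the accessor shape assumed for additionalProperties = true. */
    public static class OpenEvent {
        private String id;
        private Map<String, Object> additionalProperties = new HashMap<>();
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public Map<String, Object> getAdditionalProperties() { return additionalProperties; }
        // Per-entry setter: note that there is no setAdditionalProperties(Map) method.
        public void setAdditionalProperty(String name, Object value) {
            additionalProperties.put(name, value);
        }
    }

    /** Returns true if cls has a public method with the given name and parameter types. */
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Simplified check; an empty list means no violation of the approximated rules was found. */
    static List<String> violations(Class<?> cls) {
        List<String> problems = new ArrayList<>();
        if (!Modifier.isPublic(cls.getModifiers())) {
            problems.add("class is not public");
        }
        if (!hasMethod(cls, "<no-such-name>") && !hasPublicNoArgConstructor(cls)) {
            problems.add("no public no-arg constructor");
        }
        for (Field f : cls.getDeclaredFields()) {
            int mods = f.getModifiers();
            // Static, transient, synthetic, and public fields need no accessors here.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)
                    || f.isSynthetic() || Modifier.isPublic(mods)) {
                continue;
            }
            String name = f.getName();
            String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            if (!hasMethod(cls, "get" + suffix)) {
                problems.add(name + ": no getter get" + suffix + "()");
            }
            if (!hasMethod(cls, "set" + suffix, f.getType())) {
                problems.add(name + ": no setter set" + suffix
                        + "(" + f.getType().getSimpleName() + ")");
            }
        }
        return problems;
    }

    /** Returns true if cls declares a public constructor without parameters. */
    static boolean hasPublicNoArgConstructor(Class<?> cls) {
        try {
            cls.getConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("StrictEvent: " + violations(StrictEvent.class));
        System.out.println("OpenEvent:   " + violations(OpenEvent.class));
    }
}
```

Under these assumptions the check flags only the `additionalProperties` field of `OpenEvent`, which matches the symptom described above: one non-conforming accessor pair is enough for the whole class to stop being treated as a POJO. Separately, a `Map<String, Object>` field would likely still be handled as a generic type, so fixing the setter name alone may not be sufficient.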