Re:Re: Schema Evolution & Json Schemas

Jensen Sun, 10 Mar 2024 08:12:26 -0700

Unsubscribe

At 2024-02-26 20:55:19, "Salva Alcántara" <salcantara...@gmail.com> wrote:

Awesome Andrew, thanks a lot for the info!


On Sun, Feb 25, 2024 at 4:37 PM Andrew Otto <o...@wikimedia.org> wrote:

>  the following code generator
Oh, and FWIW we avoid code generation and POJOs, and instead rely on Flink's 
Row or RowData abstractions.



On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto <o...@wikimedia.org> wrote:

Hi! 


I'm not sure if this is totally relevant for you, but we use JSONSchema and 
JSON with Flink at the Wikimedia Foundation. 
We explicitly disallow the use of additionalProperties, unless it is to define 
Map type fields (where additionalProperties itself is a schema).


We have JSONSchema converters and JSON Serdes to be able to use our JSONSchemas 
and JSON records with both the DataStream API (as Row) and Table API (as 
RowData).


See:
- https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
- https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object


State schema evolution is supported via the EventRowTypeInfo wrapper.


Less directly about Flink: I gave a talk at Confluent's Current conf in 2022 
about why we use JSONSchema. See also this blog post series if you are 
interested!


-Andrew Otto
 Wikimedia Foundation




On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara <salcantara...@gmail.com> wrote:

I'm facing some issues related to schema evolution in combination with the 
usage of Json Schemas and I was just wondering whether there are any 
recommended best practices.


In particular, I'm using the following code generator:


- https://github.com/joelittlejohn/jsonschema2pojo



The main gotchas so far relate to the `additionalProperties` field. When setting 
that to true, the resulting POJO is not valid according to Flink's POJO rules 
because the generated getter/setter methods don't follow the JavaBeans naming 
conventions, e.g., see here:


- https://github.com/joelittlejohn/jsonschema2pojo/issues/1589


This means that the Kryo fallback is used for serialization purposes, which is 
not only bad for performance but also breaks state schema evolution.
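
To make the naming problem concrete, here is a minimal sketch that approximates only the accessor part of Flink's POJO rules (every non-public field needs a bean-style getter and setter). `BeanAccessorCheck`, `GeneratedEvent` and `PlainEvent` are hypothetical names for illustration; `GeneratedEvent` is hand-written to resemble the shape described in the linked issue, not actual generator output, and the real Flink check covers more cases (e.g. `is` getters, public no-arg constructor).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: a simplified approximation of the accessor part of Flink's POJO rules. */
class BeanAccessorCheck {

    /** Returns the names of non-public instance fields lacking a getX()/setX(T) pair. */
    static List<String> fieldsBreakingPojoRules(Class<?> type) {
        List<String> offending = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Static, transient, synthetic and public fields need no accessors here.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)
                    || Modifier.isPublic(mods) || field.isSynthetic()) {
                continue;
            }
            String name = field.getName();
            String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            boolean hasGetter = hasPublicMethod(type, "get" + suffix);
            boolean hasSetter = hasPublicMethod(type, "set" + suffix, field.getType());
            if (!hasGetter || !hasSetter) {
                offending.add(name);
            }
        }
        return offending;
    }

    private static boolean hasPublicMethod(Class<?> type, String name, Class<?>... params) {
        try {
            type.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}

/** Assumed shape: the setter is setAdditionalProperty(String, Object), not setAdditionalProperties(Map). */
class GeneratedEvent {
    private String id;
    private Map<String, Object> additionalProperties = new HashMap<>();

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Map<String, Object> getAdditionalProperties() { return additionalProperties; }
    public void setAdditionalProperty(String name, Object value) { additionalProperties.put(name, value); }
}

/** Same event without the additional-properties map; all accessors are bean-style. */
class PlainEvent {
    private String id;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}
```

Calling `BeanAccessorCheck.fieldsBreakingPojoRules(GeneratedEvent.class)` flags the `additionalProperties` field, while `PlainEvent` passes the check. A single field like this should be enough for Flink to treat the whole class as a generic type and fall back to Kryo.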


So, because of that, setting `additionalProperties` to `false` looks like a 
good idea, but then your job will break if an upstream/producer service adds a 
property to the messages you are reading. To solve this problem, the POJOs for 
your job (as a reader) can be generated to ignore the `additionalProperties` 
field (via the `@JsonIgnore` Jackson annotation). This seems to be a good 
overall solution to the problem, but it looks a bit convoluted to me and didn't 
come without some trial & error (= pain & frustration).
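
The reader-side behaviour being aimed for (keep the fields the reader knows, drop whatever a producer adds later instead of failing) can be sketched without Jackson. This is only an illustration of the idea, not the `@JsonIgnore` setup itself; `TolerantReader` and `project` are hypothetical names, and the record is assumed to be already parsed into a `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper illustrating a reader that ignores unknown properties. */
class TolerantReader {

    /**
     * Keeps only the entries whose keys the reader's schema knows about.
     * Properties added by a producer are dropped rather than rejected.
     */
    static Map<String, Object> project(Map<String, Object> record, Set<String> knownFields) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String name : knownFields) {
            if (record.containsKey(name)) {
                projected.put(name, record.get(name));
            }
        }
        return projected;
    }
}
```

With this shape, a record carrying an extra property still yields the same projected fields as before the producer's change, so the reader's POJO (and hence its Flink state schema) stays stable.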


Is there anyone here facing similar issues? It would be good to hear your 
thoughts on this!


BTW, this is a very interesting article that touches on the above-mentioned 
difficulties:
- https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
