If I understand correctly, SQL semantics are strict about column schema. Reading via the Kafka data source doesn't require you to specify a schema, since it provides the key and value as binary, but once you deserialize them, unless you keep the type primitive (e.g. String), you'll need to specify the schema, as from_json requires you to.
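To make the point concrete, here is a plain-Python sketch of what "the reader supplies the schema" means (this is not Spark's from_json API; the schema, field names, and helper below are made up for illustration). The consumer declares the structure it expects and projects each binary Kafka value onto it:

```python
import json

# Hypothetical declared schema for the value payload: field name -> Python type.
# This plays the role of the StructType you would pass to from_json in Spark:
# the reader, not the wire format, supplies the structure.
VALUE_SCHEMA = {"id": int, "name": str, "amount": float}

def deserialize_value(raw: bytes) -> dict:
    """Parse a Kafka value (binary JSON) against the declared schema.

    Fields missing from the record come back as None; fields present in the
    record but absent from the schema are silently ignored, which is exactly
    why a producer adding a new column is backward compatible for this reader.
    """
    record = json.loads(raw.decode("utf-8"))
    out = {}
    for field, typ in VALUE_SCHEMA.items():
        value = record.get(field)
        out[field] = typ(value) if value is not None else None
    return out

# A record carrying an extra column ("new_col") the consumer doesn't know about:
raw = b'{"id": 1, "name": "abc", "amount": "9.5", "new_col": "ignored"}'
print(deserialize_value(raw))  # {'id': 1, 'name': 'abc', 'amount': 9.5}
```

The extra field is simply dropped on read, while a field missing from the declared schema would surface as None, mirroring the null you would see from from_json.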
This wouldn't change even if you leverage the Schema Registry - you'd still need to provide a schema that is compatible with all the schemas the records are associated with. That should be guaranteed if you use the latest version of the schema and you've only changed the schema in backward-compatible ways. I admit I haven't dealt with SR in SSS, but if you integrate the schema into the query plan, a running query is unlikely to pick up the latest schema. Even so, it shouldn't matter: your query should only use the part of the schema you've integrated, and the latest schema is backward compatible with the integrated one.

Hope this helps.

Thanks
Jungtaek Lim (HeartSaVioR)

On Mon, Mar 15, 2021 at 9:25 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> This is just a query.
>
> In general, Kafka Connect requires a means to register the schema so that
> producers and consumers understand it. It also allows schema evolution,
> i.e. changes to the metadata that identifies the structure of data sent
> via a topic.
>
> When we stream a Kafka topic into Spark Structured Streaming (SSS), the
> assumption is that by the time Spark processes that data, its structure
> can be established. With foreachBatch, we create a dataframe on top of
> incoming batches of JSON messages, and the dataframe can be interrogated.
> However, the processing may fail if another column is added to the topic
> and the consumer (in this case SSS) is not aware of it. How can this
> change of schema be verified?
>
> Thanks
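To illustrate the backward-compatibility rule the reply relies on, here is a toy sketch loosely modelled on Avro-style compatibility checks (the schema representation, version names, and `can_read` helper are all hypothetical, not a real Schema Registry API). A reader pinned to an older schema can still decode records written with a newer one, provided the newer schema only added fields the reader doesn't require:

```python
# Hypothetical minimal schemas: a set of required field names plus optional
# fields carrying defaults (a rough stand-in for Avro's compatibility model).
v1 = {"required": {"id", "name"}, "optional": {}}
# v2 adds a field *with a default* - the classic backward-compatible change:
v2 = {"required": {"id", "name"}, "optional": {"amount": 0.0}}

def can_read(reader: dict, writer: dict) -> bool:
    """True if a consumer pinned to `reader` can decode records produced
    with `writer`: every field the reader requires must be produced by
    the writer, either as a required or an optional field."""
    produced = writer["required"] | set(writer["optional"])
    return reader["required"] <= produced

print(can_read(v1, v2))  # True: the old query keeps working after the new column
```

A query that *required* the new field, by contrast, could not safely read old records, which is why the reply stresses evolving schemas only in backward-compatible ways when a running query has the schema baked into its plan.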