If I understand correctly, SQL semantics are strict about column schema.
Reading via the Kafka data source doesn't require you to specify a schema,
because it provides the key and value as binary. Once you deserialize them,
though, you'll need to specify the schema unless you keep the type primitive
(e.g. String), just as from_json requires you to.
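
To make the from_json point concrete, here is a minimal PySpark sketch. The
field names in EVENT_SCHEMA and the helper name parse_kafka_json are
assumptions for illustration only; nothing in this thread defines the actual
payload.

```python
# Hypothetical payload layout (DDL string); the field names are assumptions.
EVENT_SCHEMA = "id STRING, amount DOUBLE, event_time TIMESTAMP"


def parse_kafka_json(kafka_df, schema=EVENT_SCHEMA):
    """Turn the binary Kafka `value` column into typed columns."""
    # Imported here so the module can be loaded without a Spark installation.
    from pyspark.sql.functions import col, from_json

    return (
        kafka_df
        # Casting to STRING needs no schema: the payload stays one opaque column.
        .select(col("value").cast("string").alias("json"))
        # from_json needs the schema up front. With default options, JSON fields
        # not listed in the schema are dropped, and listed fields that are
        # missing from a record come back as NULL.
        .select(from_json(col("json"), schema).alias("event"))
        .select("event.*")
    )
```

You would pass it the DataFrame returned by
spark.readStream.format("kafka")...load(), or the batch DataFrame you receive
inside foreachBatch after the same cast.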

This doesn't change even if you leverage Schema Registry: you'll still need
to provide a schema that is compatible with every schema the records are
associated with. I guess that's guaranteed if you use the latest version of
the schema and have only changed the schema in "backward-compatible" ways.
I admit I haven't dealt with SR in SSS, but if you integrate the schema into
the query plan, a running query is unlikely to pick up the latest schema.
That still shouldn't matter, as your query only uses the part of the schema
you've integrated, and the latest schema is "backward compatible" with the
integrated schema.
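
As a rough illustration of why an added column shouldn't break a query that
only uses the integrated schema, here is a plain-Python sketch. It is not
Schema Registry's or Spark's API; READER_FIELDS, project and unknown_fields
are hypothetical names, and the records are made up.

```python
import json

# Fields "integrated" into the query when it was started (assumed names).
READER_FIELDS = ("id", "amount")


def project(raw, fields=READER_FIELDS):
    """Decode one JSON record and keep only the fields the query knows about."""
    record = json.loads(raw.decode("utf-8"))
    # Newer, unknown fields are ignored; known fields that are absent become None.
    return {name: record.get(name) for name in fields}


def unknown_fields(raw, fields=READER_FIELDS):
    """List the fields a record carries that the query was not built with."""
    record = json.loads(raw.decode("utf-8"))
    return sorted(set(record) - set(fields))


old_record = b'{"id": "a1", "amount": 10.5}'
# The producer has since added a "currency" column.
new_record = b'{"id": "a2", "amount": 7.0, "currency": "EUR"}'

rows = [project(old_record), project(new_record)]
drift = unknown_fields(new_record)
```

Both records project to the same two columns, so the query keeps working. A
check along the lines of unknown_fields is one way to notice that the
producer's schema has moved ahead of the one the query was started with.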

Hope this helps.

Thanks
Jungtaek Lim (HeartSaVioR)

On Mon, Mar 15, 2021 at 9:25 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> This is just a query.
>
> In general, Kafka Connect requires a means to register the schema so that
> producers and consumers understand it. It also allows schema evolution,
> i.e. changes to the metadata that identifies the structure of the data sent
> via a topic.
>
> When we stream a Kafka topic into Spark Structured Streaming (SSS), the
> assumption is that by the time Spark processes that data, its structure
> can be established. With foreachBatch, we create a dataframe on top of
> incoming batches of JSON messages, and the dataframe can be interrogated.
> However, the processing may fail if another column is added to the topic
> and the consumer (in this case SSS) is not aware of it. How can this change
> of schema be verified?
>
> Thanks
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
