I am trying to create a Spark Structured Streaming job that reads from a
Kafka topic where the incoming events can have different schemas (there is
no standard schema for the incoming events).

Sample incoming events:

event1: {"timestamp": "2018-09-28T15:50:57.2420418+00:00", "value": 11}
event2: {"timestamp": "2018-09-28T15:50:57.2420418+00:00", "value": 11, "location": "abc"}
event3: {"order_id": 1, "ordervalue": 11}

How can I write a Spark Structured Streaming job that reads the above events
without having to stop and restart the job whenever a new schema shows up?
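
For reference, this is roughly how I am reading the topic today (the broker
address and topic name below are placeholders, not my real values). The Kafka
source itself has a fixed schema of its own (key, value, topic, ...), so no
event schema is needed at this point; the payload arrives as bytes that I
cast to a string:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-events").getOrCreate()

// Read the raw Kafka records; the JSON payload stays an opaque string here.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "events")                         // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS json")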

Also, it looks like we need to provide a schema when parsing the JSON payload
(e.g. with from_json) after spark.readStream(). I thought of reading a small
subset of the incoming data and deriving the schema from it, but that might
not work here, since the data is disparate and each incoming event may have a
different schema.
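
One option I am considering instead of deriving a schema from a sample:
parse each record into a map of strings, so every event can carry its own
set of fields and no fixed schema is baked into the job. A rough sketch,
continuing from the `raw` DataFrame above (the order_id extraction is just
an illustration):

import org.apache.spark.sql.functions.{col, from_json, get_json_object}
import org.apache.spark.sql.types.{MapType, StringType}

// Parse each JSON record into a Map[String, String]; keys vary per event,
// so fields an event does not carry simply do not appear in its map.
val parsed = raw.select(
  from_json(col("json"), MapType(StringType, StringType)).as("fields"),
  // Individual fields can still be pulled out when needed; this yields
  // null for events that do not have the field.
  get_json_object(col("json"), "$.order_id").as("order_id")
)

val query = parsed.writeStream
  .format("console")
  .option("truncate", "false")
  .start()

query.awaitTermination()

The obvious downside is that all values come back as strings and have to be
cast downstream, so I am not sure this is the right approach either.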


