Did you try using from_protobuf and to_protobuf? https://spark.apache.org/docs/latest/sql-data-sources-protobuf.html
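
Something like this could replace the first parsing step (a minimal sketch; the topic name, bootstrap servers, and descriptor path are placeholders, and it assumes you've generated a descriptor set for Request with `protoc --descriptor_set_out`):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.protobuf.functions.from_protobuf

    val spark = SparkSession.builder().appName("proto-demo").getOrCreate()

    // Read the raw Kafka topic; servers and topic name are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "requests")
      .load()

    // Deserialize the outer Request message using the compiled
    // descriptor file (path is a placeholder).
    val requests = raw.select(
      from_protobuf(col("value"), "Request", "/path/to/request.desc").as("request")
    )
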
On Mon, May 27, 2024 at 15:45 Satyam Raj <satyamm...@gmail.com> wrote:

> Hello guys,
> We're using Spark 3.5.0 to process a Kafka source that contains
> protobuf-serialized data in the following format:
>
> message Request {
>   int64 sent_ts = 1;
>   repeated Event event = 2;
> }
>
> message Event {
>   string event_name = 1;
>   bytes event_bytes = 2;
> }
>
> event_bytes contains the serialized data for the event, and event_name is
> the name of the Protobuf class for that event.
> Currently, we parse the Request message from the Kafka topic and, for
> every event in the array, push its event_bytes to a topic named after
> event_name. Separate Spark jobs consume those topics and use the matching
> event_name Protobuf class to deserialize the data.
>
> Is there a better way to do all this in a single job?
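
On the single-job question: since from_protobuf takes a fixed message name per column, one possible approach (a sketch, assuming the set of event names is known up front and the descriptor file covers all event types; "ClickEvent" is a placeholder) is to explode the event array and handle each known type with a filter:

    import org.apache.spark.sql.functions.{col, explode}
    import org.apache.spark.sql.protobuf.functions.from_protobuf

    // Flatten the repeated event field into one row per event.
    val events = requests.select(explode(col("request.event")).as("event"))

    // from_protobuf needs a static message name, so each known event
    // type gets its own filtered branch within the same job.
    val clicks = events
      .filter(col("event.event_name") === "ClickEvent")
      .select(
        from_protobuf(col("event.event_bytes"), "ClickEvent", "/path/to/request.desc").as("click")
      )

That avoids the intermediate per-event Kafka topics entirely, at the cost of enumerating the event types in the job.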