Did you try using to_protobuf and from_protobuf?

https://spark.apache.org/docs/latest/sql-data-sources-protobuf.html
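
For reference, a rough single-job sketch in Scala with Structured Streaming.
The descriptor path, bootstrap servers, topic name, and the ClickEvent
message are all placeholders; the descriptor set would be produced with
`protoc --descriptor_set_out` and has to include Request, Event, and every
inner event type:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.protobuf.functions.from_protobuf

val spark = SparkSession.builder().appName("protobuf-events").getOrCreate()
import spark.implicits._

// Descriptor set compiled from the .proto files (placeholder path).
val descFile = "/path/to/events.desc"

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "requests")
  .load()

// Decode the outer Request message, then flatten the repeated events.
val events = raw
  .select(from_protobuf($"value", "Request", descFile).alias("request"))
  .select(explode($"request.event").alias("event"))
  .select($"event.event_name", $"event.event_bytes")

// Decode each inner payload by its type name, one filter per known type
// (ClickEvent is a hypothetical example).
val clicks = events
  .filter($"event_name" === "ClickEvent")
  .select(from_protobuf($"event_bytes", "ClickEvent", descFile).alias("click"))

That would keep everything in one job instead of republishing event_bytes
to per-event topics.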


On Mon, May 27, 2024 at 15:45 Satyam Raj <satyamm...@gmail.com> wrote:

> Hello guys,
> We're using Spark 3.5.0 for processing Kafka source that contains protobuf
> serialized data. The format is as follows:
>
> message Request {
>   int64 sent_ts = 1;
>   repeated Event event = 2;
> }
>
> message Event {
>  string event_name = 1;
>  bytes event_bytes = 2;
> }
>
> The event_bytes field holds the serialized payload for the event, and
> event_name is the class name of the Protobuf message needed to decode it.
> Currently we parse the Request message from the Kafka topic and, for
> every event in the array, publish the event_bytes to a topic named after
> event_name; separate Spark jobs consume those topics and use the matching
> protobuf class to deserialize the data.
>
> Is there a better way to do all this in a single job?
>
