Hello guys,
We're using Spark 3.5.0 to process a Kafka source that contains protobuf-serialized
data. The format is as follows:
message Request {
  int64 sent_ts = 1;
  repeated Event event = 2;
}

message Event {
  string event_name = 1;
  bytes event_bytes = 2;
}
The event_bytes field holds the serialized payload for the event named by event_name,
where event_name is the class name of the protobuf class that deserializes it.
Currently, one job parses the Request message from the Kafka topic and, for
every event in the repeated field, pushes event_bytes to a topic named after
`event_name`. Separate Spark jobs then consume those topics and deserialize the
data with the matching `event_name` protobuf class.
Is there a better way to do all this in a single job?