Hello guys,
We're using Spark 3.5.0 to process a Kafka source that contains protobuf-serialized
data. The format is as follows:
message Request {
  int64 sent_ts = 1;
  repeated Event event = 2;
}

message Event {
  string event_name = 1;
  bytes event_bytes = 2;
}
The event_bytes field holds the serialized payload for the event named by event_name,
where event_name is the class name of the protobuf class that deserializes it.
Currently, one job parses the Request message from the Kafka topic and, for
every event in the repeated field, pushes event_bytes to a topic named after
`event_name`. Separate Spark jobs then consume those topics and deserialize the
data with the matching `event_name` protobuf class.
Is there a better way to do all this in a single job?