Hi,

I'm reading Avro-encoded data from Kafka, which has the following
general schema:

{
  "name": "SomeName",
  "doc": "Avro schema with variable embedded encodings",
  "type": "record",
  "fields": [
    {
      "name": "Name",
      "doc": "My name",
      "type": "string"
    },
    {
      "name": "ID",
      "doc": "My ID",
      "type": "string"
    },
    {
      "name": "Result",
      "doc": "Result data, could be encoded differently",
      "type": "bytes"
    },
    {
      "name": "ResultEncoding",
      "doc": "Result encoding media type (e.g. application/avro,
application/json)",
      "type": "string"
    }
  ]
}

Basically, the "Result" field is bytes whose interpretation depends upon
the "ResultEncoding" field i.e. either avro or json. The "Result" byte
stream has its own well defined schema also.
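
To make the question concrete, the runtime dispatch I have in mind looks
roughly like this (just a sketch using the plain Avro and Jackson Java
APIs; the ResultDecoder class is hypothetical, not something I have
running):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

import com.fasterxml.jackson.databind.ObjectMapper;

public class ResultDecoder {
  // Reader bound to the embedded "Result" payload's own (known) Avro schema.
  private final GenericDatumReader<GenericRecord> avroReader;
  private final ObjectMapper jsonMapper = new ObjectMapper();

  public ResultDecoder(Schema resultSchema) {
    this.avroReader = new GenericDatumReader<>(resultSchema);
  }

  // Dispatch on the outer record's "ResultEncoding" field at runtime.
  public Object decode(byte[] resultBytes, String encoding) throws IOException {
    switch (encoding) {
      case "application/avro":
        return avroReader.read(null, DecoderFactory.get().binaryDecoder(resultBytes, null));
      case "application/json":
        return jsonMapper.readTree(resultBytes);
      default:
        throw new IllegalArgumentException("Unknown ResultEncoding: " + encoding);
    }
  }
}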

My use case involves extracting and aggregating data from within the
embedded "Result" field. What would be the best approach to performing
this runtime decoding and extracting fields from the embedded bytes?
Would user-defined functions help in this case?
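
If UDFs are the recommended route, I imagine wrapping the decoder in
something like the following (sketched against Flink SQL's ScalarFunction
purely as a guess at the shape; "someField" and loadResultSchema() are
placeholders, not real names from my job):

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

import com.fasterxml.jackson.databind.JsonNode;

// Hypothetical UDF: extracts one field from the embedded "Result" bytes,
// whichever way they happen to be encoded.
public class ExtractResultField extends ScalarFunction {
  private transient ResultDecoder decoder;  // the sketch from above

  @Override
  public void open(FunctionContext context) throws Exception {
    decoder = new ResultDecoder(loadResultSchema());
  }

  public String eval(byte[] result, String encoding) throws Exception {
    Object decoded = decoder.decode(result, encoding);
    if (decoded instanceof GenericRecord) {
      return String.valueOf(((GenericRecord) decoded).get("someField"));
    }
    return ((JsonNode) decoded).get("someField").asText();
  }

  private static org.apache.avro.Schema loadResultSchema() {
    // Placeholder: obtain the well-defined "Result" schema here.
    throw new UnsupportedOperationException("supply the Result schema");
  }
}

The idea would be to register this and call it per row, e.g.
ExtractResultField(Result, ResultEncoding), before aggregating. Is that a
sensible direction?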

Thanks in advance!
Sumeet
