On 27 May 2013 20:00, Stefan Krawczyk <ste...@nextdoor.com> wrote:

> So it's up to you what you stick into the body of that Avro event. It
> could just be json, or it could be your own serialized Avro event - and as
> far as I understand serialized Avro always has the schema with it (right?).
>

In an Avro data file, yes, because you just need to specify the schema
once, followed by (say) a million records that all use the same schema. And
in an RPC context, you can negotiate the schema once per connection. But
when using a message broker, you're serializing individual records and
don't have an end-to-end connection with the consumer, so you'd need to
include the schema with every single message.
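
To make that concrete, here's a rough sketch (the schema, field names and file
name are all made up for illustration) of writing an Avro data file: the schema
goes into the file header once, and then you can append as many records as you
like encoded against it.

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DataFileExample {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\": \"record\", \"name\": \"Event\", \"fields\": [" +
            " {\"name\": \"userId\", \"type\": \"long\"}," +
            " {\"name\": \"action\", \"type\": \"string\"} ]}");

        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
            new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, new File("events.avro")); // schema written once, in the header

        for (long i = 0; i < 1000000; i++) {            // ...followed by many records
            GenericRecord record = new GenericData.Record(schema);
            record.put("userId", i);
            record.put("action", "click");
            writer.append(record);
        }
        writer.close();
    }
}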

It probably doesn't make sense to include the full schema with every message:
a typical schema might be 2 kB whereas a serialized record might be less than
100 bytes (numbers obviously vary wildly by application), so the schema would
dominate the message size. Hence my suggestion of including a schema version
number or hash with each message instead.
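
As a hedged sketch of what I mean (this assumes Avro 1.7's SchemaNormalization,
reuses the made-up schema from the previous sketch, and leaves out the lookup
table or registry where the consumer would resolve fingerprints back to
schemas): prefix each message with the 64-bit schema fingerprint, then append
the plain Avro binary encoding of the record.

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class FingerprintedMessage {
    // Returns: 8-byte schema fingerprint, then the Avro binary encoding of the record.
    public static byte[] encode(Schema schema, GenericRecord record) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long fingerprint = SchemaNormalization.parsingFingerprint64(schema);
        out.write(ByteBuffer.allocate(8).putLong(fingerprint).array());

        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}

The consumer reads the fingerprint, fetches the matching schema from wherever
you keep schemas, and decodes the rest of the bytes with it.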

> Be aware that Flume doesn't have great support for languages outside of the
> JVM.
>

Unfortunately, the same caveat applies to Kafka. There are clients for non-JVM
languages, but they lack important features, so I would recommend using the
official JVM client (if your application is non-JVM, you could simply pipe
your application's stdout into the Kafka producer, or vice versa on the
consumer side).
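
For what it's worth, a rough sketch of the producer side (this assumes the
Kafka 0.8 Java producer API; the topic name and broker address are
placeholders):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AvroKafkaProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // pass byte[] through untouched

        Producer<byte[], byte[]> producer =
            new Producer<byte[], byte[]>(new ProducerConfig(props));

        // In practice this would be the fingerprint-prefixed Avro bytes from the
        // earlier sketch; a placeholder payload keeps the example self-contained.
        byte[] payload = "hello".getBytes("UTF-8");
        producer.send(new KeyedMessage<byte[], byte[]>("events", payload));
        producer.close();
    }
}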

Martin
