Thanks for all of the information.

I actually looked into Kafka quite some time ago, and I think we passed on it
because it didn't have much Ruby support (that may have changed by now).


On May 27, 2013, at 12:34 PM, Martin Kleppmann <mar...@rapportive.com> wrote:

> On 27 May 2013 20:00, Stefan Krawczyk <ste...@nextdoor.com> wrote:
> So it's up to you what you stick into the body of that Avro event. It could 
> just be JSON, or it could be your own serialized Avro event - and as far as I 
> understand serialized Avro always has the schema with it (right?).
> 
> In an Avro data file, yes, because you just need to specify the schema once, 
> followed by (say) a million records that all use the same schema. And in an 
> RPC context, you can negotiate the schema once per connection. But when using 
> a message broker, you're serializing individual records and don't have an 
> end-to-end connection with the consumer, so you'd need to include the schema 
> with every single message.
> 
> It probably doesn't make sense to include the full schema with every one, as 
> a typical schema might be 2 kB whereas a serialized record might be less than 
> 100 bytes (numbers obviously vary wildly by application), so the schema size 
> would dominate. Hence my suggestion of including a schema version number or 
> hash with every message.
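
A minimal sketch of that framing, again assuming fastavro and the made-up
ClickEvent schema, with an MD5 hash of the schema text as the per-message
prefix; a real deployment would resolve fingerprints through some shared
schema registry rather than a local dict.

    import hashlib
    import io
    import json

    from fastavro import parse_schema, schemaless_reader, schemaless_writer

    schema_dict = {
        "type": "record",
        "name": "ClickEvent",
        "fields": [
            {"name": "user_id", "type": "long"},
            {"name": "url", "type": "string"},
        ],
    }
    schema = parse_schema(schema_dict)

    # 16-byte fingerprint of the schema text, sent instead of the full schema.
    fingerprint = hashlib.md5(
        json.dumps(schema_dict, sort_keys=True).encode()).digest()

    def encode(record):
        buf = io.BytesIO()
        schemaless_writer(buf, schema, record)   # body carries no schema
        return fingerprint + buf.getvalue()      # prefix identifies the schema

    def decode(message, schemas_by_fingerprint):
        fp, body = message[:16], message[16:]
        writer_schema = schemas_by_fingerprint[fp]   # look up writer's schema
        return schemaless_reader(io.BytesIO(body), writer_schema)

    msg = encode({"user_id": 42, "url": "/home"})
    assert decode(msg, {fingerprint: schema}) == {"user_id": 42, "url": "/home"}
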
> 
> Be aware that Flume doesn't have great support for languages outside of the 
> JVM.
> 
> The same caveat unfortunately applies to Kafka too. There are clients for 
> non-JVM languages, but they lack important features, so I would recommend 
> using the official JVM client (if your application is non-JVM, you could 
> simply pipe your application's stdout into the Kafka producer, or vice versa 
> on the consumer side).
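
One way to realize that pipe, sketched here from Python via subprocess; the
broker address and topic name are placeholders, and kafka-console-producer.sh
is the console producer script that ships with Kafka.

    import subprocess

    # Broker and topic below are made up for the example.
    producer = subprocess.Popen(
        ["kafka-console-producer.sh",
         "--broker-list", "localhost:9092",
         "--topic", "events"],
        stdin=subprocess.PIPE,
    )

    # The application writes one message per line to the producer's stdin.
    for line in (b"event-1\n", b"event-2\n"):
        producer.stdin.write(line)

    producer.stdin.close()
    producer.wait()
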
> 
> Martin
> 
