What's more, there are examples and support for Kafka, but not so much for Flume.
On Mon, May 27, 2013 at 6:25 AM, Martin Kleppmann <mar...@rapportive.com> wrote:

> I don't have experience with Flume, so I can't comment on that. At
> LinkedIn we ship logs around by sending Avro-encoded messages to Kafka
> (http://kafka.apache.org/). Kafka is nice: it scales very well and gives a
> great deal of flexibility. Logs can be consumed by any number of
> independent consumers, consumers can catch up on a backlog if they're
> disconnected for a while, and it comes with Hadoop import out of the box.
>
> (RabbitMQ is designed more for use cases where each message corresponds
> to a task that needs to be performed by a worker. IMHO Kafka is a better
> fit for logs, which are more stream-like.)
>
> With any message broker, you'll need to somehow tag each message with
> the schema that was used to encode it. You could include the full schema
> with every message, but unless you have very large messages, that would
> be a huge overhead. It's better to give each version of your schema a
> sequential version number, or to hash the schema, and include the version
> number/hash in each message. You can then keep a repository of schemas
> for resolving those version numbers or hashes: simply in files that you
> distribute to all producers/consumers, or in a simple REST service like
> https://issues.apache.org/jira/browse/AVRO-1124
>
> Hope that helps,
> Martin
>
>
> On 26 May 2013 17:39, Mark <static.void....@gmail.com> wrote:
>
>> Yes, our central server would be Hadoop.
>>
>> Exactly how would this work with Flume? Would I write Avro to a file
>> source which Flume would then ship over to one of our collectors, or is
>> there a better/native way? Would I have to include the schema in each
>> event? FYI, we would be doing this primarily from a Rails application.
>>
>> Does anyone ever use Avro with a message bus like RabbitMQ?
>>
>> On May 23, 2013, at 9:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>> Yep. Avro would be great at that (provided your central consumer is
>> Avro friendly, like a Hadoop system). Make sure that all of your schemas
>> have default values defined for fields so that schema evolution will be
>> easier in the future.
>>
>>
>> On Thu, May 23, 2013 at 4:29 PM, Mark <static.void....@gmail.com> wrote:
>>
>>> We're thinking about generating logs and events with Avro and shipping
>>> them to a central collector service via Flume. Is this a valid use case?
>>>
>>
>> --
>> Sean

--
Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com
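A minimal sketch of the schema-tagging approach Martin describes, in Python: hash each schema version to get a short fingerprint, have producers prefix every message with that fingerprint, and have consumers resolve it against a schema repository. The LogEvent schema, registry layout, and helper names here are hypothetical, and JSON stands in for Avro binary encoding so the sketch stays dependency-free; a real producer would serialize records with an Avro library, and the schema carries field defaults per Sean's advice on evolution.

```python
import hashlib
import json

# Hypothetical log-event schema. Per Sean's advice, fields that may evolve
# carry defaults so old and new readers can resolve each other's data.
LOG_EVENT_SCHEMA = {
    "type": "record",
    "name": "LogEvent",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "level", "type": "string", "default": "INFO"},
        {"name": "message", "type": "string", "default": ""},
    ],
}


def fingerprint(schema: dict) -> str:
    """Hash the schema so each version gets a short, stable identifier."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Schema repository: fingerprint -> schema. In practice this would be files
# distributed to all producers/consumers, or a REST service like the one
# proposed in AVRO-1124; a dict stands in here.
REGISTRY = {fingerprint(LOG_EVENT_SCHEMA): LOG_EVENT_SCHEMA}


def encode(record: dict, schema: dict) -> bytes:
    """Prefix the payload with the 32-char hex fingerprint of its schema.

    JSON stands in for Avro binary encoding in this sketch.
    """
    return fingerprint(schema).encode("ascii") + json.dumps(record).encode("utf-8")


def decode(message: bytes) -> dict:
    """Look up the writer's schema by fingerprint, then decode the payload."""
    fp, payload = message[:32].decode("ascii"), message[32:]
    writer_schema = REGISTRY[fp]  # KeyError here means an unknown schema version
    # A real consumer would hand writer_schema plus its own reader schema to an
    # Avro decoder, which fills in defaults for any fields missing on the wire.
    return json.loads(payload)


if __name__ == "__main__":
    msg = encode({"timestamp": 1369600000, "level": "WARN", "message": "disk full"},
                 LOG_EVENT_SCHEMA)
    print(decode(msg))
```

The fixed-width hex prefix keeps message framing trivial; a production setup would more likely use a registry-assigned version number or Avro's own schema fingerprinting, as Martin suggests.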