What's more, there are examples and support for Kafka, but not so much for
Flume.


On Mon, May 27, 2013 at 6:25 AM, Martin Kleppmann <mar...@rapportive.com> wrote:

> I don't have experience with Flume, so I can't comment on that. At
> LinkedIn we ship logs around by sending Avro-encoded messages to Kafka (
> http://kafka.apache.org/). Kafka is nice: it scales very well and gives a
> great deal of flexibility — logs can be consumed by any number of
> independent consumers, consumers can catch up on a backlog if they're
> disconnected for a while, and it comes with Hadoop import out of the box.
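>
> As a rough sketch, here is what producing an Avro-encoded log event to Kafka
> could look like in Python, using the avro and kafka-python packages (the
> schema, topic, and broker address are just placeholders):
>
>     import io
>     import avro.schema
>     from avro.io import DatumWriter, BinaryEncoder
>     from kafka import KafkaProducer
>
>     # Hypothetical log-event schema; yours will differ.
>     SCHEMA_JSON = """
>     {"type": "record", "name": "LogEvent",
>      "fields": [{"name": "message", "type": "string"},
>                 {"name": "timestamp", "type": "long"}]}
>     """
>     schema = avro.schema.parse(SCHEMA_JSON)
>
>     def encode(event):
>         # Serialize one record to Avro binary (schema is not attached).
>         buf = io.BytesIO()
>         DatumWriter(schema).write(event, BinaryEncoder(buf))
>         return buf.getvalue()
>
>     producer = KafkaProducer(bootstrap_servers="localhost:9092")
>     producer.send("logs", encode({"message": "user signed up",
>                                   "timestamp": 1369600000000}))
>     producer.flush()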
>
> (RabbitMQ is designed more for use cases where each message corresponds to
> a task that needs to be performed by a worker. IMHO Kafka is a better fit
> for logs, which are more stream-like.)
>
> With any message broker, you'll need to somehow tag each message with the
> schema that was used to encode it. You could include the full schema with
> every message, but unless you have very large messages, that would be a
> huge overhead. Better to give each version of your schema a sequential
> version number, or hash the schema, and include the version number/hash in
> each message. You can then keep a repository of schemas for resolving those
> version numbers or hashes – simply in files that you distribute to all
> producers/consumers, or in a simple REST service like
> https://issues.apache.org/jira/browse/AVRO-1124
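>
> For example, a minimal sketch of the fingerprint approach in Python, using an
> MD5 of the schema JSON as the fingerprint (a real setup might use Avro's
> canonical-form fingerprint or an id assigned by a schema registry):
>
>     import hashlib
>
>     # Shared "repository": fingerprint -> schema JSON, distributed to
>     # all producers and consumers (or served over REST).
>     SCHEMAS = {}
>
>     def register(schema_json):
>         fp = hashlib.md5(schema_json.encode("utf-8")).digest()  # 16 bytes
>         SCHEMAS[fp] = schema_json
>         return fp
>
>     def tag(fingerprint, avro_bytes):
>         # Prepend the 16-byte fingerprint to the Avro-encoded payload.
>         return fingerprint + avro_bytes
>
>     def untag(message):
>         # Split a message back into (schema JSON, Avro-encoded payload).
>         return SCHEMAS[message[:16]], message[16:]
>
> The consumer looks up the schema the producer wrote with, then uses Avro's
> normal schema resolution against its own reader schema.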
>
> Hope that helps,
> Martin
>
>
> On 26 May 2013 17:39, Mark <static.void....@gmail.com> wrote:
>
>> Yes our central server would be Hadoop.
>>
>> Exactly how would this work with Flume? Would I write Avro to a file
>> source which Flume would then ship over to one of our collectors, or is
>> there a better/native way? Would I have to include the schema in each
>> event? FYI we would be doing this primarily from a rails application.
>>
>> Does anyone ever use Avro with a message bus like RabbitMQ?
>>
>> On May 23, 2013, at 9:16 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>> Yep. Avro would be great at that (provided your central consumer is Avro
>> friendly, like a Hadoop system).  Make sure that all of your schemas have
>> default values defined for fields so that schema evolution will be easier
>> in the future.
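>>
>> For instance, a sketch of what that looks like in a schema (field names
>> made up):
>>
>>     {"type": "record", "name": "LogEvent",
>>      "fields": [
>>        {"name": "message", "type": "string"},
>>        {"name": "host",    "type": "string", "default": "unknown"},
>>        {"name": "user_id", "type": ["null", "long"], "default": null}
>>      ]}
>>
>> A reader using a newer version of the schema can then fill in the defaults
>> when decoding records written with an older version.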
>>
>>
>> On Thu, May 23, 2013 at 4:29 PM, Mark <static.void....@gmail.com> wrote:
>>
>>> We're thinking about generating logs and events with Avro and shipping
>>> them to a central collector service via Flume. Is this a valid use case?
>>>
>>>
>>
>>
>> --
>> Sean
>>
>>
>>
>


-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
