On Thu, Mar 06, 2014 at 02:27:34AM +0000, Sandon Jacobs wrote:
> I understand replication uses a multi-fetch concept to maintain the replicas 
> of each partition. I have a use case where it might be beneficial to grab a 
> “batch” of messages from a kafka topic and process them as one unit into a 
> source system – in my use case, sending the messages to a Flume source.
> 
> My questions:
> 
>   *   Is it possible to fetch a batch of messages in which you may not know 
> the exact message size?

The high-level consumer actually uses multi-fetch. You will need to
have some idea of the max message size and set your fetch size
accordingly. Unfortunately, if you are consuming a very large number
of topics this can increase the memory requirements of the consumer.
We intend to address this in the consumer re-write; there is a
separate design review thread on that.
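For a sense of scale, here is a back-of-the-envelope sketch of why the
fetch buffer grows with the number of partitions consumed. The numbers
and the per-partition assumption below are illustrative, not from this
thread.

```java
// Rough, illustrative upper bound on consumer fetch-buffer memory.
// Assumes the fetch size applies per partition and that up to
// queuedChunks pre-fetched chunks may be held in memory at once.
public class FetchMemoryEstimate {
    static long worstCaseBufferedBytes(int partitions, long fetchSizeBytes, int queuedChunks) {
        return (long) partitions * fetchSizeBytes * queuedChunks;
    }

    public static void main(String[] args) {
        long fetchSize = 1024 * 1024; // 1 MiB fetch size (must cover the max message)
        int partitions = 500;         // e.g. many topics x partitions
        int queuedChunks = 10;        // hypothetical number of queued chunks
        // Prints the worst-case buffered bytes: about 5 GB in this example.
        System.out.println(worstCaseBufferedBytes(partitions, fetchSize, queuedChunks));
    }
}
```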

>   *   If so, how are the offsets managed?

The consumer essentially pre-fetches and queues the chunks in memory,
and the offsets are not incremented/check-pointed until the
application thread actually iterates over the messages.
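A stdlib-only sketch of that behavior (an illustration of the idea, not
Kafka's implementation): the fetcher reads ahead into an in-memory
queue, but the offset that would be checkpointed advances only when the
application iterates.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative model: pre-fetching fills a queue without moving the
// consumed offset; only iteration by the application advances it.
public class OffsetCheckpointSketch {
    private final Queue<String> prefetched = new ArrayDeque<>();
    private long fetchOffset = 0;    // how far the fetcher has read ahead
    private long consumedOffset = 0; // what would be checkpointed

    // Fetcher side: enqueue a chunk of n messages; consumedOffset is untouched.
    public void prefetchChunk(int n) {
        for (int i = 0; i < n; i++) {
            prefetched.add("msg-" + fetchOffset);
            fetchOffset++;
        }
    }

    // Application side: iterating over a message is what advances the offset.
    public String next() {
        String msg = prefetched.poll();
        if (msg != null) {
            consumedOffset++;
        }
        return msg;
    }

    public long consumedOffset() { return consumedOffset; }
    public long fetchOffset() { return fetchOffset; }

    public static void main(String[] args) {
        OffsetCheckpointSketch s = new OffsetCheckpointSketch();
        s.prefetchChunk(3);
        // Chunk is queued but nothing is consumed yet.
        System.out.println("consumed=" + s.consumedOffset() + " fetched=" + s.fetchOffset());
        while (s.next() != null) { }
        // After iterating, the consumed offset has caught up.
        System.out.println("consumed=" + s.consumedOffset() + " fetched=" + s.fetchOffset());
    }
}
```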

> I am trying to avoid queuing them in memory and batching in my process for 
> several reasons.

The high-level consumer does queuing as described above, but you can
reduce the number of queued chunks.
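As a minimal config sketch: the 0.8-era high-level consumer exposes
`queued.max.message.chunks` for exactly this. The property names below
are that consumer's documented settings; the connection values and
group name are placeholders.

```java
import java.util.Properties;

// Config sketch for the 0.8-era high-level consumer. Lowering
// queued.max.message.chunks reduces how many pre-fetched chunks the
// consumer holds in memory at once.
public class ConsumerConfigSketch {
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "flume-batcher");           // placeholder group name
        props.put("fetch.message.max.bytes", "1048576");  // must cover the largest expected message
        props.put("queued.max.message.chunks", "2");      // fewer queued chunks, less memory
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("queued.max.message.chunks"));
    }
}
```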

Joel
