On Thu, Mar 06, 2014 at 02:27:34AM +0000, Sandon Jacobs wrote:
> I understand replication uses a multi-fetch concept to maintain the replicas
> of each partition. I have a use case where it might be beneficial to grab a
> “batch” of messages from a kafka topic and process them as one unit into a
> source system – in my use case, sending the messages to a Flume source.
>
> My questions:
>
> * Is it possible to fetch a batch of messages in which you may not know
>   the exact message size?
The high-level consumer actually uses multi-fetch. You will need to have
some idea of the max message size and set your fetch size accordingly.
Unfortunately, if you are consuming a very large number of topics this can
increase the memory requirements of the consumer. We intend to address this
in the consumer rewrite - there is a separate design review thread on that.

> * If so, how are the offsets managed?

The consumer essentially pre-fetches and queues the chunks in memory, and
the offsets are not incremented/check-pointed until the application thread
actually iterates over the messages.

> I am trying to avoid queuing them in memory and batching in my process for
> several reasons.

The high-level consumer does queuing as described above, but you can reduce
the number of queued chunks.

Joel
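For reference, the settings mentioned above map to consumer properties in the
0.8 high-level consumer; a sketch (the values are illustrative, not
recommendations, and should be tuned to your actual max message size and
memory budget):

```properties
# Upper bound per fetch per partition - must be at least as large as the
# largest message the broker will accept, or consumption can stall.
fetch.message.max.bytes=1048576

# Number of pre-fetched chunks buffered in memory per stream; lowering this
# reduces the amount of data queued ahead of the application thread.
queued.max.message.chunks=2
```

With a lower `queued.max.message.chunks`, less data sits in the consumer's
internal queues between the fetcher threads and your iterating thread, at the
cost of potentially more fetch round-trips.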
