I noticed a similar effect with a test tool that checks whether records are
consumed in the same order in which they were produced. With a single
partition it works fine, but with multiple partitions the order gets mixed
up. If I'm right, this is by design, but I would like some feedback on it.
Because messages with the same key end up in the same partition, the
produced order is only preserved within a partition. When consuming from
multiple partitions, the overall order can differ from the order in which
the records were produced.
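
To illustrate what I mean, here is a minimal sketch in plain Python. It is a
simulation only, not Kafka client code: the toy hash in partition_for() and
the helper names are assumptions made up for this example (the real
partitioner hashes keys differently), and fetch_order just stands for one
possible order in which partition data happens to come back.

```python
from collections import defaultdict


def partition_for(key, num_partitions):
    # Toy stand-in for a key-hash partitioner (assumption for illustration):
    # the same key always maps to the same partition.
    return sum(key.encode("utf-8")) % num_partitions


def produce(records, num_partitions):
    # Append each record to its partition, tagging it with its global
    # production sequence number.
    partitions = defaultdict(list)
    for seq, (key, value) in enumerate(records):
        partitions[partition_for(key, num_partitions)].append((seq, key, value))
    return partitions


def consume(partitions, fetch_order):
    # Drain partitions in the given order, which models one possible
    # interleaving seen by a consumer.
    consumed = []
    for p in fetch_order:
        consumed.extend(partitions.get(p, []))
    return consumed


def per_key_order_preserved(consumed):
    # True if, for every key, records arrive in increasing production order.
    last_seq = {}
    for seq, key, _ in consumed:
        if key in last_seq and seq < last_seq[key]:
            return False
        last_seq[key] = seq
    return True


def global_order_preserved(consumed):
    # True if records arrive in exactly the order they were produced.
    seqs = [seq for seq, _, _ in consumed]
    return seqs == sorted(seqs)


if __name__ == "__main__":
    records = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 3)]
    consumed = consume(produce(records, 2), fetch_order=[0, 1])
    print("per-key order preserved:", per_key_order_preserved(consumed))
    print("global order preserved:", global_order_preserved(consumed))
```

With two partitions the per-key check passes while the global check fails;
with a single partition both pass, which matches what my test tool reports.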

If this is true, I would be interested to know what you should do when you
have a topic where the order needs to be preserved and that also needs to be
consumed by more than one consumer at a time.

On Fri, Mar 11, 2016 at 5:50 AM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> You definitely *might* see data from multiple partitions, and that won't be
> uncommon once you start processing data. However, there is no guarantee.
>
> In practice, it may be unlikely to see data for both partitions on the
> first call to poll() for a simple reason: poll() will return as soon as any
> data for any partition is available. Unless things are timed just right,
> you're probably making requests to different brokers for data in the
> different partitions. These requests won't be perfectly aligned -- one of
> them will get a response first and the poll() will be able to return with
> some data. Since only the one response will have been received, only one
> partition will get data.
>
> After the first poll, you probably spend some time processing that data
> before you call poll again. However, another request has been sent out to
> the broker that returned data faster and the other request also gets
> returned. So on the next poll, you might be more likely to see data from
> both partitions.
>
> So you're right: there's no hard guarantee, and you shouldn't write your
> consumer code to assume that data will be returned for all partitions. (And
> you can't assume that anyway; what if no new data had been published to one
> of the partitions?). However, many times you will see data from multiple
> partitions.
>
> -Ewen
>
> On Thu, Mar 10, 2016 at 11:21 AM, Shrijeet Paliwal <
> shrijeet.pali...@gmail.com> wrote:
>
> > Version: 0.9.0.1
> >
> > I have a test which creates two partitions in a topic, writes data to both
> > partitions. Then a single consumer subscribes to the topic, verifies that
> > it has got the assignment of both partitions in that topic & finally issues
> > a poll. The first poll always comes back with records of only one partition.
> > I need to poll one more time to get records for the second partition. The
> > poll timeout has no effect on this.
> >
> > Unless I've misunderstood the contract - the first poll *could* have
> > returned records for both partitions. After all, poll
> > returns ConsumerRecords<K,V>, which is a map of topic_partitions -->
> > records.
> >
> > I acknowledge that the API does not make any hard guarantees that align
> > with my expectation, but it looks like the API was crafted to support
> > multiple partitions & topics in a single call. Is there an implementation
> > detail which restricts this? Is there a configuration which is
> > controlling what gets fetched?
> >
> > --
> > Shrijeet
> >
>
>
>
> --
> Thanks,
> Ewen
>
