Barry,

It might help to know whether you're hitting a (single threaded) CPU limit
or if the bottleneck is elsewhere. Also, how large on average are the
messages you are consuming? There's nothing that forces batching the way
you're describing. You can tweak any consumer setting via worker-level
config overrides (see
http://docs.confluent.io/3.0.0/connect/userguide.html#overriding-producer-consumer-settings)
if the defaults aren't working well for you. 10s sounds quite long, so I
suspect some other bottleneck or issue is at play -- by default, consumer
fetch requests return immediately if any data is available, and even if
you increase fetch.min.bytes, the longest the broker waits by default is
500ms, as set by fetch.max.wait.ms.
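
If you do want each fetch to return larger batches, the approach from the
doc above is to prefix consumer settings with "consumer." in the worker
config. A rough sketch, with purely illustrative values you'd tune to your
message sizes:

  # connect-distributed.properties (or connect-standalone.properties)
  # settings prefixed with "consumer." are passed to the sink tasks' consumers
  consumer.fetch.min.bytes=65536
  # upper bound on how long the broker holds a fetch open waiting for data
  consumer.fetch.max.wait.ms=500

With those values, a fetch returns as soon as ~64KB has accumulated or
500ms has passed, whichever comes first.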

-Ewen

On Thu, Jun 9, 2016 at 7:06 PM Barry Kaplan <bkap...@memelet.com> wrote:

> I am running a Connect consumer that receives JSON records and indexes them
> into Elasticsearch. The consumer is pushing out 300 messages/s into a topic
> with a single partition. The Connect job is configured with 1 task. (This
> is all for testing).
>
> What I see is that push is called about every 10s with about 1500 records.
> It takes about 1.5 seconds of wall time to complete the indexing of those
> records into Elasticsearch. But then the task waits another 10s for the
> next batch from Kafka Connect.
>
> Is there some kind of consumer throttling happening? I cannot find any
> settings that would tell Connect to deliver messages faster or in larger
> batches.
>
> I can of course run with more partitions and more tasks, but still, Kafka
> Connect should be able to deliver messages to the task orders of magnitude
> faster than Elasticsearch can index them.
>
