Barry,

It might help to know whether you're hitting a (single-threaded) CPU limit or whether the bottleneck is elsewhere. Also, how large are the messages you're consuming, on average?

There's nothing that'll force batching like you're describing. You can tweak any consumer setting via worker-level config overrides (see http://docs.confluent.io/3.0.0/connect/userguide.html#overriding-producer-consumer-settings) if the defaults aren't working well for you.

10s sounds quite long, so I suspect some other bottleneck or issue is causing it to take that long -- by default, consumer fetch requests return immediately if any data is available, and even if you increase fetch.min.bytes, the longest a fetch waits by default is 500ms, as defined by fetch.max.wait.ms.
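For reference, the override mechanism is just prefixing consumer properties with "consumer." in the worker config. A minimal sketch (the property names are standard consumer settings; the values here are only illustrative, not a recommendation):

    # connect-distributed.properties (or connect-standalone.properties)
    # Any property prefixed with "consumer." is passed through to the
    # consumer that feeds sink tasks.

    # Wait for ~64KB of data before a fetch returns...
    consumer.fetch.min.bytes=65536
    # ...but never wait longer than 100ms for it
    consumer.fetch.max.wait.ms=100
    # Allow up to 2MB per partition per fetch
    consumer.max.partition.fetch.bytes=2097152

Keep in mind these are worker-level settings, so they apply to every connector running on that worker.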
-Ewen

On Thu, Jun 9, 2016 at 7:06 PM Barry Kaplan <bkap...@memelet.com> wrote:

> I am running a connect consumer that receives JSON records and indexes
> them into elasticsearch. The consumer is pushing out 300 messages/s into
> a topic with a single partition. The connect job is configured with 1
> task. (This is all for testing.)
>
> What I see is that push is called about every 10s with about 1500
> records. It takes about 1.5 seconds of wall time to complete the indexing
> of those records into elasticsearch. But then the task waits another 10s
> for the next batch from kafka connect.
>
> Is there some kind of consumer throttling happening? I cannot find any
> settings that would tell connect to deliver messages faster or in larger
> batches.
>
> I can of course run with more partitions and more tasks, but still, kafka
> connect should be able to deliver messages to the task orders of
> magnitude faster than elasticsearch can index them.