On the producer side, there's not much you can do to reduce CPU usage if you want low latency and don't have enough throughput to buffer multiple messages -- you're going to end up sending one record at a time to achieve your desired latency. Note, however, that the producer is thread-safe, so if it is possible to combine multiple processes into a single multi-threaded app, you might be able to share a single producer and get better batching.
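For example, here is a minimal sketch of that pattern with the Java client (the broker address, topic name, and thread count are just placeholder assumptions, not something from your setup):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SharedProducerSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("linger.ms", "0"); // the default: send immediately, favoring latency

        // One producer instance shared by all threads (KafkaProducer is thread-safe).
        Producer<String, String> producer = new KafkaProducer<>(props);

        Runnable worker = () -> {
            for (int i = 0; i < 10; i++) {
                // Records from every thread go through the same internal accumulator,
                // so sends to the same partition can still end up in one batch.
                producer.send(new ProducerRecord<>("example-topic", "key", "value-" + i));
            }
        };

        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        producer.close();
    }
}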
On the consumer side, the Java client's fetch.min.bytes already defaults to 1, which minimizes latency -- data is returned as soon as any is available. If you are consistently seeing poll() return no messages in your consumers, try increasing fetch.max.wait.ms. It defaults to 500ms, so I'm guessing you're not hitting this, but if your data is spread across enough partitions and brokers, it's possible you are sending out a bunch of fetch requests that aren't returning any data. (A minimal config sketch is included after the quoted message below.)

Also, as with producers, if your traffic is light enough you will benefit from consolidating to fewer consumers if possible. A consumer issues a single fetch request for *all* of the partitions it is reading that have the same leader, which means you'll amortize the cost of requests over multiple topic partitions (while maintaining the low-latency guarantees, since traffic in all of the partitions is light anyway).

Finally, as always, your best bet is to measure metrics & profile your app to see where the CPU time is going.

-Ewen

On Thu, Dec 8, 2016 at 7:44 AM, Niklas Ström <str...@gmail.com> wrote:

> Use case scenario:
> We want to have fairly low latency, say below 20 ms, and we want to be
> able to run a few hundred processes (on one machine), both producing and
> consuming a handful of topics. The throughput is not high, let's say on
> average 10 messages per second for each process. Most messages are 50-500
> bytes; some may be a few kilobytes.
>
> How should we adjust the configuration parameters for our use case?
>
> Our experiments so far give us good latency, but at the expense of CPU
> utilization. Even with bad latency, the CPU utilization is not
> satisfying. Since we will have a lot of processes, we are concerned that
> short poll loops will cause overconsumption of CPU capacity. We are
> hoping we might have missed some configuration parameter, or that we have
> some issue with our environment that we can find and solve.
>
> We are using both the Java client and librdkafka and see similar CPU
> issues in both clients.
>
> We have looked at the recommendations from:
> https://github.com/edenhill/librdkafka/wiki/How-to-decrease-message-latency
> The only thing that seems to really make a difference for librdkafka is
> socket.blocking.max.ms, but reducing that also makes the CPU go up.
>
> I would really appreciate input on configuration parameters and on any
> experience with environment issues that have caused CPU load. Or is our
> scenario not feasible at all?
>
> Cheers

--
Thanks,
Ewen
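As a follow-up to the consumer settings above, here is a minimal sketch for the Java client (the broker address, group id, topic name, and the particular fetch.max.wait.ms value are placeholder assumptions, not recommendations for your workload):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LowLatencyConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "example-group");           // placeholder group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("fetch.min.bytes", "1");      // the default: return as soon as any data is available
        props.put("fetch.max.wait.ms", "1000"); // raised from the 500ms default so empty fetches wait longer on the broker

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            while (true) {
                // poll() returns as soon as data arrives; when there is none, the broker
                // holds the fetch request for up to fetch.max.wait.ms before replying empty.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                records.forEach(r -> System.out.println(r.topic() + ": " + r.value()));
            }
        }
    }
}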