Hi Camel Gurus,

I've run into some performance issues with the camel-kafka component while
migrating from 2.17.0 to 2.17.1 and then to 2.17.2.

The Camel route is pretty simple and looks like this:

// Split each input file into lines and hand every line to the Kafka route
from("file:/var/lib/app/input")
    .split(body().tokenize("\n")).streaming()
        .to("direct:kafka");

from("direct:kafka")
    .to("kafka:brokerAddr?topic=messages");

The first issue, with Camel 2.17.0, was the possibility of losing messages
<https://github.com/apache/camel/blob/camel-2.17.0/components/camel-kafka/src/main/java/org/apache/camel/component/kafka/KafkaProducer.java#L101>.
Kafka's native producer buffers the messages
<https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L468>,
so if the Kafka broker is unavailable, the buffered messages can be lost when
the route is restarted. Despite the risk of losing messages, the performance
was pretty good (~10K rps) thanks to the producer's buffering.
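To make the loss scenario concrete, here is a toy model in plain Java (JDK
only). It is not the real Kafka client, and the class and method names are
made up for illustration. send() only appends to an in-memory buffer and
returns immediately, just as the route moves on to the next line; flush()
stands in for the background sender delivering a batch, and crash() stands in
for restarting the route while the broker is unreachable.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a buffering producer; NOT the real org.apache.kafka.clients API.
final class FireAndForgetModel {

    static final class BufferingProducer {
        private final List<String> buffer = new ArrayList<>(); // accepted, not yet delivered
        private final List<String> broker = new ArrayList<>(); // stored by the "broker"

        // Returns immediately; the caller cannot tell whether the line was delivered.
        void send(String line) {
            buffer.add(line);
        }

        // Stands in for the background sender delivering one batch.
        void flush() {
            broker.addAll(buffer);
            buffer.clear();
        }

        // Route restarted while the broker is unreachable: buffered lines are dropped.
        int crash() {
            int lost = buffer.size();
            buffer.clear();
            return lost;
        }

        int delivered() {
            return broker.size();
        }
    }

    // Sends `lines` records fire-and-forget, flushes once after `flushAfter` records,
    // then restarts; returns {delivered, lost}.
    static int[] run(int lines, int flushAfter) {
        BufferingProducer producer = new BufferingProducer();
        for (int i = 0; i < lines; i++) {
            producer.send("line-" + i); // the route moves on to the next line right away
            if (i + 1 == flushAfter) {
                producer.flush();
            }
        }
        int lost = producer.crash();
        return new int[] {producer.delivered(), lost};
    }

    public static void main(String[] args) {
        int[] result = run(10, 4);
        System.out.println("delivered=" + result[0] + " lost=" + result[1]);
    }
}
```

With 10 lines and a single flush after the 4th line, main prints
"delivered=4 lost=6": the route has already moved past all 10 lines, but 6 of
them never reach the broker.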

The second issue, with Camel 2.17.1, was that the performance of the Kafka
producer degraded tremendously (by up to 100 times) because it blocks on
every message
<https://github.com/apache/camel/blob/camel-2.17.1/components/camel-kafka/src/main/java/org/apache/camel/component/kafka/KafkaProducer.java#L100>
(although in that case no messages are lost).
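A rough cost model shows why blocking on every message hurts so much. Assume
(hypothetically) that each request to the broker costs one round trip and
carries every record buffered at that moment, up to a batch size. Blocking on
each send leaves only one record in the buffer, so every message pays a full
round trip. The RTT and batch size below are assumed numbers, not
measurements.

```java
// Back-of-the-envelope model, not Kafka code: each broker request costs one
// round trip (rttMillis) and carries up to batchSize buffered records.
final class BlockingCostModel {

    // Blocking on every send: one record per request, so one round trip per message.
    static long roundTripsBlocking(long messages) {
        return messages;
    }

    // Not blocking: the producer can fill batches of up to batchSize records.
    static long roundTripsBatched(long messages, long batchSize) {
        return (messages + batchSize - 1) / batchSize; // ceiling division
    }

    static double throughputPerSecond(long messages, long roundTrips, double rttMillis) {
        return messages / (roundTrips * rttMillis / 1000.0);
    }

    public static void main(String[] args) {
        long messages = 10_000;
        double rttMillis = 2.0; // assumed broker round-trip time
        long batchSize = 100;   // assumed records per batch

        double blocking = throughputPerSecond(messages, roundTripsBlocking(messages), rttMillis);
        double batched = throughputPerSecond(messages, roundTripsBatched(messages, batchSize), rttMillis);
        System.out.printf("blocking: %.0f msg/s, batched: %.0f msg/s%n", blocking, batched);
    }
}
```

With these assumed numbers it prints "blocking: 500 msg/s, batched: 50000
msg/s". In this model the slowdown factor is simply the batch size the
producer is prevented from filling.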

The third issue, with Camel 2.17.2, was that although Camel started using
async callbacks
<https://github.com/apache/camel/blob/camel-2.17.2/components/camel-kafka/src/main/java/org/apache/camel/component/kafka/KafkaProducer.java#L180>,
the performance was still pretty poor, because Kafka's native producer was
not able to buffer more than a single message (because of the synchronous
direct endpoint).
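The effect of the synchronous caller can be sketched with another toy model
(plain Java, hypothetical names, not Camel or Kafka code). The producer is
asynchronous and reports completion through a callback, but a synchronous
caller, like the splitter feeding direct:kafka, only hands over the next line
once the previous line has completed, i.e. from inside its callback. Counting
how many records are ever waiting for an acknowledgement at the same time
shows the difference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (not Camel or Kafka code): an async producer that acknowledges records
// one at a time and tracks the peak number of records awaiting an acknowledgement.
final class InFlightModel {

    static final class AsyncProducer {
        private final Deque<Runnable> pending = new ArrayDeque<>(); // callbacks of unacked lines
        int maxInFlight = 0;

        // Accepts a line and remembers the callback to run once it is acknowledged.
        void send(String line, Runnable onAck) {
            pending.add(onAck);
            maxInFlight = Math.max(maxInFlight, pending.size());
        }

        // The "broker" acknowledges the oldest line; returns false when nothing is pending.
        boolean ackOne() {
            Runnable callback = pending.poll();
            if (callback == null) {
                return false;
            }
            callback.run();
            return true;
        }
    }

    // Synchronous caller: the next line is handed over only from the previous line's callback.
    static int sequential(int lines) {
        AsyncProducer producer = new AsyncProducer();
        sendFrom(producer, 0, lines);
        while (producer.ackOne()) {
            // keep acknowledging until every line has been sent and acked
        }
        return producer.maxInFlight;
    }

    private static void sendFrom(AsyncProducer producer, int i, int lines) {
        if (i < lines) {
            producer.send("line-" + i, () -> sendFrom(producer, i + 1, lines));
        }
    }

    // Decoupled caller: every line is handed over before any acknowledgement arrives.
    static int pipelined(int lines) {
        AsyncProducer producer = new AsyncProducer();
        for (int i = 0; i < lines; i++) {
            producer.send("line-" + i, () -> { });
        }
        while (producer.ackOne()) {
            // drain all acknowledgements
        }
        return producer.maxInFlight;
    }

    public static void main(String[] args) {
        System.out.println("sequential peak in flight: " + sequential(10));
        System.out.println("pipelined peak in flight: " + pipelined(10));
    }
}
```

sequential(10) never has more than 1 line in flight, while pipelined(10)
peaks at 10, which is the kind of backlog the native producer needs in order
to batch. Async callbacks alone do not change this as long as the caller
waits for each completion before sending the next line.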

I was able to come up with two solutions for these issues:

- using a seda endpoint instead of a direct one (Kafka's native producer is
then able to buffer the messages, but messages can still be lost because of
the in-memory nature of seda)

- using an aggregator with the direct endpoint (the route then becomes more
complicated than it should be, the aggregator adds unnecessary delays, and
why do we need an extra aggregator for batching at all if Kafka's native
producer already does buffering/batching?)

So the question is: is there any way to let Kafka's native producer buffer
more than a single message without using the aggregator EIP and without
losing messages, as can happen with an intermediate seda endpoint?

Kind Regards,
Sergey
