Ok, I think I found the cause of the problem. The default value
of max_in_flight_requests_per_connection is 5 in the Python producer. This
turns out to be too small for my environment and application. When this
limit is reached and the producer retries several times and still fails, the
message is dropped. And since send() is asynchronous, the producer has no
way to report the failure immediately. So one solution is to block for
'synchronous' sends as described at
http://kafka-python.readthedocs.io/en/1.0.0/usage.html, but this slows down
the producing speed. Another solution, which I ended up with, is to
significantly increase max_in_flight_requests_per_connection, e.g., to 1024.
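
For reference, here is a minimal sketch of both options with kafka-python.
The broker address, topic name, and payload below are placeholders, not my
actual setup:

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',          # placeholder broker
    max_in_flight_requests_per_connection=1024,  # option 2: raise the default of 5
)

# Option 1: block on the future returned by send() so failures surface
# immediately (slower, effectively a synchronous send).
future = producer.send('my-topic', b'payload')   # placeholder topic/payload
try:
    metadata = future.get(timeout=10)
except KafkaError as e:
    print('send failed:', e)  # the broker never stored this message

Alternatively, future.add_errback() can register a callback to catch
failures without blocking each send.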

On Thu, Apr 28, 2016 at 10:31 AM, Bo Xu <box...@gmail.com> wrote:

> PS: The message dropping occurred intermittently, not all at the end. For
> example, it was the 10th, 15th, and 18th messages that were missing. If it
> were all at the end, it would be understandable, because I'm not using
> flush() to force transmitting.
>
> Bo
>
>
> On Thu, Apr 28, 2016 at 10:15 AM, Bo Xu <box...@gmail.com> wrote:
>
>> I set up a simple Kafka configuration, with one topic and one partition.
>> I have a Python producer that continuously publishes messages to the Kafka
>> server and a Python consumer that receives messages from the server. Each
>> message is about 10K bytes, far smaller than
>> socket.request.max.bytes=104857600. What I found is that the consumer
>> intermittently missed some messages. I checked the server log file and
>> found that these messages are missing there as well, so it looks like these
>> messages were never stored by the server. I also made sure that the
>> producer did not receive any error for any of the messages it published
>> (using send()).
>>
>> Any clues what could have caused the problem?
>>
>> Thanks,
>> Bo
>>
>
>
