[ https://issues.apache.org/jira/browse/KAFKA-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajini Sivaram updated KAFKA-2891:
----------------------------------
    Comment: was deleted

(was: [~geoffra] Replication tests expect all ack'ed messages to be received 
even though they run with the default min.insync.replicas=1. The test kills the 
leader of a partition in a loop while messages are being produced and consumed. 
This can (and does) cause the ISR to shrink to 1 (only the leader is in the 
ISR list). Messages published while there are no other in-sync replicas are lost 
if the leader (the only ISR) is killed. It seems to me that the test's 
expectations are too high. When I modify the test (hard_bounce with SSL/SASL) to 
wait until there are at least two entries in the ISR list before killing the 
leader, it passes reliably in my local test runs. I wonder if the only reason 
this test has been working is that PLAINTEXT consumers keep up with the producer 
and are hence unlikely to lose messages. Would it be a reasonable change to the 
test to ensure that there are at least two ISRs before killing the leader?)

> Gaps in messages delivered by new consumer after Kafka restart
> --------------------------------------------------------------
>
>                 Key: KAFKA-2891
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2891
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.0
>            Reporter: Rajini Sivaram
>            Priority: Critical
>
> Replication tests run with the new consumer over SSL/SASL were failing 
> very often because messages were not being consumed from some topics after a 
> Kafka restart. The fix in KAFKA-2877 has made this a lot better, but I am 
> still seeing some failures (less often now) because a small set of messages 
> is not received after a Kafka restart. This failure looks slightly different 
> from the one seen before the fix for KAFKA-2877 was applied, hence the new 
> defect. The test fails because not all acked messages are received by the 
> consumer, although the number of missing messages is quite small.
> [~benstopford] Are the upgrade tests working reliably with KAFKA-2877 now?
> I am not sure whether any of these log entries are relevant:
> {quote}
> [2015-11-25 14:41:12,342] INFO SyncGroup for group test-consumer-group failed 
> due to NOT_COORDINATOR_FOR_GROUP, will find new coordinator and rejoin 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-25 14:41:12,342] INFO Marking the coordinator 2147483644 dead. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-25 14:41:12,958] INFO Attempt to join group test-consumer-group 
> failed due to unknown member id, resetting and retrying. 
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [2015-11-25 14:41:42,437] INFO Fetch offset null is out of range, resetting 
> offset (org.apache.kafka.clients.consumer.internals.Fetcher)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
