Hi Yogesh,

Can you please clarify what you mean by "observing data loss"?

Ismael

On Mon, Sep 18, 2017 at 5:08 PM, Yogesh Sangvikar <yogesh.sangvi...@gmail.com> wrote:

> Hi Team,
>
> Please help to find a resolution for the Kafka rolling upgrade issue below.
>
> Thanks,
> Yogesh
>
> On Monday, September 18, 2017 at 9:03:04 PM UTC+5:30, Yogesh Sangvikar wrote:
>>
>> Hi Team,
>>
>> Currently, we are using a Confluent 3.0.0 Kafka cluster in our production
>> environment, and we are planning to upgrade the cluster to Confluent 3.2.2.
>> We have topics with millions of records, and data is continuously
>> published to those topics. We also use other Confluent services such as
>> schema-registry, kafka connect and kafka rest to process the data.
>>
>> So, we can't afford a downtime upgrade for the platform.
>>
>> We have tried the rolling Kafka upgrade in our development environment,
>> as suggested on the pages below:
>>
>> https://docs.confluent.io/3.2.2/upgrade.html
>>
>> https://kafka.apache.org/documentation/#upgrade
>>
>> But we are observing data loss on topics while doing the rolling upgrade /
>> restart of Kafka servers for "inter.broker.protocol.version=0.10.2".
>>
>> As per our observation, we suspect the following root cause for the data
>> loss (explained for a topic partition having 3 replicas):
>>
>> - As the Kafka broker protocol version is updated from 0.10.0 to 0.10.2
>>   in rolling fashion, the in-sync replicas on the older version will not
>>   allow the updated replicas (0.10.2) to be in sync until all are updated.
>> - Also, we have explicitly disabled the "unclean.leader.election.enabled"
>>   property, so only in-sync replicas will be elected as leader for the
>>   given partition.
>> - During the rolling update, as mentioned above, the older-version
>>   leader does not allow newer-version replicas to be in sync, so the data
>>   published through this older-version leader is not synced to the other
>>   replicas. If this leader goes down for its upgrade, the other, updated
>>   replicas are shown in the in-sync column and become leader, but they
>>   lag behind the old leader in offset and show only the offsets of the
>>   data they had synced.
>> - Once the last replica comes up with the updated version, it starts
>>   syncing data from the current leader.
>>
>> Please let us know your comments on our observation and suggest a proper
>> way to do a rolling Kafka upgrade, as we can't afford downtime.
>>
>> Thanks,
>> Yogesh
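
For reference, the Apache Kafka upgrade notes linked in the quoted message describe the rolling upgrade as two separate passes over the cluster, with inter.broker.protocol.version raised only after every broker is already running the new code. The sketch below lays out that ordering for a three-broker cluster. It is only an illustration: the broker names and the plan structure are hypothetical, and the version strings are the ones mentioned in the thread.

```python
# Sketch of the two-pass rolling upgrade order described in the Kafka upgrade notes.
# Broker names and the plan layout are hypothetical; the property names are
# standard broker settings and the versions are the ones quoted in the thread.

OLD_VERSION = "0.10.0"  # protocol version of the running cluster
NEW_VERSION = "0.10.2"  # protocol version targeted by the upgrade


def rolling_upgrade_plan(brokers):
    """Return an ordered list of (broker, action, server.properties overrides)."""
    pinned = {
        "inter.broker.protocol.version": OLD_VERSION,
        "log.message.format.version": OLD_VERSION,
    }
    bumped = {"inter.broker.protocol.version": NEW_VERSION}
    plan = []
    # Pass 1: install the new binaries broker by broker while every broker
    # still speaks the old inter-broker protocol.
    for broker in brokers:
        plan.append((broker, "upgrade binaries and restart", dict(pinned)))
    # Pass 2: only after the whole cluster runs the new code, raise the
    # protocol version and restart broker by broker again.
    for broker in brokers:
        plan.append((broker, "bump protocol and restart", dict(bumped)))
    return plan


if __name__ == "__main__":
    for broker, action, props in rolling_upgrade_plan(["broker-1", "broker-2", "broker-3"]):
        settings = ", ".join(f"{key}={value}" for key, value in sorted(props.items()))
        print(f"{broker}: {action} [{settings}]")
```

If the protocol version was raised to 0.10.2 in the same pass as the binary upgrade, that would differ from this ordering and may be worth ruling out first.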