Hey Rajiv,

Are you using snappy compression?

On Tue, Dec 15, 2015 at 12:52 PM, Rajiv Kurian <ra...@signalfx.com> wrote:

> We had to revert to 0.8.3 because three of our topics seem to have gotten
> corrupted during the upgrade. As soon as we did the upgrade producers to
> the three topics I mentioned stopped being able to do writes. The clients
> complained (occasionally) about leader not found exceptions. We restarted
> our clients and brokers but that didn't seem to help. Actually even after
> reverting to 0.8.3 these three topics were broken. To fix it we had to stop
> all clients, delete the topics, create them again and then restart the
> clients.
>
> I realize this is not a lot of info. I couldn't wait to get more debug info
> because the cluster was actually being used. Has any one run into something
> like this? Are there any known issues with old consumers/producers. The
> topics that got busted had clients writing to them using the old Java
> wrapper over the Scala producer.
>
> Here are the steps I took to upgrade.
>
> For each broker:
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
> Once all the brokers were running the 0.9 code with
> inter.broker.protocol.version=0.8.2.X we restarted them one by one with
> inter.broker.protocol.version=0.9.0.0
>
> When reverting I did the following.
>
> For each broker.
>
> 1. Stop the broker.
> 2. Restart with the 0.9 broker running with
> inter.broker.protocol.version=0.8.2.X
> 3. Wait for under replicated partitions to go down to 0.
> 4. Go to step 1.
>
> Once all the brokers were running 0.9 code with
> inter.broker.protocol.version=0.8.2.X I restarted them one by one with the
> 0.8.2.3 broker code. This however like I mentioned did not fix the three
> broken topics.
>
>
> On Mon, Dec 14, 2015 at 3:13 PM, Rajiv Kurian <ra...@signalfx.com> wrote:
>
> > Now that it has been a bit longer, the spikes I was seeing are gone but
> > the CPU and network in/out on the three brokers that were showing the
> > spikes are still much higher than before the upgrade. Their CPUs have
> > increased from around 1-2% to 12-20%. The network in on the same brokers
> > has gone up from under 2 Mb/sec to 19-33 Mb/sec. The network out has gone
> > up from under 2 Mb/sec to 29-42 Mb/sec. I don't see a corresponding
> > increase in kafka messages in per second or kafka bytes in per second JMX
> > metrics.
> >
> > Thanks,
> > Rajiv
> >
>
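
For anyone following along, the per-broker loop quoted above (stop, restart with the chosen inter.broker.protocol.version, wait for under replicated partitions to reach 0, move to the next broker) can be sketched roughly as below. This is only an illustration under stated assumptions: `rolling_restart`, `restart` and `under_replicated` are hypothetical names, not part of any Kafka API. You would supply the two callables yourself, for example by wrapping your own service scripts and a read of the brokers' under-replicated-partitions metric.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    brokers: Iterable[str],
    restart: Callable[[str], None],        # hypothetical: stops and restarts one broker
    under_replicated: Callable[[], int],   # hypothetical: current under-replicated partition count
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart brokers one at a time, waiting for the cluster to settle in between."""
    done: List[str] = []
    for broker in brokers:
        # Steps 1 and 2: stop the broker and bring it back with the new code/config.
        restart(broker)
        # Step 3: wait for under-replicated partitions to go down to 0.
        for _ in range(max_polls):
            if under_replicated() == 0:
                break
            sleep(poll_seconds)
        else:
            # Never settled: stop here rather than restarting another broker.
            raise TimeoutError(
                f"under-replicated partitions did not reach 0 after restarting {broker}"
            )
        # Step 4: move on to the next broker.
        done.append(broker)
    return done
```

The timeout is a design choice in this sketch, not something from the steps above: if replication never catches up, it seems safer to stop and investigate than to take another broker down.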