I cleared out the DB directories so the cluster is empty and no messages are being sent or received.
On 21 September 2017 at 16:44, John Yost <hokiege...@gmail.com> wrote:

> The only thing I can think of is message format...do the client and broker
> versions match? If the clients are a lower version than brokers (i.e.,
> 0.9.0.1 client, 0.10.0.1 broker), then I think there could be message
> format conversions both for incoming messages as well as for replication.
>
> --John
>
> On Thu, Sep 21, 2017 at 10:42 AM, Elliot Crosby-McCullough <
> elliot.crosby-mccullo...@freeagent.com> wrote:
>
> > Nothing, that value (that group of values) was at default when we started
> > the debugging.
> >
> > On 21 September 2017 at 15:08, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Thanks. What happens if you reduce num.replica.fetchers?
> > >
> > > On Thu, Sep 21, 2017 at 3:02 PM, Elliot Crosby-McCullough <
> > > elliot.crosby-mccullo...@freeagent.com> wrote:
> > >
> > > > 551 partitions, broker configs are:
> > > > https://gist.github.com/elliotcm/3a35f66377c2ef4020d76508f49f106b
> > > >
> > > > We tweaked it a bit from standard recently but that was as part of the
> > > > debugging process.
> > > >
> > > > After some more experimentation I'm seeing the same behaviour at about
> > > > half the CPU after creating one 50 partition topic in an otherwise
> > > > empty cluster.
> > > >
> > > > On 21 September 2017 at 14:20, Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > > > A couple of questions: how many partitions in the cluster and what
> > > > > are your broker configs?
> > > > >
> > > > > On Thu, Sep 21, 2017 at 1:58 PM, Elliot Crosby-McCullough <
> > > > > elliot.crosby-mccullo...@freeagent.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We've been trying to debug an issue with our kafka cluster for
> > > > > > several days now and we're close to out of options.
> > > > > >
> > > > > > We have 3 kafka brokers associated with 3 zookeeper nodes and 3
> > > > > > registry nodes, plus a few streams clients and a ruby producer.
> > > > > >
> > > > > > Two of the three brokers are pinning a core and have been for days,
> > > > > > no amount of restarting, debugging, or clearing out of data seems
> > > > > > to help.
> > > > > >
> > > > > > We've got the logs at DEBUG level which shows a constant flow much
> > > > > > like this:
> > > > > > https://gist.github.com/elliotcm/e66a1ca838558664bab0c91549acb251
> > > > > >
> > > > > > As best as we can tell the brokers are up to date on replication
> > > > > > and the leaders are well-balanced. The cluster is receiving no
> > > > > > traffic; no messages are being sent in and the consumers/streams
> > > > > > are shut down.
> > > > > >
> > > > > > From our profiling of the JVM it looks like the CPU is mostly
> > > > > > working in replication threads and SSL traffic (it's a secured
> > > > > > cluster) but that shouldn't be treated as gospel.
> > > > > >
> > > > > > Any advice would be greatly appreciated.
> > > > > >
> > > > > > All the best,
> > > > > > Elliot