After an hour, it briefly showed one instance 'applied' to all 10 partitions... then it went back to rebalancing for 10-15 minutes, followed by a different instance on all partitions, and then more rebalancing.
At no point (yet) have I seen the work get truly 'balanced' between all 5 instances.

On Sun, Dec 11, 2016 at 6:04 PM, Jon Yeargers <jon.yearg...@cedexis.com> wrote:
> I changed 'num.standby.replicas' to '2'.
>
> I started one instance and it immediately showed up in the
> 'kafka-consumer-groups .. --describe' listing.
>
> So I started a second... and it quickly displaced the first... which never
> came back.
>
> Started a third... same effect. The second went away, never to return... but now
> it tries to rebalance for a while before I see the third by itself.
>
> Fourth and fifth: now it's gone off to rebalance (and is seemingly stuck
> there) and hasn't pulled any data for more than an hour.
>
> On Sun, Dec 11, 2016 at 2:27 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>
>> Not sure.
>>
>> How big is your state? On rebalance, state stores might move from one
>> machine to another. To recreate the store on the new machine, the
>> underlying changelog topic must be read. This can take some time -- an
>> hour seems quite long though...
>>
>> To avoid long state recreation periods, Kafka Streams supports standby
>> tasks. Try to enable those via StreamsConfig: "num.standby.replicas"
>>
>> http://docs.confluent.io/current/streams/developer-guide.html#optional-configuration-parameters
>>
>> Also check out this section of the docs:
>>
>> http://docs.confluent.io/3.1.1/streams/architecture.html#fault-tolerance
>>
>> -Matthias
>>
>> On 12/11/16 3:14 AM, Gerrit Jansen van Vuuren wrote:
>> > I don't know about speeding up rebalancing, and an hour seems to suggest
>> > something is wrong with ZooKeeper or maybe your whole setup. If it
>> > becomes an unsolvable issue for you, you could try
>> > https://github.com/gerritjvv/kafka-fast which uses a different model and
>> > doesn't need balancing or rebalancing.
>> >
>> > disclojure: "I'm the library author".
>> >
>> > On 11 Dec 2016 11:56 a.m., "Jon Yeargers" <jon.yearg...@cedexis.com> wrote:
>> >
>> > Is there some way to 'help it along'? It's taking an hour or more from when
>> > I start my app to actually seeing anything consumed.
>> >
>> > Plenty of CPU (and IOWait) during this time, so I know it's doing
>> > _something_...
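Matthias's suggestion of enabling standby tasks via "num.standby.replicas" could look roughly like the sketch below. This is a minimal, hypothetical example, not code from the thread: the class name, helper method, application id, and broker address are placeholders. It uses plain string keys so it compiles without the Kafka dependency; in a real Kafka Streams app you would normally use the StreamsConfig constants (e.g. StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG) and pass the properties to the KafkaStreams constructor.

```java
import java.util.Properties;

public class StandbyConfigSketch {

    // Hypothetical helper: builds Kafka Streams settings with standby replicas enabled.
    // Keys are the literal config names; StreamsConfig defines constants for them.
    static Properties buildProps(String appId, String bootstrapServers, int standbyReplicas) {
        if (standbyReplicas < 0) {
            throw new IllegalArgumentException("standbyReplicas must be >= 0");
        }
        Properties props = new Properties();
        props.setProperty("application.id", appId);               // StreamsConfig.APPLICATION_ID_CONFIG
        props.setProperty("bootstrap.servers", bootstrapServers); // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG
        // Keep this many replicas of each task's local state on other instances, so a
        // task that moves during a rebalance may not have to rebuild its store by
        // reading the whole changelog topic.
        props.setProperty("num.standby.replicas", Integer.toString(standbyReplicas));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; replace with the real application id and brokers.
        Properties props = buildProps("my-streams-app", "localhost:9092", 2);
        // In a real app these would be passed on, e.g. new KafkaStreams(topology, props).start();
        System.out.println("num.standby.replicas=" + props.getProperty("num.standby.replicas"));
    }
}
```

Note that in this thread, setting the value to 2 did not by itself stop the long rebalances, so standby replicas mainly address state-restore time rather than every cause of a slow rebalance.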