After we implement non-blocking IO for the producer, there may not be much incentive left to use ack = 0, but this is an interesting idea, not just for the controlled shutdown case, but also when leadership moves due to, say, a broker's ZK session expiring. Will have to think about it a bit more.
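For anyone following along, here is a rough sketch of the producer settings under discussion. It assumes the 0.8 producer's property names (`request.required.acks`, `message.send.max.retries`, `retry.backoff.ms`, `topic.metadata.refresh.interval.ms`); the broker address and the class name `AckedProducerConfig` are placeholders, not anything from this thread. With acks set to 1 instead of 0, the producer waits for the leader's response, so a "Leader not local" error should reach the client and prompt a metadata refresh and retry, rather than only showing up as a WARN on the broker.

```java
import java.util.Properties;

// Hypothetical helper: only assembles settings, it does not create a producer.
class AckedProducerConfig {

    static Properties build(String brokerList) {
        Properties props = new Properties();
        // Brokers used to bootstrap topic metadata (placeholder value passed in).
        props.put("metadata.broker.list", brokerList);
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // 1 = wait for the leader to acknowledge, so leader-moved errors are visible
        // to the producer. The thread's problem case uses 0 (no acknowledgement).
        props.put("request.required.acks", "1");
        // Retry settings; 3 and 100 are assumed here to match the usual defaults.
        props.put("message.send.max.retries", "3");
        props.put("retry.backoff.ms", "100");
        // Periodic metadata refresh: 600000 ms is the 10 minutes mentioned in the thread.
        props.put("topic.metadata.refresh.interval.ms", "600000");
        return props;
    }

    public static void main(String[] args) {
        // Print the assembled settings for inspection.
        build("broker1:9092").list(System.out);
    }
}
```

The resulting `Properties` would be passed to the producer's config object; lowering `topic.metadata.refresh.interval.ms` shortens the window in which a producer keeps sending to the old leader, at the cost of more metadata requests.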
On Mon, Jun 24, 2013 at 12:22 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Yeah, I am using ack = 0, so that makes sense. I'll need to rethink that,
> it would seem. It would be nice, wouldn't it, in this case, for the broker
> to realize this and just forward the messages to the correct leader. Would
> that be possible?
>
> Also, it would be nice to have a second option for the controlled shutdown
> (e.g. controlled.shutdown.quiescence.ms), to allow the broker to wait a
> prescribed amount of time after the controlled shutdown before actually
> shutting down the server. Then I could set this value to something a
> little greater than the producer's 'topic.metadata.refresh.interval.ms'.
> This would help with hitless rolling restarts too. Currently, every
> producer gets a very loud "Connection Reset" with a tall stack trace each
> time I restart a broker. It would be nicer to have the producers still be
> able to produce until the metadata refresh interval expires, then get the
> word that the leader has moved due to the controlled shutdown, and then
> start producing to the new leader, all before the server that is shutting
> down actually shuts down. Does that seem feasible?
>
> Jason
>
>
> On Sun, Jun 23, 2013 at 8:23 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > Jason,
> >
> > Are you using ack = 0 in the producer? This mode doesn't work well with
> > controlled shutdown (this is explained in the FAQ in
> > https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#).
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Sun, Jun 23, 2013 at 1:45 AM, Jason Rosenberg <j...@squareup.com> wrote:
> >
> > > I'm working on having seamless rolling restarts for my Kafka
> > > servers, running 0.8. I have it so that each server will be restarted
> > > sequentially. Each server takes itself out of the load balancer (e.g.
> > > sets a status that the lb will recognize, and then waits more than long
> > > enough for the lb to stop sending meta-data requests to that server).
> > > Then I initiate the shutdown (with controlled.shutdown.enable=true).
> > > This seems to work well; however, I occasionally see warnings like this
> > > in the log from the server, after restart:
> > >
> > > 2013-06-23 08:28:46,770 WARN [kafka-request-handler-2] server.KafkaApis -
> > > [KafkaApi-508818741] Produce request with correlation id 7136261 from
> > > client on partition [mytopic,0] failed due to Leader not local for
> > > partition [mytopic,0] on broker 508818741
> > >
> > > This WARN seems to persistently repeat, until the producer client
> > > initiates a new meta-data request (e.g. every 10 minutes, by default).
> > > However, the producer doesn't log any errors/exceptions when the server
> > > is logging this WARN.
> > >
> > > What's happening here? Is the message silently being forwarded on to the
> > > correct leader for the partition? Is the message dropped? Are these
> > > WARNs particularly useful?
> > >
> > > Thanks,
> > >
> > > Jason