If a broker crashes and restarts, it will catch up on the missing data from the leader replicas. Normally, while the broker is catching up, it won't be serving any client requests. Are you seeing those errors on the crashed broker? Also, you should not see OffsetOutOfRangeException after a single broker failure when there are 3 replicas. Do you see the following in the controller log?
"No broker in ISR is alive for ... There's potential data loss." Thanks, Jun On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann <zecmerqu...@gmail.com>wrote: > Hi all, > > We have a cluster of 3 0.8 brokers, and this morning one of the broker > crashed. > It is a test broker, and we stored the logs in /tmp/kafka-logs. All topics > in use are replicated on the three brokers. > > You can guess the problem, when the broker rebooted it wiped all the data > in the logs. > > The producers and consumers are fine, but the broker with the wiped data > keeps generating a lot of exceptions, and I don't really know what to do to > recover. > > Example exception: > > [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch > request for partition [topic,0] offset 814798 from consumer with > correlation id 0 (kafka.server.KafkaApis) > kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we > only have log segments in the range 0 to 19372. > > There are a lot of them, something like 10+ per second. I (maybe wrongly) > assumed that the broker would catch up, if that's the case how can I see > the progress ? > > In general, what is the recommended way to bring back a broker with wiped > data in a cluster ? > > Thanks. >