Hi, this is the output of list topic:
topic: clicks partition: 0 leader: 1 replicas: 1 isr: 1
topic: clicks partition: 1 leader: 3 replicas: 3 isr: 3
topic: clicks partition: 2 leader: 1 replicas: 1 isr: 1
topic: visits partition: 0 leader: 3 replicas: 3 isr: 3
topic: visits partition: 1 leader: 2 replicas: 2 isr: 2
topic: visits partition: 2 leader: 3 replicas: 3 isr: 3
topic: stats.live.test partition: 0 leader: 3 replicas: 3,1,2 isr: 3,2,1
topic: stats.live.test partition: 1 leader: 2 replicas: 1,2,3 isr: 2,3,1
topic: stats.live.test partition: 2 leader: 2 replicas: 2,3,1 isr: 2,3,1

The topic causing problems is "clicks", and the partitions requested on the
crashed broker are 0 and 2. Given the output of list topic, this means that
those 2 partitions are permanently lost right now, right?

I thought all partitions were replicated, just like for the topic
'stats.live.test', but apparently I screwed up when creating the topics. I
should have checked that first.

Thanks for your help.


2014/1/6 Jun Rao <jun...@gmail.com>

> How many replicas do you have on that topic? What's the output of list
> topic?
>
> Thanks,
>
> Jun
>
>
> On Mon, Jan 6, 2014 at 1:45 AM, Vincent Rischmann <vinc...@rischmann.fr>
> wrote:
>
> > Hi,
> >
> > yes, I'm seeing the errors on the crashed broker.
> >
> > My controller.log file only contains the following:
> >
> > [2014-01-03 09:41:01,794] INFO [ControllerEpochListener on 1]: Initialized controller epoch to 11 and zk version 10 (kafka.controller.ControllerEpochListener)
> > [2014-01-03 09:41:01,812] INFO [Controller 1]: Controller starting up (kafka.controller.KafkaController)
> > [2014-01-03 09:41:02,082] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)
> >
> > Since Friday, nothing has changed and the broker has generated multiple
> > gigabytes of traces in server.log. One of the last exceptions looks like
> > this:
> >
> > Request for offset 787449 but we only have log segments in the range 0 to 163110.
> >
> > The range has increased since Friday (it was "0 to 19372"). Does this mean
> > the broker is actually catching up?
> >
> > Thanks for your help.
> >
> >
> > 2014/1/3 Jun Rao <jun...@gmail.com>
> >
> > > If a broker crashes and restarts, it will catch up on the missing data from
> > > the leader replicas. Normally, when this broker is catching up, it won't be
> > > serving any client requests though. Are you seeing those errors on the
> > > crashed broker? Also, you are not supposed to see OffsetOutOfRangeException
> > > with just one broker failure with 3 replicas. Do you see the following in
> > > the controller log?
> > >
> > > "No broker in ISR is alive for ... There's potential data loss."
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Jan 3, 2014 at 1:23 AM, Vincent Rischmann <zecmerqu...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > We have a cluster of 3 0.8 brokers, and this morning one of the brokers
> > > > crashed.
> > > > It is a test broker, and we stored the logs in /tmp/kafka-logs. All topics
> > > > in use are replicated on the three brokers.
> > > >
> > > > You can guess the problem: when the broker rebooted, it wiped all the data
> > > > in the logs.
> > > >
> > > > The producers and consumers are fine, but the broker with the wiped data
> > > > keeps generating a lot of exceptions, and I don't really know what to do to
> > > > recover.
> > > >
> > > > Example exception:
> > > >
> > > > [2014-01-03 10:09:47,755] ERROR [KafkaApi-1] Error when processing fetch request for partition [topic,0] offset 814798 from consumer with correlation id 0 (kafka.server.KafkaApis)
> > > > kafka.common.OffsetOutOfRangeException: Request for offset 814798 but we only have log segments in the range 0 to 19372.
> > > >
> > > > There are a lot of them, something like 10+ per second. I (maybe wrongly)
> > > > assumed that the broker would catch up; if that's the case, how can I see
> > > > the progress?
> > > >
> > > > In general, what is the recommended way to bring back a broker with wiped
> > > > data in a cluster?
> > > >
> > > > Thanks.
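
As a rough cross-check of the question at the top of this message (which
partitions have no surviving copy), the list topic output can be parsed and
filtered for partitions whose only replica is the crashed broker. This is a
minimal Python sketch, not a Kafka tool: the helper names parse_list_topic and
only_on_broker are made up for illustration, and broker id 1 is an assumption
taken from the [KafkaApi-1] prefix in the quoted error.

```python
import re

# One entry of the list-topic output pasted at the top of this message.
ENTRY = re.compile(
    r"topic:\s*(\S+)\s+partition:\s*(\d+)\s+leader:\s*(\d+)\s+"
    r"replicas:\s*(\d+(?:,\d+)*)\s+isr:\s*(\d+(?:,\d+)*)"
)


def parse_list_topic(text):
    """Return one dict per partition found in the list-topic output."""
    entries = []
    for match in ENTRY.finditer(text):
        topic, partition, leader, replicas, isr = match.groups()
        entries.append({
            "topic": topic,
            "partition": int(partition),
            "leader": int(leader),
            "replicas": [int(b) for b in replicas.split(",")],
            "isr": [int(b) for b in isr.split(",")],
        })
    return entries


def only_on_broker(entries, broker_id):
    """Partitions whose single replica is broker_id, so no other copy exists."""
    return [
        (e["topic"], e["partition"])
        for e in entries
        if e["replicas"] == [broker_id]
    ]


# Output copied from this message.
OUTPUT = """
topic: clicks partition: 0 leader: 1 replicas: 1 isr: 1
topic: clicks partition: 1 leader: 3 replicas: 3 isr: 3
topic: clicks partition: 2 leader: 1 replicas: 1 isr: 1
topic: visits partition: 0 leader: 3 replicas: 3 isr: 3
topic: visits partition: 1 leader: 2 replicas: 2 isr: 2
topic: visits partition: 2 leader: 3 replicas: 3 isr: 3
topic: stats.live.test partition: 0 leader: 3 replicas: 3,1,2 isr: 3,2,1
topic: stats.live.test partition: 1 leader: 2 replicas: 1,2,3 isr: 2,3,1
topic: stats.live.test partition: 2 leader: 2 replicas: 2,3,1 isr: 2,3,1
"""

CRASHED_BROKER = 1  # assumption: inferred from "[KafkaApi-1]" in the error log

print(only_on_broker(parse_list_topic(OUTPUT), CRASHED_BROKER))
# prints [('clicks', 0), ('clicks', 2)]
```

Under that assumption, the result matches the two "clicks" partitions named
above: each lists broker 1 as its only replica, while every "stats.live.test"
partition lists all three brokers.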