I believe I've hit the same problem on the 0.8.2 rc2. We had an idle
test cluster with unknown health status, and I applied rc3 without
checking whether everything was OK beforehand. Since that cluster had
been doing nothing for a couple of days and the retention time was 48
hours, it's reasonable to assume that no actual data was left on the
cluster. The same type of log message was emitted in large volumes and
never stopped. I then rebooted each ZooKeeper node in series: no
change. Then I bumped each broker: no change. Finally I took down all
brokers at the same time.

The logging stopped, but one broker then had no partitions in sync,
including the internal consumer offsets topic that was living (with
replicas=1) on that broker. I then bumped this broker once more, and
after that the whole cluster was back in sync.
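After bumping a broker like that, it's also worth confirming it has
actually re-registered under /brokers/ids before expecting it to
rejoin any ISRs. A quick sketch along the same lines (same placeholder
ZooKeeper address as above):

import org.apache.zookeeper.ZooKeeper;

public class LiveBrokers {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address; adjust for your cluster.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        for (String id : zk.getChildren("/brokers/ids", false)) {
            // Each child holds JSON like {"host":"...","port":9092,...}
            System.out.println("broker " + id + ": "
                    + new String(zk.getData("/brokers/ids/" + id, false, null)));
        }
        zk.close();
    }
}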

I suspect that something related to zero-size topics caused this,
since the cluster worked fine the week before during testing and also
afterwards during more testing with rc3.
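If retention had emptied those topics, the earliest available offset
would have moved past what a follower was still asking for, which
would match the log message quoted below. A minimal sketch using the
old 0.8 SimpleConsumer API to print a partition's earliest and latest
offsets; the broker address, topic name, and partition number are just
placeholders taken from the error in this thread:

import java.util.HashMap;
import java.util.Map;

import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class LogRangeCheck {
    public static void main(String[] args) {
        String topic = "mytopic";   // placeholder topic from the error below
        int partition = 57;         // placeholder partition from the error below
        SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "log-range-check");
        TopicAndPartition tp = new TopicAndPartition(topic, partition);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> info = new HashMap<>();

        // Earliest offset still on disk for this partition.
        info.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));
        OffsetResponse earliest = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), "log-range-check"));

        // Latest (log end) offset for this partition.
        info.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
        OffsetResponse latest = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), "log-range-check"));

        System.out.printf("earliest=%d latest=%d%n",
                earliest.offsets(topic, partition)[0],
                latest.offsets(topic, partition)[0]);
        consumer.close();
    }
}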

2015-02-05 19:22 GMT+01:00 Kyle Banker <kyleban...@gmail.com>:

> Digging in a bit more, it appears that the "down" broker had likely
> partially failed. Thus, it was still attempting to fetch offsets that no
> longer exist. Does this make sense as an explanation of the
> above-mentioned behavior?
>
> On Thu, Feb 5, 2015 at 10:58 AM, Kyle Banker <kyleban...@gmail.com> wrote:
>
> > Dug into this a bit more, and it turns out that we lost one of our 9
> > brokers at the exact moment when this started happening. At the time that
> > we lost the broker, we had no under-replicated partitions. Since the
> > broker disappeared, we've had a fairly constant number of under-replicated
> > partitions. This makes some sense, of course.
> >
> > Still, the log message doesn't.
> >
> > On Thu, Feb 5, 2015 at 10:39 AM, Kyle Banker <kyleban...@gmail.com>
> > wrote:
> >
> >> I have a 9-node Kafka cluster, and all of the brokers just started
> >> spouting the following error:
> >>
> >> ERROR [Replica Manager on Broker 1]: Error when processing fetch request
> >> for partition [mytopic,57] offset 0 from follower with correlation id
> >> 58166. Possible cause: Request for offset 0 but we only have log
> >> segments in the range 39 to 39. (kafka.server.ReplicaManager)
> >>
> >> The "mytopic" topic has a replication factor of 3, and metrics are
> >> showing a large number of under-replicated partitions.
> >>
> >> My assumption is that a log segment aged out but that the replicas weren't
> >> aware of it.
> >>
> >> In any case, this problem isn't fixing itself, and the volume of log
> >> messages of this type is enormous.
> >>
> >> What might have caused this? How does one resolve it?
> >>
> >
> >
>
