I think I know what the problem is now. With ack=1 the server will respond
to the producer once the messages are appended to the leader's log, BUT
appending to the log does not guarantee the log is flushed to disk; it may
still be in the broker's page cache. Now if the broker fails and recovers,
messages not yet flushed will be lost, even though the producer has gotten
the ACK from the broker.
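As a side note, the broker can be told to fsync more eagerly, which narrows
this window. A sketch of the relevant 0.8 broker settings (the values below
are illustrative, not defaults, and flushing per message has a large
throughput cost):

```properties
# server.properties (broker side), illustrative values
# fsync the log after this many messages (1 = every message, expensive)
log.flush.interval.messages=1
# also fsync at least this often, in milliseconds
log.flush.interval.ms=1000
```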

With ack=-1 the server will not respond to the producer until all of the
replicas have gotten this message in their logs, again not necessarily
flushed to disk. So with all brokers down there is still a chance of losing
data, but a smaller one than with ack=1.
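For reference, a sketch of the producer settings this implies for the 0.8
sync producer (the broker list is a placeholder, and the retry values are
illustrative, raised from the defaults of 3 and 100 to survive longer
outages):

```properties
# producer.properties for the 0.8 sync producer, illustrative values
metadata.broker.list=broker1:9092,broker2:9092,broker3:9092
producer.type=sync
# wait until all in-sync replicas have the message
request.required.acks=-1
# defaults are 3 retries with a 100 ms backoff
message.send.max.retries=10
retry.backoff.ms=1000
```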

Guozhang


On Sat, Jun 7, 2014 at 6:57 PM, Libo Yu <yu_l...@hotmail.com> wrote:

> Yes. I can hardly believe there is message loss in such a case, so I've
> checked my test code very carefully. Unfortunately, I cannot provide any
> logs because of company policy. This is a big risk for our operation, as
> all our servers at one data center must be rebooted regularly at the same
> time by system admins, so a rolling restart is not an option for us.
>
> > Date: Sat, 7 Jun 2014 16:02:35 -0700
> > Subject: Re: question about synchronous producer
> > From: wangg...@gmail.com
> > To: users@kafka.apache.org
> >
> > I see. So previously you ran the test with ack=1?
> >
> > Guozhang
> >
> >
> > On Sat, Jun 7, 2014 at 7:24 AM, Libo Yu <yu_l...@hotmail.com> wrote:
> >
> > > Hi Guozhang,
> > >
> > > The issue is not consistently reproducible, but it is fairly easy to
> > > reproduce. It seems there is some kind of failure that has not been
> > > captured by the Kafka code, and no exception has been thrown.
> > >
> > > I did a new test with request.required.acks set to -1. The number of
> > > lost messages dropped significantly, but there was still message loss.
> > >
> > > > Date: Fri, 6 Jun 2014 08:12:17 -0700
> > > > Subject: Re: question about synchronous producer
> > > > From: wangg...@gmail.com
> > > > To: users@kafka.apache.org
> > > >
> > > > Libo,
> > > >
> > > > I have double checked the code. With sync producers, all failures
> > > > should be either thrown as exceptions or logged as warning/error
> > > > log entries.
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Thu, Jun 5, 2014 at 6:38 PM, Libo Yu <yu_l...@hotmail.com> wrote:
> > > >
> > > > > Not really. The issue was reported by a client. I added a lot of
> > > > > logging to make sure no exception was thrown from send() when the
> > > > > message was lost. It is not hard to reproduce. This is a critical
> > > > > issue for operation: it may not be possible for brokers and
> > > > > producers to be restarted at the same time.
> > > > >
> > > > > > Date: Thu, 5 Jun 2014 16:53:29 -0700
> > > > > > Subject: Re: question about synchronous producer
> > > > > > From: wangg...@gmail.com
> > > > > > To: users@kafka.apache.org
> > > > > >
> > > > > > Libo, did you see any exception/error entries on the producer
> log?
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Thu, Jun 5, 2014 at 10:33 AM, Libo Yu <yu_l...@hotmail.com>
> > > wrote:
> > > > > >
> > > > > > > Yes. I used three sync producers with request.required.acks=1.
> > > > > > > I let them publish 2k short messages, and in the process I
> > > > > > > restarted all zookeeper and kafka processes (3 hosts in a
> > > > > > > cluster). Normally there will be message loss after 3 restarts.
> > > > > > > After 3 restarts, I use a consumer to retrieve the messages
> > > > > > > and do the verification.
> > > > > > >
> > > > > > > > Date: Thu, 5 Jun 2014 10:15:18 -0700
> > > > > > > > Subject: Re: question about synchronous producer
> > > > > > > > From: wangg...@gmail.com
> > > > > > > > To: users@kafka.apache.org
> > > > > > > >
> > > > > > > > Libo,
> > > > > > > >
> > > > > > > > For clarification: can you use the sync producer to
> > > > > > > > reproduce this issue?
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 5, 2014 at 10:03 AM, Libo Yu <
> yu_l...@hotmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > When all the brokers are down, the producer should retry
> > > > > > > > > a few times and then throw FailedToSendMessageException.
> > > > > > > > > The user code can catch the exception and retry after a
> > > > > > > > > backoff. However, in my tests, no exception was caught and
> > > > > > > > > the message was lost silently. My broker is 0.8.1.1 and my
> > > > > > > > > client is 0.8.0. It is fairly easy to reproduce. Any
> > > > > > > > > insight on this issue?
> > > > > > > > >
> > > > > > > > > Libo
> > > > > > > > >
> > > > > > > > > > Date: Thu, 5 Jun 2014 09:05:27 -0700
> > > > > > > > > > Subject: Re: question about synchronous producer
> > > > > > > > > > From: wangg...@gmail.com
> > > > > > > > > > To: users@kafka.apache.org
> > > > > > > > > >
> > > > > > > > > > When the producer has exhausted all the retries, it
> > > > > > > > > > will drop the message on the floor. So when the broker
> > > > > > > > > > is down for too long, there will be data loss.
> > > > > > > > > >
> > > > > > > > > > Guozhang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 5, 2014 at 6:20 AM, Libo Yu <
> yu_l...@hotmail.com
> > > >
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I want to know why there will be message loss when
> > > > > > > > > > > brokers are down for too long. I've noticed message
> > > > > > > > > > > loss when brokers are restarted during publishing. It
> > > > > > > > > > > is a sync producer with request.required.acks set
> > > > > > > > > > > to 1.
> > > > > > > > > > >
> > > > > > > > > > > Libo
> > > > > > > > > > >
> > > > > > > > > > > > Date: Thu, 29 May 2014 20:11:48 -0700
> > > > > > > > > > > > Subject: Re: question about synchronous producer
> > > > > > > > > > > > From: wangg...@gmail.com
> > > > > > > > > > > > To: users@kafka.apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > Libo,
> > > > > > > > > > > >
> > > > > > > > > > > > That is correct. You may want to increase
> > > > > > > > > > > > retry.backoff.ms in this case. In practice, if the
> > > > > > > > > > > > brokers are down for too long, then data loss is
> > > > > > > > > > > > usually inevitable.
> > > > > > > > > > > >
> > > > > > > > > > > > Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, May 29, 2014 at 2:55 PM, Libo Yu <
> > > > > yu_l...@hotmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi team,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Assume I am using a synchronous producer and it
> > > > > > > > > > > > > has the following default properties:
> > > > > > > > > > > > >
> > > > > > > > > > > > > message.send.max.retries
> > > > > > > > > > > > >       3
> > > > > > > > > > > > > retry.backoff.ms
> > > > > > > > > > > > >       100
> > > > > > > > > > > > >
> > > > > > > > > > > > > I use the java api Producer.send(message) to send
> > > > > > > > > > > > > a message. While send() is being called, if the
> > > > > > > > > > > > > brokers are shut down, what happens? Will send()
> > > > > > > > > > > > > retry 3 times with a 100ms interval and then fail
> > > > > > > > > > > > > silently? If I don't want to lose any messages
> > > > > > > > > > > > > when the brokers are back online, what should I
> > > > > > > > > > > > > do? Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Libo
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > -- Guozhang
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -- Guozhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
>
>
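To make the catch-and-retry pattern discussed in this thread concrete, here
is a minimal, self-contained sketch. The real producer call and
FailedToSendMessageException are stubbed with a Runnable and a plain
RuntimeException, and the retry count and backoff values are illustrative,
not Kafka's defaults:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SendRetrySketch {

    /** Retries the send up to maxRetries extra times, sleeping between attempts. */
    static boolean sendWithRetry(Runnable send, int maxRetries, long backoffMs)
            throws InterruptedException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                send.run();                // stands in for producer.send(message)
                return true;               // acked by the broker
            } catch (RuntimeException e) { // stands in for FailedToSendMessageException
                if (attempt == maxRetries) {
                    return false;          // gave up; caller must not drop the message
                }
                Thread.sleep(backoffMs);   // wait for brokers to come back
            }
        }
        return false;                      // unreachable
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate brokers that are down for the first two attempts.
        AtomicInteger calls = new AtomicInteger();
        Runnable flakySend = () -> {
            if (calls.incrementAndGet() <= 2) {
                throw new RuntimeException("no brokers available");
            }
        };
        System.out.println(sendWithRetry(flakySend, 3, 10)); // prints "true"
    }
}
```

With the real 0.8 producer, the Runnable body would be producer.send(message)
and the catch would match FailedToSendMessageException; the application has
to keep the message around (or re-read it from its source) until
sendWithRetry reports success.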



-- 
-- Guozhang
