Ok,

I didn't realize the write to disk was immediate (is that new in 0.8, with
requested acks enabled?).

I do think the OS will indeed reserve space in advance for data not yet
flushed to disk.  This seems to be true, at least, for xfs, with which I
have more experience lately.

Jason
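
The open question in the quoted message below (can a write succeed and then fail to flush for lack of space?) can be checked empirically on a given filesystem. The following is only a sketch in Python, not Kafka code: `write_and_flush` is a hypothetical helper that appends bytes, calls fsync, and reports which step, if any, the OS rejected with ENOSPC. It assumes a POSIX-like system where out-of-space is reported as `errno.ENOSPC`.

```python
import errno
import os


def write_and_flush(path: str, payload: bytes) -> str:
    """Append payload to path, then fsync.

    Returns "ok" on success, or the stage ("write" or "fsync") at which
    the OS reported ENOSPC. Any other OSError is re-raised.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        stage = "write"
        try:
            view = memoryview(payload)
            while view:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
            stage = "fsync"
            # Ask the OS to push buffered data for this file to the device.
            os.fsync(fd)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                return stage
            raise
        return "ok"
    finally:
        os.close(fd)
```

Running this against a deliberately small, nearly full test filesystem would show whether the error surfaces at the write or only at the fsync on that particular setup.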


On Thu, Aug 15, 2013 at 11:30 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> I am saying we always immediately write to the fs. So the question is: is
> it possible, with delayed allocation in ext4, to do a successful write that
> later cannot be flushed to disk due to running out of space? I don't know
> the answer to this, though I would hope it is not possible.
>
> Basically if our write to the fs succeeds and replicas acknowledge then we
> send back the ack.
>
> -Jay
>
>
> On Thu, Aug 15, 2013 at 11:12 AM, Jason Rosenberg <j...@squareup.com>
> wrote:
>
> > Hmmm....I guess I was thinking that a broker could receive a message and
> > keep it in memory, before having disk space reserved for its eventual
> > storage.  Are you saying that memory is not allocated for a message
> > without there already being disk space allocated for it?  In which case,
> > there should be no problem!
> >
> > Jason
> >
> >
> > On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > I don't think the filesystem will overcommit its disk space, but I'm
> > > actually not sure. I think this would only come into play on a fs like
> > > ext4 which does lazy block allocation in addition to lazy writing. But
> > > I think even ext4 is probably not allowed to hand out more disk space
> > > than it has.
> > >
> > >
> > > On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <j...@squareup.com>
> > > wrote:
> > >
> > > > A related question:  Will producers sending messages with
> > > > acknowledgment get a failed ack if a broker is out of disk space, or
> > > > will messages get buffered in memory successfully (resulting in a
> > > > good ack, before failing to be written)?
> > > >
> > > > It seems like it might be a good feature to have the broker
> > > > auto-detect if its log dir is nearing full, so that there is some
> > > > runway to shut down gracefully, while still writing any in-memory
> > > > buffered messages.  It could be an optional threshold, like 98% full,
> > > > or X MB free, etc.
> > > >
> > > > Jason
> > > >
> > > >
> > > > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <jay.kr...@gmail.com>
> > wrote:
> > > >
> > > > > The crash is actually just a call to shutdown. We think this is the
> > > > > right thing to do, though I agree it is unintuitive. Here is why.
> > > > > When you get an out of space error it is likely that the operating
> > > > > system did a partial write, leaving you with a corrupt log.
> > > > > Furthermore it is possible that space will free up, at which point
> > > > > more writes on the log could succeed, so you wouldn't even know
> > > > > there was a problem but all your consumers would hit this data and
> > > > > choke.
> > > > >
> > > > > By "crashing" the node we ensure that recovery is run on the log to
> > > > > bring it into a consistent state.
> > > > >
> > > > > Theoretically we could leave the node up accepting reads but
> > > > > rejecting writes while attempting to recover the log. But there are
> > > > > a bunch of problems with this, and it is very complex. Likely if
> > > > > you are out of space you are just going to keep getting writes,
> > > > > running out of space again, then running recovery, and so on. This
> > > > > kind of crazy loop is much worse than just needing to bring the
> > > > > node back up.
> > > > >
> > > > > Alternately we could leave the node up but go into some kind of
> > > > > write-rejecting mode forever. But this would still require that you
> > > > > restart the node, and we would have to implement that
> > > > > write-rejecting mode.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > -Jay
> > > > >
> > > > >
> > > > > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <bjb...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This is more of a thought question than a problem that I need
> > > > > > support for. I have been trying out Kafka 0.8.0-beta1 with
> > > > > > replication. For our use case we want to try and guarantee that
> > > > > > our consumers will see all messages even if they have fallen
> > > > > > greatly behind the broker/producer. For this reason I wanted to
> > > > > > know how the broker would react when the filesystem it writes its
> > > > > > messages to is full. What I found was that the broker crashes and
> > > > > > cannot be started until the filesystem has space again.
> > > > > >
> > > > > > Is there, or would it make sense to provide, configuration
> > > > > > allowing the broker to reject writes in this case rather than
> > > > > > crashing, electing a new leader and attempting the write again? I
> > > > > > can clearly understand the use case that we don't want to 'lose'
> > > > > > messages from the producer and I could also see how lack of
> > > > > > filesystem space could be considered a machine failure, but with
> > > > > > replication I would think if you are running out of space on 1
> > > > > > broker you are likely running out of space on others.
> > > > > >
> > > > > > Bryan
> > > > > >
> > > > >
> > > >
> > >
> >
>
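
The optional threshold suggested earlier in the thread (something like 98% full, or X MB free) could be sketched roughly as follows. This is an illustration in Python, not a Kafka feature or API: the function names, the parameter names `max_used_fraction` and `min_free_bytes`, and the default values are all assumptions made for the example. It relies only on the standard library call `shutil.disk_usage`.

```python
import shutil


def nearly_full(total: int, free: int,
                max_used_fraction: float = 0.98,
                min_free_bytes: int = 0) -> bool:
    """Return True if usage crosses either (hypothetical) threshold."""
    used = total - free
    return used >= max_used_fraction * total or free < min_free_bytes


def log_dir_nearly_full(log_dir: str,
                        max_used_fraction: float = 0.98,
                        min_free_bytes: int = 0) -> bool:
    """Check the filesystem that holds log_dir against the thresholds."""
    usage = shutil.disk_usage(log_dir)  # named tuple: total, used, free
    return nearly_full(usage.total, usage.free,
                       max_used_fraction, min_free_bytes)


if __name__ == "__main__":
    # Example: warn when the current directory's filesystem is 98% full
    # or has less than 512 MB free.
    if log_dir_nearly_full(".", 0.98, 512 * 1024 * 1024):
        print("log dir is nearing full; begin graceful shutdown")
```

A broker-side version would presumably run such a check periodically and start a controlled shutdown while there is still room to flush buffered data; how that would interact with replication and leader election is not covered by this sketch.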
