Hi Dave,

Dory sounds very exciting. Without persistence its less useful for clients
connected over a WAN, since if the WAN goes wonky you could build up quite
a queue until it comes back.

-J

On Mon, Jun 13, 2016 at 3:00 AM, Dave Peterson <d...@dspeterson.com> wrote:

> Hi Arya,
>
> In the case of a kernel panic or other failure serious enough to bring
> a machine down (hardware failure, power outage, etc.), message loss is
> definitely possible, and not totally preventable regardless of what
> precautions software takes to avoid data loss.  To ensure high
> throughput, Dory batches messages together before forwarding them to
> Kafka.  The amount of batching Dory performs is configurable on a
> per-topic basis, and in practice will typically cause a delay of
> between several hundred milliseconds and several seconds before
> messages are forwarded to Kafka.  Thus, if a machine on which Dory is
> running crashes, you can expect to lose up to a few seconds of batched
> messages depending on the batching configuration.  Additionally, you
> may lose some sent messages for which Dory has not yet received an ACK
> from Kafka indicating successful receipt and persistence to disk.
>
> To narrow the window of possible message loss, one can set a very
> short batching delay, or even completely disable batching for one or
> more topics.  However this is not recommended since it will have a
> serious adverse affect on the performance of your Kafka cluster.
> Another possibility would be for Dory to immediately persist messages
> to local disk on receipt, even before they are batched, and then
> delete them only after a non-error ACK has been received from Kafka.
> This would greatly narrow, although not totally eliminate the window
> for message loss.  However it would add substantial complexity to
> Dory, and impose a substantial disk I/O performance cost.  An
> important consideration is that when you write to a file, the kernel's
> default action is to not immediately write the data to disk.  It
> buffers the data in its page cache for a period of time (typically up
> to 30 seconds) to improve throughput.  If the machine crashes, all of
> this unwritten data is lost.  You can get around this by calling
> fsync() on a file descriptor or specifying O_DIRECT | O_SYNC when
> opening a file, but this will degrade I/O performance.  Dory avoids
> persisting data to disk with the view that these downsides outweigh
> the benefits of avoiding up to a few seconds of data loss when a
> machine crashes.  However a feature I intend to add (described later
> in this email) will provide useful information about which messages
> were successfully delivered to Kafka in the event of a machine crash.
>
> Dory takes great care to avoid message loss due to circumstances
> other than machine crashes.  When Dory starts up, it preallocates a
> fixed amount of memory (configurable with the --msg_buffer_max N
> command line option) for storing messages received from clients.
> Dory holds on to each message it receives until one of the following
> has occurred:
>
>     - It receives an ACK from Kafka indicating that the message has
>       been successfully received and persisted to disk.
>
>     - It receives an error ACK from Kafka that causes the message to
>       be immediately discarded.  Certain error ACKs cause Dory to
>       immediately discard messages, and others cause it to pause,
>       update its metadata, and attempt redelivery based on the new
>       metadata.  ACK error 2 (invalid message, i.e. CRC error) causes
>       Dory to immediately attempt redelivery without updating its
>       metadata.  The actions Dory takes for various error ACK values
>       are documented here:
>
>
> https://github.com/dspeterson/dory/blob/master/doc/design.md#dispatcher
>
>     - As mentioned above, certain error ACKs will cause Dory to
>       attempt redelivery.  Each time this occurs for a given message,
>       Dory increments a failed delivery attempt count on the message.
>       Once the count exceeds a certain value (specified by the
>       --max_failed_delivery_attempts N command line option), Dory
>       discards the message.
>
>     - It has determined that the message is undeliverable due to a
>       problem such as the message being malformed (see
>
> https://github.com/dspeterson/dory/blob/master/doc/sending_messages.md
>       which documents Dory's input message format) or having an
>       invalid topic.  In this case, the message is discarded.
>
>     - It has exhausted its preallocated memory for storing messages.
>       This may occur in the case where a network-related outage or
>       serious problem with the Kafka cluster persists for an extended
>       period of time, preventing normal message delivery.  In this
>       case, new messages received from clients are discarded.
>
>     - Dory receives a shutdown signal (SIGTERM or SIGINT).  In this
>       case, Dory immediately stops accepting new messages from
>       clients, and for a certain configurable period of time attempts
>       to deliver any pending messages.  Once the time limit expires,
>       any remaining messages are lost.  This type of message loss
>       can be avoided by not shutting Dory down until no more clients
>       are sending it messages, and Dory has been given enough time to
>       process all pending messages.
>
> A critical aspect of Dory's behavior is that every discard is tracked
> along with the reason why the discard occurred.  Dory provides a web
> interface from which periodic discard reports and other status
> information can be obtained.  Thus, in the rare event when discards
> occur, they can be tracked and recorded on a per-topic basis.  The
> intent is for monitoring infrastructure to periodically ask Dory for
> a discard report, alert an administrator if message loss occurred, and
> optionally store the discard reports in a database so that a queryable
> history of data quality is available.
>
> A relevant feature I intend to add at some point is described here:
>
>     http://dory.wikidot.com/wiki:commit-points
>
> This notion of a commit point can be useful in the event of a machine
> crash.  If monitoring infrastructure periodically querys Dory for
> commit point information, then in the event of a machine crash, we
> know that any resulting message loss is limited to messages with
> timestamps > T, where T is the latest commit point.
>
> I hope this helps answer your questions, and that I haven't inundated
> you with too much information.  Let me know if you have more
> questions.
>
>
> Regards,
> Dave
>
>
> -----Original Message-----
> From: "Arya Ketan" <ketan.a...@gmail.com>
> Sent: Sunday, June 12, 2016 11:52am
> To: users@kafka.apache.org
> Subject: Re: Introducing Dory
>
> Hi Dave,
> Dory looks pretty interesting. I had a few  further questions on it
> a) How does Dory handle kernel panics?
> b) What kind of message guarantees does dory provide and also if you can
> share some design decisions taken to enable the guarantees whatever they
> are.
>
> Thanks
> Arya
>
> Arya
>
> On Sun, Jun 12, 2016 at 8:10 PM, Dave Peterson <d...@dspeterson.com>
> wrote:
>
> > Thanks!  Enjoy :-)
> >
> >
> >
> > On 6/12/2016 12:24 AM, Gwen Shapira wrote:
> >
> >> Dory is pretty cool (even though it is named after a somewhat dorky
> >> fish). Thank you for sharing :)
> >>
> >> On Sun, Jun 12, 2016 at 1:24 AM, Dave Peterson <d...@dspeterson.com>
> >> wrote:
> >>
> >>> Hello Kafka users,
> >>>
> >>> Version 1.1.0 of Dory is now available.  See
> >>> https://github.com/dspeterson/dory for details.  Dory is the successor
> >>> to Bruce (https://github.com/tagged/bruce), a Kafka producer daemon I
> >>> created while working at if(we) (http://www.ifwe.co/).  The code has
> >>> seen a number of improvements since its initial release in September
> >>> 2014.  The list of example clients for various programming languages
> >>> has also been extended.  Dory maintains full backward compatibility
> >>> with Bruce, so existing users can easily switch.
> >>>
> >>> The latest release adds support for receiving messages from clients by
> >>> UNIX domain stream socket or local TCP.  Although UNIX domain
> >>> datagrams are still the preferred means of sending messages in most
> >>> cases, the option of using stream sockets facilitates sending messages
> >>> too large to fit in a single datagram.  The local TCP option
> >>> facilitates adding support for clients written in programming
> >>> languages that do not provide easy access to UNIX domain sockets.
> >>>
> >>> Dory's wiki page http://dory.wikidot.com/start contains a list of
> >>> ideas for additional features and other improvements.  Community
> >>> feedback is welcomed and appreciated.  If you have ideas for things
> >>> you would like to see in future releases, please add them to the list.
> >>> Also, please contribute code if you can afford the time.
> >>>
> >>>
> >>> Thanks,
> >>> Dave Peterson
> >>>
> >>>
> >>>
> >
>
>
>

Reply via email to