Hello Kafka users,

Meet Bruce, a producer daemon developed at Tagged, Inc.
(http://www.tagged.com).  We are open sourcing Bruce because we have
found it useful at Tagged, and believe others may also benefit from it.
Bruce is available on GitHub (https://github.com/tagged/bruce).

We developed Bruce to function as a single intake point for a Kafka
cluster that serves diverse clients written in a variety of programming
languages.  Clients write messages to Bruce's UNIX domain datagram
socket using a simple binary format.  Once a client writes a message,
Bruce takes full responsibility for reliable delivery to the Kafka
cluster.  Communication between Bruce and clients is purely one-way.
After writing a message to Bruce's socket, a client has no need to wait
for an acknowledgement.  The operating system provides the same
reliability guarantee for UNIX domain sockets as for other local
interprocess communication mechanisms such as traditional UNIX pipes.
Example client code for writing messages to Bruce's socket is currently
available in C, C++, Java, Python, and PHP.  Community contributions for
other programming languages are welcome.
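
To give a feel for how little a client has to do, here is a minimal Python
sketch of the fire-and-forget write described above.  It is an illustration
only, not the example code shipped with Bruce: the function name and the
socket path argument are hypothetical, and the payload is treated as opaque
bytes that the caller has already encoded in Bruce's binary format (see the
documentation for the actual layout).

```python
import socket


def send_to_daemon(socket_path: str, payload: bytes) -> None:
    """Write one datagram to a UNIX domain datagram socket and return.

    Hypothetical helper: `payload` must already be encoded in the
    daemon's binary message format, which is not reproduced here.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # One-way communication: send the datagram, do not wait for a reply.
        sock.sendto(payload, socket_path)
    finally:
        sock.close()
```

A long-running client would probably keep the socket open rather than
creating one per message; the sketch opens and closes it each time only to
stay short.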

In addition to providing a simple uniform access point for clients,
Bruce has a web-based status monitoring and data quality reporting
interface.  Bruce deals with transient load spikes and Kafka-related
problems by buffering messages in memory up to a configurable limit,
until they are sent and successfully acknowledged by a Kafka broker.  If
problems become serious enough that Bruce is forced to discard messages,
it tracks all discards and reports them through its web interface,
giving a breakdown of discards by topic, including counts of discarded
messages and windows of time in which they occurred.  Per-topic
information on messages queued to be sent or waiting for
acknowledgements from Kafka is also available through Bruce's web
interface.
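
The buffering and discard tracking described above can be pictured with the
following Python sketch.  It is a simplified model under stated assumptions,
not Bruce's actual implementation: the class name, the byte-based limit, and
the shape of the per-topic discard record are all hypothetical.

```python
import time
from collections import defaultdict, deque


class BoundedBuffer:
    """Hypothetical in-memory buffer with a size cap and discard tracking."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes      # configurable memory limit
        self.used_bytes = 0
        self.pending = deque()          # (topic, payload) awaiting acknowledgement
        # Per-topic discard count plus the time window of the discards.
        self.discards = defaultdict(
            lambda: {"count": 0, "first": None, "last": None})

    def offer(self, topic, payload, now=None):
        """Buffer a message, or record a discard if the limit is reached."""
        now = time.time() if now is None else now
        if self.used_bytes + len(payload) > self.max_bytes:
            entry = self.discards[topic]
            entry["count"] += 1
            if entry["first"] is None:
                entry["first"] = now
            entry["last"] = now
            return False
        self.pending.append((topic, payload))
        self.used_bytes += len(payload)
        return True

    def acknowledge(self):
        """Release the oldest message once a broker has acknowledged it."""
        topic, payload = self.pending.popleft()
        self.used_bytes -= len(payload)
        return topic, payload
```

In this model a message occupies buffer space until it is acknowledged, so a
slow or unavailable broker eventually fills the buffer and later messages are
counted as discards for their topic.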

Bruce comes with Nagios-based health monitoring and discard reporting
scripts, which are currently in use at Tagged to alert us if problems
occur. The discard monitoring script stores Bruce's discard reports in
an Oracle database so we have a complete, queryable history of data
quality information.  Bruce's web interface provides easy-to-parse JSON
output to facilitate integration with other monitoring infrastructure.

Bruce provides batching and compression, both configurable on a
per-topic basis.  Only Snappy compression is currently supported, but
Bruce was designed to support multiple compression types.

For more information, see Bruce's documentation, which is available on
its GitHub site.


Cheers,

Dave Peterson
Tagged, Inc.
