Hello Kafka users,

Meet Bruce, a producer daemon developed at Tagged, Inc. (http://www.tagged.com). We are open sourcing Bruce because we have found it useful at Tagged, and believe others may also benefit from it. Bruce is available on GitHub (https://github.com/tagged/bruce).

We developed Bruce to function as a single intake point for a Kafka cluster that serves diverse clients written in a variety of programming languages. Clients write messages to Bruce's UNIX domain datagram socket using a simple binary format. Once a client writes a message, Bruce takes full responsibility for reliable delivery to the Kafka cluster. Communication between Bruce and clients is purely one-way: after writing a message to Bruce's socket, a client does not need to wait for an acknowledgement. The operating system provides the same reliability guarantee for UNIX domain sockets as for other local interprocess communication mechanisms such as traditional UNIX pipes.

Example client code for writing messages to Bruce's socket is currently available in C, C++, Java, Python, and PHP. Community contributions for other programming languages are welcome.

In addition to providing a simple, uniform access point for clients, Bruce has a web-based status monitoring and data quality reporting interface. Bruce deals with transient load spikes and Kafka-related problems by buffering messages in memory, up to a configurable limit, until they are sent and successfully acknowledged by a Kafka broker. If problems become serious enough that Bruce is forced to discard messages, it tracks all discards and reports them through its web interface, giving a breakdown of discards by topic, including counts of discarded messages and the windows of time in which they occurred. Per-topic information on messages queued to be sent or waiting for acknowledgements from Kafka is also available through Bruce's web interface.

Bruce comes with Nagios-based health monitoring and discard reporting scripts, which are currently in use at Tagged to alert us if problems occur. The discard monitoring script stores Bruce's discard reports in an Oracle database so we have a complete, queryable history of data quality information.

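To illustrate the one-way write path described above, here is a minimal Python sketch of a client sending a single datagram to a UNIX domain datagram socket. The function name, the socket path, and the payload are placeholders chosen for illustration. In particular, the payload is not Bruce's actual binary message format; that format is described in Bruce's documentation, and the example clients in the GitHub repository are the reference to follow.

```python
import socket


def send_datagram(socket_path: str, payload: bytes) -> None:
    """Write one datagram to a UNIX domain datagram socket and return.

    Hypothetical helper for illustration only. `payload` must already be
    encoded in whatever format the receiving daemon expects; this sketch
    does not implement Bruce's message format.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        # One-way communication: the client sends and never reads a reply.
        # sendto() raises OSError if nothing is bound at socket_path.
        sock.sendto(payload, socket_path)


if __name__ == "__main__":
    # Assumed path; use the socket path your Bruce instance is configured with.
    send_datagram("/path/to/bruce.socket", b"placeholder bytes")
```

Because the write is fire-and-forget, the only failures a client sees are local ones reported by the operating system, for example when the socket file does not exist.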
Bruce's web interface provides easy-to-parse JSON output to facilitate integration with other monitoring infrastructure.

Bruce provides batching and compression that is configurable on a per-topic basis. Only Snappy compression is currently supported, but Bruce was designed to support multiple compression types.

For more information, see Bruce's documentation, which is available on its GitHub site.

Cheers,
Dave Peterson
Tagged, Inc.