That's pretty much the implementation we're using, except that we use partitions within a single topic instead of separate topics, because only a small number of failures pass through. Granted, the first partition takes the brunt of the load compared with the higher-order partitions.
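A rough model of the single-topic variant is below. It is a sketch, not our actual code, and the layout is an assumption: failures from the main topic go to partition 0 of one retry topic, and partition p holds commands waiting out the p-th delay (1hr, 2hr, 4hr, 8hr). `Record`, `retry_partition`, `is_due` and `RETRY_DELAYS` are hypothetical names, and no Kafka client is involved; it only shows the routing and delay logic a poller would apply.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical layout: one retry topic whose partition p holds commands waiting
# out RETRY_DELAYS[p] (partition 0 = 1hr, 1 = 2hr, 2 = 4hr, 3 = 8hr).
RETRY_DELAYS = [timedelta(hours=1), timedelta(hours=2),
                timedelta(hours=4), timedelta(hours=8)]


@dataclass
class Record:
    key: str
    partition: int       # retry partition the record currently sits in
    timestamp: datetime  # when it was written to that partition


def retry_partition(failed_from: Optional[int]) -> Optional[int]:
    """Partition to produce a failed command to, or None once retries are exhausted.

    failed_from is None for a failure on the main topic, otherwise the retry
    partition the command was just consumed from.
    """
    nxt = 0 if failed_from is None else failed_from + 1
    return nxt if nxt < len(RETRY_DELAYS) else None


def is_due(record: Record, now: datetime) -> bool:
    """A poller releases a record only once its partition's delay has elapsed."""
    return now >= record.timestamp + RETRY_DELAYS[record.partition]


if __name__ == "__main__":
    t0 = datetime(2017, 10, 10, 7, 30)
    r = Record("cmd-42", partition=1, timestamp=t0)
    print(is_due(r, t0 + timedelta(hours=1)))  # False: the 2hr delay has not elapsed
    print(is_due(r, t0 + timedelta(hours=2)))  # True
    print(retry_partition(3))                  # None: that was the fourth and last retry
```

Every command that fails even once passes through partition 0, so it sees at least as much traffic as any later partition, which is the load skew mentioned above. Because all records in one partition share the same delay, they become due roughly in the order they were appended, so a poller can wait on the oldest undelivered record instead of scanning the whole partition.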
On Tue, Oct 10, 2017 at 7:30 AM, Steven Schlansker <sschlans...@opentable.com> wrote:

>
> On Oct 9, 2017, at 2:41 PM, John Walker <johnwalk...@gmail.com> wrote:
> >
> > I have a pair of services. One dispatches commands to the other for
> > processing.
> >
> > My consumer sometimes fails to execute commands as a result of transient
> > errors. To deal with this, commands are retried after an exponentially
> > increasing delay up to a maximum of 4 times. (Delays: 1hr, 2hr, 4hr,
> > 8hr.)
> > What's the standard way to set up something like this using Kafka?
> >
> > The only solution I've found so far is to setup 5 topics (main_topic,
> > delayed_1hr, delayed_2hr, delayed_4hr, delayed_8hr), and then have pollers
> > that poll each of these topics, enforce delays, and escalate messages
> > from one topic to another if errors occur.
>
> Our approach for a similar scheduling problem was to assign each command a
> unique key and a "valid after" date. Any time you fail to execute a command,
> you update the "valid after" with your exponential backoff algorithm, and
> produce the updated value over the same key. Old versions are removed by
> compaction.
>
> Biggest downside is now your queue is no longer ordered, but for our problem
> in practice the number of pending messages is relatively small so we simply
> scan all outstanding messages every delivery interval to see if any is
> eligible for a retry.
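For comparison, Steven's compacted-topic approach can be modeled in memory roughly as follows. This is a sketch under assumptions, not his implementation: `CompactedQueue` is a hypothetical stand-in for a log-compacted topic (only the latest value per key survives, and a null value acts as a tombstone that removes the key), and `Pending`, `deliver_due`, `backoff` and the choice to tombstone a command on success or after its last retry are illustrative, not details from the thread. The backoff reproduces the 1hr/2hr/4hr/8hr schedule with at most 4 retries.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional

MAX_RETRIES = 4
BASE_DELAY = timedelta(hours=1)


@dataclass(frozen=True)
class Pending:
    valid_after: datetime  # command must not be executed before this time
    attempts: int          # retries already scheduled for this command


class CompactedQueue:
    """Hypothetical in-memory stand-in for a compacted topic keyed by command ID."""

    def __init__(self) -> None:
        self.latest: Dict[str, Pending] = {}

    def produce(self, key: str, value: Optional[Pending]) -> None:
        # Latest value per key wins; None plays the role of a tombstone.
        if value is None:
            self.latest.pop(key, None)
        else:
            self.latest[key] = value


def backoff(attempts: int) -> timedelta:
    # Exponential backoff: 1hr, 2hr, 4hr, 8hr for attempts 0..3.
    return BASE_DELAY * (2 ** attempts)


def deliver_due(queue: CompactedQueue, now: datetime,
                execute: Callable[[str], bool]) -> None:
    """Scan every outstanding command and try the ones whose valid_after has passed."""
    for key, pending in list(queue.latest.items()):  # copy: produce() mutates the dict
        if pending.valid_after > now:
            continue
        if execute(key):
            queue.produce(key, None)  # succeeded: tombstone the key
        elif pending.attempts < MAX_RETRIES:
            # Failed: re-produce over the same key with a later valid_after.
            queue.produce(key, Pending(now + backoff(pending.attempts),
                                       pending.attempts + 1))
        else:
            queue.produce(key, None)  # retries exhausted: give up


if __name__ == "__main__":
    q = CompactedQueue()
    t0 = datetime(2017, 10, 10, 8, 0)
    q.produce("cmd-1", Pending(t0, 0))
    deliver_due(q, t0, lambda key: False)                        # fails, valid_after = t0 + 1hr
    deliver_due(q, t0 + timedelta(minutes=30), lambda key: True)  # not due yet, untouched
    deliver_due(q, t0 + timedelta(hours=1), lambda key: True)     # succeeds, key removed
    print("cmd-1" in q.latest)  # False
```

The full scan in `deliver_due` is the trade-off Steven describes: it gives up ordering and costs time proportional to the number of outstanding commands, which is acceptable only while that number stays small.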