Yes, the persistence feature could also be configurable on a per-topic basis. Persistence could then be limited to certain critical topics, allowing less critical data to avoid the overhead. There could also be configuration options specifying when and how to persist the data. Some options might be to fsync() the data to disk immediately, to fsync() at specified intervals, or to just let the kernel decide when to write the dirty pages to disk from its page cache.
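The three sync policies described above could be sketched roughly like this (a minimal illustration, not Dory's actual code; the `PersistentLog` class and policy names are hypothetical):

```python
import os
import time

class PersistentLog:
    """Appends records to a per-topic log file, syncing per policy:
    'always'   - fsync() after every write
    'interval' - fsync() at most once per `interval` seconds
    'none'     - let the kernel flush dirty pages from its page cache
    """
    def __init__(self, path, policy="none", interval=1.0):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.interval = interval
        self.last_sync = time.monotonic()

    def append(self, record: bytes):
        os.write(self.fd, record)
        if self.policy == "always":
            os.fsync(self.fd)
        elif self.policy == "interval":
            now = time.monotonic()
            if now - self.last_sync >= self.interval:
                os.fsync(self.fd)
                self.last_sync = now
        # 'none': no explicit sync; kernel writeback handles it

    def close(self):
        os.fsync(self.fd)
        os.close(self.fd)
```

The trade-off is the usual one: 'always' gives the strongest durability at the highest latency cost, while 'none' is fastest but can lose whatever the kernel had not yet written back at crash time.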
Dave

-----Original Message-----
From: "Pete Wright" <pwri...@rubiconproject.com>
Sent: Monday, June 13, 2016 11:22am
To: users@kafka.apache.org
Subject: Re: Introducing Dory

On Mon, Jun 13, 2016 at 11:45:13AM -0600, Jason J. W. Williams wrote:
> Hi Dave,
>
> Dory sounds very exciting. Without persistence it's less useful for clients
> connected over a WAN, since if the WAN goes wonky you could build up quite
> a queue until it comes back.
>

I was thinking the same thing. My first thought was to see what the LOE would be to implement some sort of spill-over process, where if the preallocated memory segment is exhausted it could spool data to disk. When connectivity to the brokers is back, it could then de-spool the data and produce it back to the cluster. I could see this being worth pursuing if you are handling data that is critical for financial reasons (as opposed to data used for non-financial reporting or metrics).

-pete

--
Pete Wright
Lead Systems Architect
Rubicon Project
pwri...@rubiconproject.com
310.309.9298
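The spill-over idea above could be sketched as a bounded in-memory queue that spools overflow to disk and de-spools it in order once the memory backlog drains. This is a hypothetical illustration only (the `SpillOverQueue` class and its length-prefixed on-disk format are assumptions, not Dory's design):

```python
import collections
import os
import tempfile

class SpillOverQueue:
    """FIFO queue holding up to `mem_limit` messages in memory; any
    overflow is spooled to a disk file and de-spooled on read."""

    def __init__(self, mem_limit, spool_dir=None):
        self.mem_limit = mem_limit
        self.mem = collections.deque()
        self.spool_dir = spool_dir or tempfile.mkdtemp()
        self.spool_path = os.path.join(self.spool_dir, "spool.dat")
        self.spool_file = None   # opened lazily on first spill
        self.spooled = 0         # record count currently on disk

    def put(self, msg: bytes):
        # Once spooling has started, keep spooling so FIFO order holds.
        if len(self.mem) < self.mem_limit and self.spooled == 0:
            self.mem.append(msg)
        else:
            if self.spool_file is None:
                self.spool_file = open(self.spool_path, "ab+")
            # Length-prefix each record so it can be read back exactly.
            self.spool_file.write(len(msg).to_bytes(4, "big") + msg)
            self.spool_file.flush()
            self.spooled += 1

    def get(self):
        if self.mem:
            return self.mem.popleft()
        if self.spooled:
            # De-spool: read all records back into memory, in order.
            self.spool_file.seek(0)
            while self.spooled:
                n = int.from_bytes(self.spool_file.read(4), "big")
                self.mem.append(self.spool_file.read(n))
                self.spooled -= 1
            self.spool_file.truncate(0)
            return self.mem.popleft()
        return None
```

A real implementation would also need crash recovery of the spool file and back-pressure once the disk fills, which this sketch ignores.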