Hi Ted - if the data is keyed you can use a key-compacted topic and essentially keep the data 'forever', i.e., you'll always have the latest version of the data for a given key. However, you'd still want to back up the data someplace else just in case.
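For concreteness, a compacted topic can be created with a per-topic config override - a minimal sketch, assuming a local ZooKeeper and a placeholder topic name (not from this thread):

```shell
# Create a topic with log compaction enabled: Kafka retains at least
# the latest record for each key instead of deleting by age/size.
kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic customer-master \
  --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact
```

Note that compaction only guarantees the *latest* value per key survives; older values for the same key may be removed, so it is not a full history.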
On 16 February 2016 at 21:25, Ted Swerve <ted.swe...@gmail.com> wrote:

> I guess I was just drawn in by the elegance of having everything available
> in one well-defined Kafka topic should I start up some new code.
>
> If instead the Kafka topics were on a retention period of say 7 days, that
> would involve firing up a topic to load the warehoused data from HDFS (or a
> more traditional load), and then switching over to the live topic?
>
> On Tue, Feb 16, 2016 at 8:32 AM, Ben Stopford <b...@confluent.io> wrote:
>
> > Ted - it depends on your domain. More conservative approaches to long
> > lived data protect against data corruption, which generally means
> > snapshots and cold storage.
> >
> > > On 15 Feb 2016, at 21:31, Ted Swerve <ted.swe...@gmail.com> wrote:
> > >
> > > Hi Ben, Sharninder,
> > >
> > > Thanks for your responses, I appreciate it.
> > >
> > > Ben - thanks for the tips on settings. A backup could certainly be a
> > > possibility, although if only with similar durability guarantees, I'm
> > > not sure what the purpose would be?
> > >
> > > Sharninder - yes, we would only be using the logs as forward-only
> > > streams - i.e. picking an offset to read from and moving forwards - and
> > > would be setting retention time to essentially infinite.
> > >
> > > Regards,
> > > Ted.
> > >
> > > On Tue, Feb 16, 2016 at 5:05 AM, Sharninder Khera <sharnin...@gmail.com>
> > > wrote:
> > >
> > >> This topic comes up often on this list. Kafka can be used as a
> > >> datastore if that’s what your application wants, with the caveat that
> > >> Kafka isn’t designed to keep data around forever. There is a default
> > >> retention time after which older data gets deleted. The high-level
> > >> consumer essentially reads data as a stream, and while you can do a
> > >> sort of random access with the low-level consumer, it’s not ideal.
> > >>
> > >>
> > >>> On 15-Feb-2016, at 10:26 PM, Ted Swerve <ted.swe...@gmail.com> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> Is it viable to use infinite-retention Kafka topics as a master data
> > >>> store? I'm not talking massive volumes of data here, but still
> > >>> potentially extending into tens of terabytes.
> > >>>
> > >>> Are there any drawbacks or pitfalls to such an approach? It seems
> > >>> like a compelling design, but there seem to be mixed messages about
> > >>> its suitability for this kind of role.
> > >>>
> > >>> Regards,
> > >>> Ted
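For reference, the "essentially infinite" retention discussed above can be expressed as per-topic overrides - a sketch, assuming a local ZooKeeper and a placeholder topic name; `-1` disables the corresponding limit:

```shell
# Disable both time-based and size-based deletion on an existing topic,
# so records are retained indefinitely (until compaction or manual action).
kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic event-log \
  --config retention.ms=-1 \
  --config retention.bytes=-1
```

Even with deletion disabled, the caveats in the thread still apply: broker disk capacity, recovery time, and corruption protection (snapshots/cold storage) all need to be planned separately.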