Hi,

In a distributed system such as Cassandra, things can happen (node down, stop-the-world GC, hardware issues, ...) that desynchronize replicas. Isn't repair also needed to keep replicas up to date, at least once a week or once a month? It is a strong and reliable process to keep things in sync, isn't it?

I know that read repair and hinted handoff are also there to handle this kind of issue, but they might fail (I have seen a lot of errors in the logs about hints not being delivered, some people even disable hints, and read repair is often configured to trigger on only 10% of reads).

2014-01-28 14:53 GMT+01:00 Sylvain Lebresne <sylv...@datastax.com>:

>> I have actually set up one of our application streams such that the same
>> key is only overwritten with a monotonically increasing ttl.
>>
>> For example, a breaking news item might have an initial ttl of 60
>> seconds, followed in 45 seconds by an update with a ttl of 3000 seconds,
>> followed by an 'ignore me' update in 600 seconds with a ttl of 30 days (our
>> maximum ttl) when the article is published.
>>
>> My understanding is that this case fits the criteria and no 'periodic
>> repair' is needed.
>
> That's correct. The real criterion for not needing repair if you do no
> deletes but only TTL is "update only with monotonically increasing (not
> necessarily strictly) ttl". Always setting the same TTL is just a special
> case of that, but it's the most commonly used one I think, so I tend to
> simplify it to that case.
>
>> I guess another thing I would point out that is easy to miss or forget
>> (if you are a newish user like me), is that ttl's are fine-grained, by
>> column. So we are talking 'fixed' or 'variable' by individual column, not
>> by table. Which means, in my case, that ttl's can vary widely across a
>> table, but as long as I constrain them by key value to be fixed or
>> monotonically increasing, it fits the criteria.
>
> We're talking monotonically increasing ttl "for a given primary key" if
> we're talking the CQL language and "for a given column" if we're talking
> the thrift one. Not "by table".
>
> --
> Sylvain
>
>> Cheers,
>>
>> Michael
>>
>> On Tue, Jan 28, 2014 at 4:18 AM, Sylvain Lebresne
>> <sylv...@datastax.com> wrote:
>>
>>> On Tue, Jan 28, 2014 at 1:05 AM, Edward Capriolo
>>> <edlinuxg...@gmail.com> wrote:
>>>
>>>> If you have only ttl columns, and you never update the column, I would
>>>> not think you need a repair.
>>>
>>> Right, no deletes and no updates is Michael's case 1., on which I
>>> think we all agree 'periodic repair to avoid resurrected columns' is not
>>> required.
>>>
>>>> Repair cures lost deletes. If all your writes have a ttl, a lost write
>>>> should not matter since the column was never written to the node and thus
>>>> could never be resurrected on said node.
>>>
>>> I'm sure we're all in agreement here, but for the record, this is only
>>> true if you have no updates (overwrites) and/or if all writes have the
>>> *same* ttl. Because in the general case, a column with a relatively short
>>> TTL is basically very close to a delete, while a column with a long TTL is
>>> very close to one that has no TTL. If the former column (with short TTL)
>>> overwrites the latter one (with long TTL), and if one node misses the
>>> overwrite, that node could resurrect the column with the longer TTL (until
>>> that column expires, that is). Hence the separation of case 2. (fixed
>>> ttl, no repair needed) and 2.a. (variable ttl, repair may be needed).
>>>
>>> --
>>> Sylvain
>>>
>>>> Unless I am missing something.
>>>>
>>>> On Monday, January 27, 2014, Laing, Michael <michael.la...@nytimes.com>
>>>> wrote:
>>>> > Thanks Sylvain,
>>>> > Your assumption is correct!
>>>> > So I think I actually have 4 classes:
>>>> > 1. Regular values, no deletes, no overwrites, write heavy,
>>>> >    variable ttl's to manage size
>>>> > 2. Regular values, no deletes, some overwrites, read heavy (10 to 1),
>>>> >    fixed ttl's to manage size
>>>> > 2.a. Regular values, no deletes, some overwrites, read heavy (10 to 1),
>>>> >    variable ttl's to manage size
>>>> > 3. Counter values, no deletes, update heavy, rotation/truncation
>>>> >    to manage size
>>>> > Only 2.a. above requires me to do 'periodic repair'.
>>>> > What I will actually do is change my schema and applications slightly
>>>> > to eliminate the need for overwrites on the only table I have in that
>>>> > category.
>>>> > And I will set gc_grace_period to 0 for the tables in the updated
>>>> > schema and drop 'periodic repair' from the schedule.
>>>> > Cheers,
>>>> > Michael
>>>> >
>>>> > On Mon, Jan 27, 2014 at 4:22 AM, Sylvain Lebresne
>>>> > <sylv...@datastax.com> wrote:
>>>> >>
>>>> >> By periodic repair, I'll assume you mean "having to run repair every
>>>> >> gc_grace period to make sure no deleted entries resurrect". With that
>>>> >> assumption:
>>>> >>
>>>> >>> 1. Regular values, no deletes, no overwrites, write heavy, ttl's to
>>>> >>> manage size
>>>> >>
>>>> >> Since 'repair within gc_grace' is about preventing values that have
>>>> >> been deleted from resurrecting, if you do no deletes or overwrites,
>>>> >> you're at no risk of that (and don't need to 'repair within gc_grace').
>>>> >>
>>>> >>> 2. Regular values, no deletes, some overwrites, read heavy (10 to
>>>> >>> 1), ttl's to manage size
>>>> >>
>>>> >> It depends a bit. In general, if you always set the exact same TTL
>>>> >> on every insert (implying you always set a TTL), then you have nothing
>>>> >> to worry about. If the TTL varies (or if you only set a TTL some of the
>>>> >> time), then you might still need to have some periodic repairs. That
>>>> >> being said, if there are no deletes but only TTLs, then the TTL kind of
>>>> >> lengthens the period at which you need to do repair: instead of needing
>>>> >> to repair within gc_grace, you only need to repair every
>>>> >> gc_grace + min(TTL) (where min(TTL) is the smallest TTL you set on
>>>> >> columns).
>>>> >>
>>>> >>> 3. Counter values, no deletes, update heavy, rotation/truncation to
>>>> >>> manage size
>>>> >>
>>>> >> No deletes and no TTL implies that you're fine (as in, there is no
>>>> >> need for 'repair within gc_grace').
>>>> >>
>>>> >> --
>>>> >> Sylvain
>>>> >
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
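
For what it's worth, here is a minimal Python sketch of the resurrection scenario discussed in the quoted thread (case 2.a. versus monotonically increasing ttl). It is a toy model, not Cassandra code: `Cell`, `held`, `merged_read` and `repair_interval` are hypothetical names, timestamps are plain seconds, and it assumes an expired column is purged exactly gc_grace seconds after it expires, whereas real purging only happens at compaction and may come later.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """One written column version (hypothetical model, not a Cassandra class)."""
    value: str
    write_ts: int  # write timestamp, in seconds
    ttl: int       # time to live, in seconds

    def expires_at(self) -> int:
        return self.write_ts + self.ttl


def held(cell: Optional[Cell], now: int, gc_grace: int) -> Optional[Cell]:
    """Return the cell while a replica still stores it, live or expired.

    Assumption: an expired cell is purged exactly gc_grace seconds after it
    expires. Real purging happens during compaction, so it can be later.
    """
    if cell is None or now >= cell.expires_at() + gc_grace:
        return None
    return cell


def merged_read(replicas: List[Optional[Cell]], now: int, gc_grace: int) -> Optional[str]:
    """Reconcile replicas: highest write timestamp wins, an expired winner means no value."""
    cells = [c for c in (held(r, now, gc_grace) for r in replicas) if c is not None]
    if not cells:
        return None
    winner = max(cells, key=lambda c: c.write_ts)
    return winner.value if now < winner.expires_at() else None


def repair_interval(gc_grace: int, ttls: List[int]) -> int:
    """Longest gap between repairs for a TTL-only workload: gc_grace + min(TTL)."""
    return gc_grace + min(ttls)


GC_GRACE = 100  # arbitrary value chosen for the example

# Case 2.a.: a short TTL overwrites a long TTL, and replica B misses the overwrite.
long_lived = Cell("old", write_ts=0, ttl=3000)
short_lived = Cell("new", write_ts=10, ttl=60)
replica_a, replica_b = short_lived, long_lived

# The expired overwrite is still held on replica A, so it shadows "old".
print(merged_read([replica_a, replica_b], now=100, gc_grace=GC_GRACE))  # None
# Once replica A has purged the expired overwrite, "old" comes back.
print(merged_read([replica_a, replica_b], now=200, gc_grace=GC_GRACE))  # old
print(repair_interval(GC_GRACE, [3000, 60]))  # 160

# Monotonically increasing ttl: the replica that missed the overwrite only
# holds a cell that expires earlier, so nothing outdated can reappear.
first = Cell("first", write_ts=0, ttl=60)
second = Cell("second", write_ts=10, ttl=3000)
print(merged_read([second, first], now=200, gc_grace=GC_GRACE))   # second
print(merged_read([second, first], now=3200, gc_grace=GC_GRACE))  # None
```

In this model the short-TTL overwrite behaves like a delete: once the replica that received it has purged it, nothing shadows the older long-TTL value any more, which is why a repair within gc_grace + min(TTL) is still needed when TTLs can decrease.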