On Wed, Jan 19, 2011 at 12:59 AM, Zhu Han <schumi....@gmail.com> wrote:
>
> On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf <german.kond...@gmail.com> wrote:
>>
>> Yes, that's what I meant, but correct me if I'm wrong, when a deletion
>> comes after another deletion for the same row or column will the gc-before
>> count against the last one, isn't it?
>>
> IIRC, after compaction. even if the row key is not wiped, all the CF are
> replaced by the youngest tombstone. I do not understand very clearly the
> benefit of wiping out the whole row as early as possible.
>

I think it is not a "benefit" but a potential issue: if you delete columns
or rows without checking them first, you could keep their tombstones alive
for as long as you keep issuing deletions. Maybe it's a strange use case,
but Cassandra certainly enables new, non-traditional ways of processing
high volumes of information.

As the original example showed:

day 1  -> insert Row1.Col1
day 2  -> delete Row1.Col1
day 11 (before gc-grace-seconds has elapsed) -> delete Row1.Col1

With the last command I've extended the life of the tombstone. Checking
before each deletion could hurt performance, so I think this might be
better handled server-side than client-side.

//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/

>>
>> Maybe knowing that all the subsequent versions of a deletion are deletions
>> too, it could take the first timestamp against the gc-grace-seconds when is
>> reducing & compacting.
>>
>> // Germán Kondolf
>> http://twitter.com/germanklf
>> http://code.google.com/p/seide/
>> // @i4
>>
>> On 19/01/2011, at 00:16, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> > If you mean that multiple tombstones for the same row or column should
>> > be merged into a single one at compaction time, then yes, that is what
>> > happens.
>> >
>> > On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf
>> > <german.kond...@gmail.com> wrote:
>> >> Maybe it could be taken into account when the compaction is executed,
>> >> if I only have a consecutive list of uninterrupted tombstones it could
>> >> only care about the first. It sounds like the-way-it-should-be, maybe
>> >> as a part of the "row-reduce" process.
>> >>
>> >> Is it feasible? Looking into the CASSANDRA-1074 sounds like it should.
>> >>
>> >> //GK
>> >> http://twitter.com/germanklf
>> >> http://code.google.com/p/seide/
>> >>
>> >> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne
>> >> <sylv...@riptano.com> wrote:
>> >>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn <da...@lookin2.com>
>> >>> wrote:
>> >>>> Thanks, Aaron, but I'm not 100% clear.
>> >>>>
>> >>>> My situation is this: My use case spins off rows (not columns) that I
>> >>>> no longer need and want to delete. It is possible that these rows were
>> >>>> never created in the first place, or were already deleted. This is a
>> >>>> very large cleanup task that normally deletes a lot of rows, and the
>> >>>> last thing that I want to do is create tombstones for rows that didn't
>> >>>> exist in the first place, or lengthen the life on disk of tombstones
>> >>>> of rows that are already deleted.
>> >>>>
>> >>>> So the question is: before I delete, do I have to retrieve the row to
>> >>>> see if it exists in the first place?
>> >>>
>> >>> Yes, in your situation you do.
>> >>>
>> >>>>
>> >>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton
>> >>>> <aa...@thelastpickle.com> wrote:
>> >>>>>
>> >>>>> AFAIK that's not necessary, there is no need to worry about previous
>> >>>>> deletes. You can delete stuff that does not even exist, neither
>> >>>>> batch_mutate or remove are going to throw an error.
>> >>>>> All the columns that were (roughly speaking) present at your first
>> >>>>> deletion will be available for GC at the end of the first tombstones
>> >>>>> life. Same for the second.
>> >>>>> Say you were to write a col between the two deletes with the same
>> >>>>> name as one present at the start. The first version of the col is
>> >>>>> avail for GC after tombstone 1, and the second after tombstone 2.
>> >>>>> Hope that helps
>> >>>>> Aaron
>> >>>>> On 18/01/2011, at 9:37 PM, David Boxenhorn <da...@lookin2.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> Thanks. In other words, before I delete something, I should check to
>> >>>>> see whether it exists as a live row in the first place.
>> >>>>>
>> >>>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King <r...@twitter.com> wrote:
>> >>>>>>
>> >>>>>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn
>> >>>>>> <da...@lookin2.com> wrote:
>> >>>>>>> If I delete a row, and later on delete it again, before
>> >>>>>>> GCGraceSeconds has elapsed, does the tombstone live longer?
>> >>>>>>
>> >>>>>> Each delete is a new tombstone, which should answer your question.
>> >>>>>>
>> >>>>>> -ryan
>> >>>>>>
>> >>>>>>> In other words, if I have the following scenario:
>> >>>>>>>
>> >>>>>>> GCGraceSeconds = 10 days
>> >>>>>>> On day 1 I delete a row
>> >>>>>>> On day 5 I delete the row again
>> >>>>>>>
>> >>>>>>> Will the tombstone be removed on day 10 or day 15?
>> >>>>>>>
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com
>> >
>
//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/
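
P.S. To make the day 1 / day 2 / day 11 timeline concrete, here is a small
sketch of the arithmetic. It is only a toy model of what the thread
describes (tombstones for the same column merged at compaction, with the
youngest one winning), not Cassandra's actual code. The function name
purgeable_day, the keep parameter, and the 10-day grace period (borrowed
from David's example) are assumptions made for illustration.

```python
GC_GRACE_DAYS = 10  # assumed value, borrowed from the example in the thread


def purgeable_day(deletion_days, gc_grace_days=GC_GRACE_DAYS, keep="latest"):
    """Return the day on which the merged tombstone becomes eligible for GC.

    deletion_days: the days on which a delete was issued for the same column.
    keep="latest": model of the behaviour described in the thread, where the
                   youngest tombstone survives the merge.
    keep="first":  the alternative suggested in this reply, where the grace
                   period is counted from the first of the deletions.
    """
    if not deletion_days:
        raise ValueError("no deletions recorded")
    if keep == "latest":
        anchor = max(deletion_days)
    else:
        anchor = min(deletion_days)
    return anchor + gc_grace_days


# Single delete on day 2: eligible for GC on day 12.
print(purgeable_day([2]))
# Delete on day 2, repeated on day 11: the tombstone now lives until day 21.
print(purgeable_day([2, 11]))
# Same deletes, counting from the first tombstone instead: day 12 again.
print(purgeable_day([2, 11], keep="first"))
```

Under this model every repeated delete inside the grace period pushes the
purge date out again, which is the effect I was worried about.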