Hah. Indeed it does. Thanks for the help.

James

On Sep 23, 2014, at 10:54 AM, Dan Di Spaltro <[email protected]> wrote:

> Simple question, did you copy and paste that snippet since it has two name
> stanzas.
>
> On Tue, Sep 23, 2014 at 9:42 AM, Jean-Marc Spaggiari <
> [email protected]> wrote:
>
>> Hi James,
>>
>> Is it possible that you are impacted by
>> https://issues.apache.org/jira/browse/HBASE-10118 ? Any chance to test with
>> one release where HBASE-10118 is available?
>>
>> JM
>>
>> 2014-09-23 12:10 GMT-04:00 James Estes <[email protected]>:
>>
>>> It does sound like what I'd want (that's why I was trying to use it :) ),
>>> but it isn't working as described. Maybe it is a bug?
>>>
>>> The behavior I'm seeing is that the delete markers are removed on major
>>> compaction, regardless of having a hbase.hstore.time.to.purge.deletes set
>>> in hbase-site.xml:
>>> https://gist.github.com/housejester/2b8fbba0d05c6abbe784
>>>
>>> I think I've found the issue now. You mentioned the setting could be
>>> applied per CF...so I tested that way, and it works as expected. My
>>> hbase-site.xml had:
>>>
>>> <property>
>>> <name>hbase.hstore.time.to.purge.deletes</name>
>>> <name>600000</name>
>>> </property>
>>>
>>> But that doesn't seem to be applied (even with restarts, etc). Setting
>>> hbase.hstore.time.to.purge.deletes directly on the column family does work
>>> though:
>>> https://gist.github.com/housejester/a81274bf74a8666fba32
>>>
>>> Not sure why it isn't picking up from my hbase-site.xml, but I'll just
>>> configure it on the CFs. This is on hbase-0.98.6.1-hadoop2 and
>>> hbase-0.96.0-hadoop2 running in local mode.
>>>
>>> Thanks Lars,
>>> James
>>>
>>> On Mon, Sep 22, 2014 at 11:04 PM, lars hofhansl <[email protected]> wrote:
>>>
>>>> You can use the hbase.hstore.time.to.purge.deletes config option.
>>>> You can set it globally or per Column Family.
>>>>
>>>> This is the description in hbase-default.xml:
>>>> <property>
>>>> <name>hbase.hstore.time.to.purge.deletes</name>
>>>> <value>0</value>
>>>> <description>The amount of time to delay purging of delete markers
>>>> with future timestamps. If unset, or set to 0, all delete markers,
>>>> including those with future timestamps, are purged during the next
>>>> major compaction. Otherwise, a delete marker is kept until the major
>>>> compaction which occurs after the marker's timestamp plus the value
>>>> of this setting, in milliseconds.
>>>> </description>
>>>> </property>
>>>>
>>>> That seems to be exactly what you want.
>>>>
>>>> -- Lars
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: James Estes <[email protected]>
>>>> To: [email protected]
>>>> Cc:
>>>> Sent: Monday, September 22, 2014 10:39 AM
>>>> Subject: Configuring tombstone purge independent of deleted cell purge
>>>>
>>>> Could tombstone purges be independent of purging deleted cells and
>>>> KEEP_DELETED_CELLS setting? In my use case, I do not want to keep deleted
>>>> cells, but I do need to keep the tombstones around. Without the tombstones,
>>>> I'm not able to do incremental backups (custom, we do timerange raw scans
>>>> ourselves for this).
>>>>
>>>> As a rough example, if I have the following timeline for the same row key
>>>> (where t# is time):
>>>> t0 - full backup (using a time range up to t0)
>>>> t1 - PUT v1
>>>> t2 - incremental backup #1 (time range t0 up to t2)
>>>> t3 - DELETE
>>>> t4 - flush and major compaction happens
>>>> t5 - incremental backup #2 (time range t2 up to t5)
>>>> t6 - full system crash
>>>> t7 - data restored from full backup + incrementals #1 and #2
>>>>
>>>> When the restore completes, the row will have been un-deleted. This is
>>>> because the incremental backup in #2 will not have the tombstone, since it
>>>> gets compacted out.
>>>>
>>>> So in our case, I do NOT want to keep deleted cells (because I do not want
>>>> the cells to show up in time range scans users may do), but I DO want to
>>>> keep the tombstones for a configurable amount of time (much larger than our
>>>> planned incremental backup schedule) so they are captured during backup.
>>>> This would allow for the custom incremental backups to be independent of
>>>> major compactions. Without it, the backup schedule would have to manually
>>>> handle compactions and would always have to do a FULL Backup after a major
>>>> compaction (otherwise there can be loss because when any major compaction
>>>> happens, any tombstone that came in after the last incremental will be
>>>> lost).
>>>>
>>>> It seems like there could be another setting for when to purge tombstones.
>>>> Currently, there is hbase.hstore.time.to.purge.deletes for when to purge
>>>> deleted cells, but ONLY if KEEP_DELETED_CELLS is configured (which makes
>>>> sense). I'd like to propose a hbase.hstore.time.to.purge.tombstones that
>>>> could default to the same value as hbase.hstore.time.to.purge.deletes, but
>>>> would take effect regardless of the KEEP_DELETED_CELLS setting. It should
>>>> have a constraint so that hbase.hstore.time.to.purge.deletes <
>>>> hbase.hstore.time.to.purge.tombstones (b/c we don't want tombstones
>>>> disappearing before the deleted cells).
>>>>
>>>> Does this seem reasonable? Is there another approach I might take?
>>>>
>>>> Thanks,
>>>>
>>>>
>>>
>>
>
>
>
> --
> Dan Di Spaltro
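
For reference, the "two name stanzas" Dan points out are the two <name> elements in the quoted hbase-site.xml snippet: the second one holds the number, where a Hadoop-style configuration file expects a <value> element. A corrected entry would presumably look like the sketch below. It assumes the intended delay was the 600000 from the quoted snippet (the setting is in milliseconds, so 10 minutes); the indentation is only for readability.

```xml
<!-- Sketch of the corrected hbase-site.xml entry (not taken from the thread).
     600000 is the value from the quoted snippet: milliseconds, i.e. 10 minutes. -->
<property>
  <name>hbase.hstore.time.to.purge.deletes</name>
  <value>600000</value>
</property>
```

With the value in a <name> element, the property has no <value> to read, which would be consistent with the global setting appearing to be ignored while the per-column-family setting worked.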
