On Mon, Jul 9, 2012 at 1:05 PM, Alex Baranau <[email protected]> wrote:

> Hey, this is closer!
>
> However, I think I'd want to avoid major compaction. In fact I was thinking
> about avoiding any compactions & splitting.
> ... So, you are saying that major compaction will look at max/min ts metainfo
> of the HFile and will remove the whole file based on ttl if necessary
> (without going through the file)? Can I tell it not to actually compact
> other HFiles (i.e. leave them as is, otherwise it would be not as easy to
> remove HFiles again in an hour)? I.e. looks like "delete only whole HFiles
> based on TTL" functionality is what I need here..
>

Off the top of my head, I don't know how "smart" the major compaction code
is wrt TTLs. I'm pretty sure it isn't smart enough to explicitly ignore
specific files.

> I fear that complexity with removing HFiles can be caused by (block) cache
> that may hold its information. Is that right? I'm actually OK with HBase to
> return me the data of files I "deleted" by removing HFiles: I will specify
> timerange on scans anyways (in this example to omit things older than 1
> week).
>

I'm not sure what the block cache eviction policy is when a single region is
closed, but it sounds like you are OK if stale data remains. Sounds like you
might want to try the close/delete/open advanced approach on a test cluster
to see if it meets your needs.

Jon.

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
