Thank you guys for the pointers and info! I'll try to make use of them. If it turns into something reusable (a script, etc.), I will open a JIRA issue and attach it for others to use.
Thanks again,
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase

On Wed, Jul 11, 2012 at 8:51 AM, Stack <[email protected]> wrote:
> On Mon, Jul 9, 2012 at 10:05 PM, Alex Baranau <[email protected]>
> wrote:
> > I fear that the complexity of removing HFiles may come from the (block)
> > cache, which may still hold their data. Is that right? I'm actually OK
> > with HBase returning data from the files I "deleted" by removing HFiles:
> > I will specify a time range on scans anyway (in this example, to omit
> > anything older than 1 week).
> >
>
> I think this is a use case we should support natively. Someone around
> the corner from us was looking to do this. They load a complete
> dataset each night, and on the weekends they want to just drop the old
> stuff by removing the hfiles older than N days.
>
> You could script it now. Look at the hfiles in HDFS (they have
> sufficient metadata, IIRC), and then follow the procedure Jon suggests
> above: close, remove, and reopen. We could add an API to do this,
> i.e. reread HDFS for hfiles (it would be nice to do it 'atomically',
> telling the new API which files to drop).
>
> You bring up the block cache. That should be fine. We shouldn't be
> reading blocks for files that are no longer open; old blocks should
> get aged out.
>
> As for compaction dropping complete hfiles that fall outside the TTL,
> I'm not sure we have that (I didn't look too closely).
>
> St.Ack
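As a rough illustration of the selection step Stack describes (finding the hfiles whose data is entirely older than N days before doing the close, remove, and reopen), here is a minimal Python sketch. It assumes you have already collected, for each hfile, its path and the newest cell timestamp it contains (Stack suggests the hfile metadata should be enough for this, but extracting it is not shown). The function name `select_expired_hfiles`, the `(path, max_timestamp_ms)` input format, and the example paths are all hypothetical; the sketch does not touch HDFS or HBase.

```python
from typing import Iterable, List, Tuple

MS_PER_DAY = 24 * 60 * 60 * 1000


def select_expired_hfiles(
    hfiles: Iterable[Tuple[str, int]], now_ms: int, max_age_days: int
) -> List[str]:
    """Return paths of hfiles whose newest cell is older than max_age_days.

    Each entry is (path, max_timestamp_ms), where max_timestamp_ms is ASSUMED
    to be the newest cell timestamp stored in that hfile (hypothetical input
    format; on a real cluster it would have to be read from the file's metadata).
    """
    cutoff_ms = now_ms - max_age_days * MS_PER_DAY
    # A file is only safe to drop if *all* of its cells are expired, so compare
    # the file's newest timestamp (not its oldest) against the cutoff.
    return [path for path, max_ts in hfiles if max_ts < cutoff_ms]


# Usage with made-up paths and timestamps (epoch milliseconds).
now = 1_342_000_000_000
files = [
    ("/hbase/t1/r1/cf/a", now - 10 * MS_PER_DAY),  # newest cell 10 days old: expired
    ("/hbase/t1/r1/cf/b", now - 2 * MS_PER_DAY),   # newest cell 2 days old: kept
]
print(select_expired_hfiles(files, now, 7))
```

Comparing against each file's newest timestamp is the conservative choice: a file written by a compaction can mix old and recent cells, and dropping it based on its oldest cell would lose recent data.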
