Thank you guys for the pointers and info! I'll try to make use of them. If
it turns into something reusable (a script, etc.), I will open a JIRA issue
and attach it for others to use.

Thanks again,
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase

On Wed, Jul 11, 2012 at 8:51 AM, Stack <[email protected]> wrote:

> On Mon, Jul 9, 2012 at 10:05 PM, Alex Baranau <[email protected]> wrote:
> > I fear that the complexity of removing HFiles may come from the (block)
> > cache, which may still hold their data. Is that right? I'm actually OK
> > with HBase returning data from files I "deleted" by removing HFiles: I
> > will specify a time range on scans anyway (in this example, to omit
> > anything older than 1 week).
> >
>
> I think this is a use case we should support natively. Someone around
> the corner from us was looking to do this. They load a complete
> dataset each night, and on the weekends they want to just drop the old
> stuff by removing the hfiles older than N days.
>
> You could script it now. Look at the hfiles in HDFS (they have
> sufficient metadata, IIRC) and then follow the prescription Jon
> suggests above: close, remove, and reopen. We could add an API to do
> this, i.e. reread HDFS for hfiles (it would be nice to do it
> 'atomically' by telling the new API which files to drop).
>
> You bring up the block cache. That should be fine: we shouldn't be
> reading blocks for files that are no longer open, and their old blocks
> should get aged out.
>
> As for compaction dropping complete hfiles that are outside the TTL,
> I'm not sure we have that (I didn't look too closely).
>
> St.Ack
>
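The "drop hfiles older than N days" selection step Stack describes could look roughly like this. It is a minimal Python sketch under stated assumptions: it assumes you have already collected, for each hfile, its path and the newest cell timestamp it contains (in ms). The `HFileInfo` type, the `select_expired_hfiles` function and the example paths are hypothetical, not an HBase API. Reading that metadata out of HDFS and performing the close/remove/reopen step are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List

MS_PER_DAY = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class HFileInfo:
    """Hypothetical summary of one hfile: its path and newest cell timestamp (ms)."""
    path: str
    max_timestamp_ms: int


def select_expired_hfiles(hfiles: Iterable[HFileInfo], now_ms: int,
                          max_age_days: int) -> List[str]:
    """Return (sorted) paths of hfiles whose newest cell is older than the cutoff.

    A file is safe to drop only if *all* its cells are older than the cutoff,
    which holds exactly when its maximum timestamp is below the cutoff.
    """
    if max_age_days < 0:
        raise ValueError("max_age_days must be non-negative")
    cutoff_ms = now_ms - max_age_days * MS_PER_DAY
    return sorted(f.path for f in hfiles if f.max_timestamp_ms < cutoff_ms)


if __name__ == "__main__":
    now = 1_342_000_000_000  # arbitrary "current" time in ms
    files = [
        HFileInfo("/hbase/t1/region-a/cf/hfile-old", now - 10 * MS_PER_DAY),
        HFileInfo("/hbase/t1/region-a/cf/hfile-new", now - 2 * MS_PER_DAY),
    ]
    print(select_expired_hfiles(files, now, 7))  # ['/hbase/t1/region-a/cf/hfile-old']
```

Comparing the newest cell timestamp rather than the file's HDFS modification time is deliberate: a compaction can rewrite old cells into a freshly written file, so the modification time alone may not say how old the data inside is.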
