Hi Frédéric

hbase.store.delete.expired.storefile - Set this property to true.

This property lets HBase delete store files whose entries are all past
their TTL outright, instead of rewriting them during compaction. If you
are interested you can check HBASE-5199.

It is available in 0.94 and above. Hope this helps.
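In hbase-site.xml that would look something like this (just a sketch; see
HBASE-5199 and the reference guide for the details):

```xml
<!-- hbase-site.xml: let HBase drop wholly-expired store files
     instead of rewriting them during compaction (HBASE-5199) -->
<property>
  <name>hbase.store.delete.expired.storefile</name>
  <value>true</value>
</property>
```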


Regards
Ram

> -----Original Message-----
> From: Frédéric Fondement [mailto:frederic.fondem...@uha.fr]
> Sent: Monday, June 25, 2012 2:05 PM
> To: user@hbase.apache.org
> Subject: Re: TTL performance
> 
> Hi,
> 
> And thanks for your answers.
> 
> Actually, I already have control over my major compactions using a
> cron job at night, merely executing this bash code:
> echo "status 'detailed'" | hbase shell | grep "<<table prefix>>" \
>   | awk -F, '{print $1}' | tr -d ' ' | sort | uniq -c | sort -nr \
>   | awk '{print "major_compact " sprintf( "%c", 39 ) $2 sprintf( "%c", 39 )}' \
>   | hbase shell >>$LOGFILE 2>&1
> This line makes sure the tables with the most regions are
> major-compacted first.
> 
> I'm not using versions.
> 
> My question was actually: given a table with millions, billions, or
> whatever number of rows, how fast is the TTL handling process? How are
> rows scanned during major compaction? Are they all scanned in order to
> know whether they should be removed from the filesystem (be it HDFS or
> whatever else)? Or is there some optimization that lets it quickly
> find the parts to be deleted?
> 
> Best regards,
> 
> Frédéric.
> 
> 
> Le 21/06/2012 23:03, Andrew Purtell a écrit :
> >> 2012/6/21, Frédéric Fondement<frederic.fondem...@uha.fr>:
> >> opt3. looks the nicest (only 3-4 tables to scan when reading), but
> won't my daily major compact become crazy ?
> > If you want more control over the major compaction process, for
> > example to lessen the load on your production cluster to a constant
> > background level, note that the HBase shell is the JRuby irb, so you
> > have the full power of the HBase API and Ruby. In the worst case you
> > can write a shell script that gets a list of regions and triggers
> > major compaction on each region separately, or according to whatever
> > policy you construct. The script invocation can happen manually or
> > out of crontab.
> >
> > Another performance consideration is how many expired cells might
> have
> > to be skipped by a scan. If you have a wide area of the keyspace that
> > is all expired at once, then the scan will seem to "pause" while
> > traversing this area. However, you can use setTimeRange to bound your
> > scan by time range and then HBase can optimize whole HFiles away just
> > by examining their metadata. Therefore I would recommend using both
> > TTLs for automatic background garbage collection of expired entries,
> > as well as time range bounded scans for read time optimization.
> >
> > Incidentally, there was an interesting presentation at HBaseCon
> > recently regarding a creative use of timestamps:
> > http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-hbase-suraj-varma-gap-inc-finalupdated-last-minute
> > (slide 16).
> >
> > Best regards,
> >
> >     - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
