Hi Frédéric,

hbase.store.delete.expired.storefile - set this property to true. It lets
HBase delete store files in which every entry has passed its TTL before
compaction even selects them, so whole expired files are dropped without
being rewritten. If you are interested you can check HBASE-5199. It is
available in 0.94 and above.
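For example, in hbase-site.xml on the region servers (just a sketch; only
the property element is shown, the rest of the file is assumed to be in
place):

  <!-- Drop whole store files in which every entry has passed its TTL,
       instead of rewriting them during compaction (HBASE-5199) -->
  <property>
    <name>hbase.store.delete.expired.storefile</name>
    <value>true</value>
  </property>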
Hope this helps.

Regards
Ram

> -----Original Message-----
> From: Frédéric Fondement [mailto:frederic.fondem...@uha.fr]
> Sent: Monday, June 25, 2012 2:05 PM
> To: user@hbase.apache.org
> Subject: Re: TTL performance
>
> Hi,
>
> And thanks for your answers.
>
> Actually, I already control my major compactions with a cron job at
> night, merely executing this bash code:
>
> echo "status 'detailed'" | hbase shell | grep "<<table prefix>>" | awk
> -F, '{print $1}' | tr -d ' ' | sort | uniq -c | sort -nr | awk '{print
> "major_compact " sprintf( "%c", 39 ) $2 sprintf( "%c", 39 )}' | hbase
> shell >>$LOGFILE 2>&1
>
> This line makes sure the biggest regions are major-compacted first.
>
> I'm not using versions.
>
> My question was actually: given a table with millions, billions, or
> whatever number of rows, how fast is the TTL handling process? How are
> rows scanned during major compaction? Are they all scanned in order to
> know whether they should be removed from the filesystem (be it HDFS or
> whatever else)? Or is there some optimization that lets it quickly find
> the parts to be deleted?
>
> Best regards,
>
> Frédéric.
>
>
> Le 21/06/2012 23:03, Andrew Purtell a écrit :
> >> 2012/6/21, Frédéric Fondement <frederic.fondem...@uha.fr>:
> >> opt3. looks the nicest (only 3-4 tables to scan when reading), but
> >> won't my daily major compact become crazy?
> >
> > If you want more control over the major compaction process, for
> > example to lessen the load on your production cluster to a constant
> > background level: the HBase shell is the JRuby irb, so you have the
> > full power of the HBase API and Ruby. In the worst case you can write
> > a shell script that gets a list of regions and triggers major
> > compaction on each region separately, or according to whatever policy
> > you construct. The script invocation can happen manually or out of
> > crontab.
> >
> > Another performance consideration is how many expired cells might have
> > to be skipped by a scan. If you have a wide area of the keyspace that
> > is all expired at once, then the scan will seem to "pause" while
> > traversing this area. However, you can use setTimeRange to bound your
> > scan by time range, and then HBase can optimize whole HFiles away just
> > by examining their metadata. Therefore I would recommend using both
> > TTLs for automatic background garbage collection of expired entries,
> > and time range bounded scans for read time optimization.
> >
> > Incidentally, there was an interesting presentation at HBaseCon
> > recently regarding a creative use of timestamps:
> > http://www.slideshare.net/cloudera/1-serving-apparel-catalog-from-h-base-suraj-varma-gap-inc-finalupdated-last-minute
> > (slide 16).
> >
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
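P.S. To make Andy's two suggestions above a little more concrete. First,
since the shell is the JRuby irb, a per-region major compaction script can
be piped into "hbase shell". This is only an untested sketch against the
0.94-era client API; the table name 'mytable' and the 60-second pause are
placeholders to adjust for your cluster:

  # Sketch: major-compact each region of one table, one region at a time.
  include Java
  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.HBaseAdmin
  import org.apache.hadoop.hbase.client.HTable

  conf  = HBaseConfiguration.create
  admin = HBaseAdmin.new(conf)
  table = HTable.new(conf, 'mytable')

  # getRegionsInfo returns a Map<HRegionInfo, HServerAddress>; pausing
  # between regions keeps the load at a constant background level.
  table.getRegionsInfo.keySet.each do |region|
    admin.majorCompact(region.getRegionNameAsString)
    sleep 60
  end
  exit

Second, the setTimeRange bound Andy mentions is also exposed in the shell
as the TIMERANGE scan option (timestamps in milliseconds; the values here
are arbitrary examples):

  hbase> scan 'mytable', { TIMERANGE => [1338500000000, 1340600000000] }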