You can now (0.92+) set the minimum number of versions you want to always keep around, together with a TTL. See HBASE-4071.
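[Editor's note: a minimal sketch of what Lars describes, assuming the HBase 0.92+ Java client API; the table name, family name, and the specific TTL/MIN_VERSIONS values are hypothetical and for illustration only.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TtlWithMinVersions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical family: expire cells older than ~90 days (TTL is in seconds),
    // but always keep at least one version per cell (HBASE-4071, 0.92+).
    HColumnDescriptor family = new HColumnDescriptor("d");
    family.setTimeToLive(90 * 24 * 60 * 60);
    family.setMinVersions(1);

    // Hypothetical table using that family.
    HTableDescriptor table = new HTableDescriptor("user_data");
    table.addFamily(family);
    admin.createTable(table);
  }
}

With MIN_VERSIONS set above zero, the TTL will not expire a cell if that would leave fewer than that many versions, so the most recent value survives even once it is older than the TTL. As noted below, expired cells only physically disappear after a major compaction.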
-- Lars

________________________________
From: "Buttler, David" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, September 28, 2011 2:10 PM
Subject: RE: Recommended backup/restore solution for hbase

Wouldn't using a TTL on your data automatically delete data that is older than X months? Of course, major compactions have to occur for the data to actually disappear. See:
http://hbase.apache.org/book.html#ttl
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html#HColumnDescriptor(byte[], int, java.lang.String, boolean, boolean, int, int, java.lang.String, int)

Dave

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Vinod Gupta Tankala
Sent: Wednesday, September 28, 2011 12:12 PM
To: [email protected]
Subject: Re: Recommended backup/restore solution for hbase

Thanks Li. I didn't know about using S3 as a datastore; I will look into this more.

I understand that HDFS replication will help with partial hardware failure. I wanted to protect myself against inconsistencies, as I have gotten bitten in the past. That happened due to HBase fatal exceptions, and one possible reason is that I was running in standalone mode, which is not production ready according to the HBase documentation.

Another use case I have is that I will be writing sweeper jobs to delete user data that is more than X months old. In case we need to retrieve old user data, I would like the ability to get it back from exported tables. Of course, I understand that to do so for selective user accounts, I would have to write custom jobs.

thanks
vinod

On Wed, Sep 28, 2011 at 11:49 AM, Li Pi <[email protected]> wrote:
> What kind of situations are you looking to guard against? Partial
> hardware failure, full hardware failure (of the live cluster),
> accidentally deleting all data?
>
> HDFS provides replication that already guards against partial hardware
> failure - if this is all you need, an ephemeral store should be fine.
>
> Also, HBase can use S3 directly as a datastore. You can choose the raw
> mode, in which HBase treats S3 as a disk. There used to be a block-based
> mode as well, but now that S3 has increased the object size limit
> to 5 TB, this isn't needed anymore. (Somebody correct me if I'm wrong.)
>
> On Wed, Sep 28, 2011 at 9:15 AM, Vinod Gupta Tankala
> <[email protected]> wrote:
> > Hi,
> > Can someone answer these basic but important questions for me?
> > We are using HBase for our datastore and want to safeguard ourselves from
> > data corruption/data loss. We are also hosted on AWS EC2. Currently, I only
> > have a single node but want to prepare for scale right away, as things are
> > going to change starting in the next couple of weeks. I am currently using
> > the ephemeral store for HBase data.
> >
> > 1) What is the recommended AWS data store method for HBase? Should you use
> > the ephemeral store and do S3 backups, or use EBS? I read and heard that EBS
> > can be expensive and also unreliable in terms of read/write latency. Of course,
> > it provides data replication and protection, so you don't have to worry
> > about that.
> >
> > 2) What is the recommended backup/restore method for HBase? I would like to
> > take periodic data snapshots and then have an import utility that will
> > incrementally import data in case I lose some regions due to corruption or
> > table inconsistencies. Also, if something catastrophic happens, I can
> > restore the whole data.
> >
> > 3) While we are at it, what are the recommended EC2 instance types for
> > running the master/ZooKeeper/region servers? I get conflicting answers from
> > a Google search - ranging from c1.xlarge to m1.xlarge.
> >
> > I would really appreciate it if someone could help me.
> >
> > thanks
> > vinod
> >
>
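[Editor's note: a rough sketch of the kind of sweeper job Vinod mentions above (deleting cells older than a cutoff when not relying on TTL alone), assuming the 0.90/0.92-era HBase Java client API. The table name, cutoff, and running it as a single client-side scan are all placeholders for illustration; a real sweeper would more likely be a MapReduce job.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SweeperJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "user_data"); // hypothetical table name

    // Roughly three months ago, in milliseconds.
    long cutoff = System.currentTimeMillis() - 90L * 24 * 60 * 60 * 1000;

    // Only return cells written before the cutoff, across all versions.
    Scan scan = new Scan();
    scan.setTimeRange(0L, cutoff);
    scan.setMaxVersions();

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        Delete delete = new Delete(result.getRow());
        for (KeyValue kv : result.raw()) {
          // Delete exactly the old cell versions the scan returned,
          // leaving newer data in the same row untouched.
          delete.deleteColumn(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
        }
        table.delete(delete);
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

To later recover data removed this way for a specific account, you would need exports taken before the sweep and a custom job to re-import the relevant rows, as Vinod notes.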
