Thanks, I was just reading the first link when I saw your mail. My only concern is: won't S3 latency limitations interfere with my HBase write/read throughput? I do have delete operations in my app, and writes to rows that can be as large as 3 MB, if not more.
thanks

On Wed, Sep 28, 2011 at 12:16 PM, Li Pi <[email protected]> wrote:

> This thread here might be useful:
> https://forums.aws.amazon.com/thread.jspa?threadID=34936
>
> There's also a section on S3 here:
>
> http://ofps.oreilly.com/titles/9781449396107/installation.html
>
> On Wed, Sep 28, 2011 at 12:11 PM, Vinod Gupta Tankala
> <[email protected]> wrote:
> > thanks Li. I didn't know about using S3 as a datastore. Will look into
> > this more.
> >
> > I understand that HDFS replication will help in partial hardware failure.
> > I wanted to protect myself against inconsistencies, as I have gotten
> > bitten in the past. That had happened due to HBase fatal exceptions. One
> > of the reasons for that could have been standalone mode, as that is not
> > production ready, based on reading the HBase documentation.
> > Another use case I have is: I would be writing sweeper jobs to delete
> > user data that is more than x months old. So in case we need to retrieve
> > old user data, I would like to have the ability to get old data back
> > from exported tables. Of course, I understand that to do so for
> > selective user accounts, I have to write custom jobs.
> >
> > thanks
> > vinod
> >
> > On Wed, Sep 28, 2011 at 11:49 AM, Li Pi <[email protected]> wrote:
> >
> >> What kind of situations are you looking to guard against? Partial
> >> hardware failure, full hardware failure (of the live cluster),
> >> accidentally deleting all data?
> >>
> >> HDFS provides replication that already guards against partial hardware
> >> failure - if this is all you need, an ephemeral store should be fine.
> >>
> >> Also, HBase can use S3 directly as a datastore. You can choose the raw
> >> mode, in which HBase treats S3 as a disk. There used to be a block-based
> >> mode as well, but now that S3 has increased the object size limit to
> >> 5 TB, this isn't needed anymore. (Somebody correct me if I'm wrong.)
> >>
> >> On Wed, Sep 28, 2011 at 9:15 AM, Vinod Gupta Tankala
> >> <[email protected]> wrote:
> >> > Hi,
> >> > Can someone answer these basic but important questions for me?
> >> > We are using HBase for our datastore and want to safeguard ourselves
> >> > from data corruption/data loss. Also, we are hosted on AWS EC2.
> >> > Currently, I only have a single node but want to prepare for scale
> >> > right away, as things are going to change starting in the next couple
> >> > of weeks. Also, I am currently using the ephemeral store for HBase
> >> > data.
> >> >
> >> > 1) What is the recommended AWS data store method for HBase? Should
> >> > you use the ephemeral store and do S3 backups, or use EBS? I read and
> >> > heard that EBS can be expensive and also unreliable in terms of
> >> > read/write latency. Of course, it provides data replication and
> >> > protection, so you don't have to worry about that.
> >> >
> >> > 2) What is the recommended backup/restore method for HBase? I would
> >> > like to take periodic data snapshots and then have an import utility
> >> > that will incrementally import data in case I lose some regions due
> >> > to corruption or table inconsistencies. Also, if something
> >> > catastrophic happens, I can restore the whole data set.
> >> >
> >> > 3) While we are at it, what are the recommended EC2 instance types
> >> > for running master/zookeeper/region servers? I get conflicting
> >> > answers from a Google search - ranging from c1.xlarge to m1.xlarge.
> >> >
> >> > I would really appreciate it if someone could help me.
> >> >
> >> > thanks
> >> > vinod
> >> >
> >> >
> >
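
For concreteness on the S3-as-datastore option Li Pi mentions above: as far as I understand it, the "raw mode" is Hadoop's native S3 filesystem (s3n://), while the older s3:// scheme is the block-based store. A minimal sketch of what pointing HBase at S3 might look like in hbase-site.xml - the bucket name and credentials are hypothetical placeholders:

    <property>
      <name>hbase.rootdir</name>
      <!-- hypothetical bucket name -->
      <value>s3n://my-hbase-bucket/hbase</value>
    </property>
    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>

Whether S3's latency holds up under HBase's read/write pattern is exactly the throughput concern raised at the top of this thread, so treat this as a starting point to test, not a recommendation.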
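
On the backup/restore question (2) and the "exported tables" mentioned above: HBase ships Export and Import MapReduce jobs, and the output directory can be an S3 path. Export optionally takes a version count and a start/end timestamp, which is one way to get the incremental snapshots asked about. A rough sketch, with the table name, bucket, and timestamps as placeholders:

    # time-bounded export of one table to S3 (timestamps in ms since epoch)
    hbase org.apache.hadoop.hbase.mapreduce.Export user_data \
        s3n://my-backup-bucket/user_data-20110928 1 <starttime-ms> <endtime-ms>

    # restore from a previous export; the target table and its column
    # families must already exist
    hbase org.apache.hadoop.hbase.mapreduce.Import user_data \
        s3n://my-backup-bucket/user_data-20110928

Import only replays whatever is in the export, so restoring selective user accounts, or rebuilding a full table from several incremental exports, would still need the custom jobs mentioned above.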
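
And for the sweeper jobs that delete user data older than x months: a plain client-side scan-and-delete is enough for small tables (a sweep over a big table would more likely be a MapReduce job, but the client calls are the same). A rough sketch against the 0.90-era client API - the table and column family names are made up:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UserDataSweeper {
        public static void main(String[] args) throws IOException {
            // anything older than ~6 months gets swept (x months = 6 here)
            long cutoff = System.currentTimeMillis() - 180L * 24 * 60 * 60 * 1000;

            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "user_data");   // hypothetical table
            try {
                Scan scan = new Scan();
                scan.setTimeRange(0, cutoff);  // only return cells older than the cutoff
                scan.setCaching(500);          // fewer RPCs for a full-table sweep
                ResultScanner scanner = table.getScanner(scan);
                try {
                    for (Result r : scanner) {
                        // delete only cells at or below the cutoff in this family,
                        // so anything the user wrote more recently is left alone
                        Delete d = new Delete(r.getRow());
                        d.deleteFamily(Bytes.toBytes("data"), cutoff);  // hypothetical family
                        table.delete(d);
                    }
                } finally {
                    scanner.close();
                }
            } finally {
                table.close();
            }
        }
    }

Keep in mind that deletes in HBase are only tombstone markers until the next major compaction, so the space on the underlying store is reclaimed later, not at delete time.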
