Hi, can someone answer these basic but important questions for me? We are using HBase as our datastore and want to safeguard ourselves against data corruption and data loss. We are hosted on AWS EC2. Currently I have only a single node, but I want to prepare for scale right away, as things are going to change starting in the next couple of weeks. I am also currently using the ephemeral store for HBase data.
1) What is the recommended AWS storage method for HBase? Should we use the ephemeral store with S3 backups, or use EBS? I have read and heard that EBS can be expensive and also unreliable in terms of read/write latency. Of course, it provides data replication and protection, so you don't have to worry about that.
2) What is the recommended backup/restore method for HBase? I would like to take periodic data snapshots and then have an import utility that can incrementally import data in case I lose some regions due to corruption or table inconsistencies. Also, if something catastrophic happens, I want to be able to restore all of the data.
3) While we are at it, what are the recommended EC2 instance types for running the master, ZooKeeper, and region servers? I get conflicting answers from Google searches, ranging from c1.xlarge to m1.xlarge.
I would really appreciate it if someone could help me.
Thanks,
Vinod
