Thanks. I was just reading the first link when I saw your mail.
My only concern is: won't S3 latency limitations interfere with my HBase
read/write throughput? I do have delete operations in my app, and writes
to rows that can be as large as 3 MB, if not more.

thanks

On Wed, Sep 28, 2011 at 12:16 PM, Li Pi <[email protected]> wrote:

> This thread might be useful:
> https://forums.aws.amazon.com/thread.jspa?threadID=34936
>
> There's also a section on S3 here:
>
> http://ofps.oreilly.com/titles/9781449396107/installation.html
>
> On Wed, Sep 28, 2011 at 12:11 PM, Vinod Gupta Tankala
> <[email protected]> wrote:
> > Thanks Li. I didn't know about using S3 as a datastore. Will look into
> > this more.
> >
> > I understand that HDFS replication will help with partial hardware
> > failure. I wanted to protect myself against inconsistencies, as I have
> > gotten bitten in the past. That happened due to HBase fatal exceptions,
> > and one possible reason was that I was running in standalone mode, which
> > is not production ready according to the HBase documentation.
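> >
> > For reference, in case it helps anyone else: per the HBase docs, leaving
> > standalone mode means setting hbase.cluster.distributed in
> > hbase-site.xml (and pointing hbase.rootdir at HDFS), e.g.:
> >
> >   <property>
> >     <name>hbase.cluster.distributed</name>
> >     <value>true</value>
> >   </property>
> >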
> > Another use case I have: I will be writing sweeper jobs to delete user
> > data that is more than x months old. In case we need to retrieve old
> > user data, I would like the ability to get it back from exported tables.
> > Of course, I understand that to do so for selective user accounts, I
> > have to write custom jobs; a rough sketch of one is below.
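> >
> > Something along these lines is what I have in mind. This is only a
> > minimal sketch against the 0.90 client API; the table name and the
> > six-month cutoff are made up, and a real job would presumably be a
> > MapReduce job rather than a single client-side scan:
> >
> > import java.io.IOException;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.Delete;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> >
> > public class OldDataSweeper {
> >   public static void main(String[] args) throws IOException {
> >     // Cells written before this timestamp are considered expired
> >     // (roughly six months; the exact window is a placeholder).
> >     long cutoff = System.currentTimeMillis() - 180L * 24 * 60 * 60 * 1000;
> >     Configuration conf = HBaseConfiguration.create();
> >     HTable table = new HTable(conf, "userdata"); // hypothetical table
> >     Scan scan = new Scan();
> >     scan.setTimeRange(0, cutoff); // only rows with cells older than cutoff
> >     ResultScanner scanner = table.getScanner(scan);
> >     try {
> >       for (Result r : scanner) {
> >         // Deletes the whole row; rows that also hold newer cells
> >         // would need per-cell timestamp handling instead.
> >         table.delete(new Delete(r.getRow()));
> >       }
> >     } finally {
> >       scanner.close();
> >       table.close();
> >     }
> >   }
> > }
> >
> > For getting old data back, I believe the bundled Export job
> > (org.apache.hadoop.hbase.mapreduce.Export) takes optional start/end
> > timestamps, so periodic time-ranged exports plus Import should cover
> > the restore side.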
> >
> > thanks
> > vinod
> >
> > On Wed, Sep 28, 2011 at 11:49 AM, Li Pi <[email protected]> wrote:
> >
> >> What kinds of situations are you looking to guard against? Partial
> >> hardware failure, full hardware failure (of the live cluster),
> >> accidentally deleting all data?
> >>
> >> HDFS provides replication that already guards against partial hardware
> >> failure; if that is all you need, an ephemeral store should be fine.
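> >>
> >> For reference, the replication factor is controlled by dfs.replication
> >> in hdfs-site.xml (the default is 3):
> >>
> >>   <property>
> >>     <name>dfs.replication</name>
> >>     <value>3</value>
> >>   </property>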
> >>
> >> Also, HBase can use S3 directly as a datastore. You can choose the raw
> >> mode, in which HBase treats S3 as a disk. There used to be a block-based
> >> mode as well, but now that S3 has increased the object size limit to
> >> 5 TB, it isn't needed anymore. (Somebody correct me if I'm wrong.)
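> >>
> >> Assuming by "raw" we mean Hadoop's s3n native filesystem, the rough idea
> >> is to point hbase.rootdir at a bucket in hbase-site.xml (bucket name
> >> made up here; the credentials can also live in core-site.xml):
> >>
> >>   <property>
> >>     <name>hbase.rootdir</name>
> >>     <value>s3n://my-hbase-bucket/hbase</value>
> >>   </property>
> >>   <property>
> >>     <name>fs.s3n.awsAccessKeyId</name>
> >>     <value>YOUR_ACCESS_KEY</value>
> >>   </property>
> >>   <property>
> >>     <name>fs.s3n.awsSecretAccessKey</name>
> >>     <value>YOUR_SECRET_KEY</value>
> >>   </property>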
> >>
> >> On Wed, Sep 28, 2011 at 9:15 AM, Vinod Gupta Tankala
> >> <[email protected]> wrote:
> >> > Hi,
> >> > Can someone answer these basic but important questions for me?
> >> > We are using HBase for our datastore and want to safeguard ourselves
> >> > from data corruption/data loss. We are hosted on AWS EC2. Currently I
> >> > have only a single node, but I want to prepare for scale right away,
> >> > as things are going to change in the next couple of weeks. I am
> >> > currently using the ephemeral store for HBase data.
> >> >
> >> > 1) What is the recommended AWS data store for HBase? Should you use
> >> > the ephemeral store and do S3 backups, or use EBS? I have read and
> >> > heard that EBS can be expensive and also unreliable in terms of
> >> > read/write latency. Of course, it provides data replication and
> >> > protection, so you don't have to worry about that.
> >> >
> >> > 2) What is the recommended backup/restore method for HBase? I would
> >> > like to take periodic data snapshots and then have an import utility
> >> > that will incrementally import data in case I lose some regions to
> >> > corruption or table inconsistencies. Also, if something catastrophic
> >> > happens, I can restore all the data.
> >> >
> >> > 3) While we are at it, what are the recommended EC2 instance types
> >> > for running the master/ZooKeeper/region servers? I get conflicting
> >> > answers from Google searches, ranging from c1.xlarge to m1.xlarge.
> >> >
> >> > I would really appreciate it if someone could help me.
> >> >
> >> > thanks
> >> > vinod
> >> >
> >>
> >
>
