What's the read/write mix in your workload?

Have you looked at HBASE-10070 'HBase read high-availability using
timeline-consistent region replicas' (phase 1 has been merged for the
upcoming 1.0 release)?
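
A minimal sketch of what a timeline-consistent read looks like with the 1.0
client API (assuming the table was created with region replication > 1; the
table name and row key below are made up):

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("file_meta"))) {
      Get get = new Get(Bytes.toBytes("some-row-key"));
      get.setConsistency(Consistency.TIMELINE); // allow reads from secondary replicas
      Result result = table.get(get);
      if (result.isStale()) {
        // answered by a secondary replica; it may lag the primary slightly
      }
    }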

Cheers


On Thu, Jul 31, 2014 at 8:17 AM, Wilm Schumacher <wilm.schumac...@cawoom.com> wrote:

> Hi,
>
> I have a "conceptual" question and would appreciate hints.
>
> My task is to save files to HDFS, to maintain some information about
> them in an HBase database, and then serve both to the application.
>
> Per file I have around 50 rows with 10 columns (in 2 column families) in
> the tables; the values are strings of roughly 100 characters each.
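
For reference, a table with two column families for that kind of data could
be created with the 1.0-era Java admin API roughly as follows (the table and
family names here are made up):

    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;

    try (Connection conn = ConnectionFactory.createConnection();
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("file_meta"));
      desc.addFamily(new HColumnDescriptor("meta"));  // first column family
      desc.addFamily(new HColumnDescriptor("stats")); // second column family
      admin.createTable(desc);
    }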
>
> The files are of ordinary size (perhaps from a few kB up to 100 MB or so).
>
> By this estimate the number of files is far smaller than the number of
> rows (times columns), but the disk space used by the files is far larger
> than the space used by HBase. I would further estimate that for every get
> of a file there would be on the order of hundreds of row gets against
> HBase.
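
(Back-of-envelope from those numbers, ignoring key and versioning overhead:
50 rows x 10 columns x ~100 bytes is roughly 50 KB of HBase data per file,
versus anywhere from a few kB to ~100 MB for the file itself, so the files
dominate disk usage while HBase dominates the request count.)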
>
> For the files I want to run a Hadoop cluster (obviously). The question
> now arises: should I run HBase on the same Hadoop cluster?
>
> The pro of running them together is obvious: I would only have to run one
> Hadoop cluster, which would save time, money and nerves.
>
> On the other hand it wouldn't be possible to tune the cluster specifically
> for one task or the other. E.g. if I wanted to make HBase more
> "distributed" by raising the replication factor (to, let's say, 6), I
> would have to spend twice the disk on the "normal" files, too.
>
> So: what should I do?
>
> Do you have any comments or hints on this question?
>
> Best wishes,
>
> wilm
>
