M7 is not Apache HBase, or any HBase. It is a proprietary NoSQL datastore
with (I gather) an Apache HBase-compatible Java API.

As for running HBase on EC2, we recently discussed some particulars; see
the latter part of this thread, where I hijack it:
http://search-hadoop.com/m/rI1HpK90gu
I wouldn't recommend launching HBase as part of an EMR flow unless you only
want it for temporary random access storage, in which case use
m2.2xlarge/m2.4xlarge instance types. Otherwise, set up a dedicated
HBase-backed storage service on high I/O instance types. The fundamental
issue is that I/O performance on the EC2 platform is fair to poor.

I have also noticed a large difference in baseline block device latency
between old Amazon Linux AMIs (pre-2013) and the latest AMIs from this year.
Use the new ones; they cut the latency long tail in half. I gather there
were some significant kernel-level improvements.
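
A minimal sketch of how one could check the latency tail on a given
instance: time a series of small synchronous writes and compare the median
with the high percentiles. The class name LatencyProbe, the 4 KiB block
size, the sample count, and the nearest-rank percentile are illustrative
assumptions, not something taken from the measurements described above, and
a dedicated I/O benchmarking tool would give more trustworthy numbers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper: rough probe of synchronous write latency on one volume.
public class LatencyProbe {

    // Nearest-rank percentile over a sorted copy of the samples, 0 < p <= 100.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // Times `count` sequential 4 KiB writes, each followed by a force();
    // returns the elapsed nanoseconds for every write.
    static long[] measureSyncWrites(Path file, int count) throws IOException {
        long[] latencies = new long[count];
        ByteBuffer block = ByteBuffer.allocate(4096);
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < count; i++) {
                block.clear();
                long start = System.nanoTime();
                while (block.hasRemaining()) {
                    ch.write(block);
                }
                // Ask the OS to push the data to the device, so the timing
                // is not just a page cache write.
                ch.force(false);
                latencies[i] = System.nanoTime() - start;
            }
        }
        return latencies;
    }

    public static void main(String[] args) throws IOException {
        // Pass a path on the volume under test; defaults to a temp file.
        Path file = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempFile("latency-probe", ".dat");
        long[] ns = measureSyncWrites(file, 1000);
        System.out.printf("p50=%d us  p99=%d us  max=%d us%n",
                percentile(ns, 50) / 1000,
                percentile(ns, 99) / 1000,
                percentile(ns, 100) / 1000);
    }
}
```

Running the same probe against the same volume type under an old and a new
AMI and comparing p99 and max against p50 gives a rough view of the long
tail.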


On Wed, May 8, 2013 at 10:42 AM, Marcos Luis Ortiz Valmaseda <
marcosluis2...@gmail.com> wrote:

> I think that when you talk about RMap, you are referring to MapR's
> distribution.
> I think that MapR's team released a very good version of its Hadoop
> distribution focused on HBase called M7. You can see its overview here:
> http://www.mapr.com/products/mapr-editions/m7-edition
>
> But this release was in beta testing, and I see that it's not included
> in the Amazon Marketplace yet:
>
> https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5
>
>
>
>
> 2013/5/7 Pal Konyves <paul.kony...@gmail.com>
>
> > Hi,
> >
> > Has anyone got some recommendations about running HBase on EC2? I am
> > testing it, and so far I am very disappointed with it. I did not change
> > anything about the default 'Amazon distribution' installation. It has one
> > master node and two slave nodes, and write performance is around 2500
> > small rows per sec at most, but I expected it to be way better. Oh, and
> > this is with batch put operations with autocommit turned off, where each
> > batch contains about 500-1000 rows... When I do it with autocommit, it
> > does not even reach 1000 rows per sec.
> >
> > All nodes were m1.large.
> >
> > Any experiences, suggestions? Is it worth trying the RMap distribution
> > instead of the Amazon one?
> >
> > Thanks,
> > Pal
> >
>
>
>
> --
> Marcos Ortiz Valmaseda
> Product Manager at PDVSA
> http://about.me/marcosortiz
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
