We at Imageshack use HBase to store all of our images, currently at roughly
2 billion rows and 350+ TB.
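
(For a sense of what that looks like at the client level, below is a minimal
sketch of writing and reading one image blob with the newer-style HBase Java
client API. The "images" table, the "d" column family, and the row key are
made up for illustration, not our actual schema.)

    // Hypothetical schema: table "images", column family "d", one cell per image.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ImageStoreSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("images"))) {

                // Write: the row key is the image id, the raw bytes go in one cell.
                byte[] imageBytes = new byte[] {1, 2, 3}; // placeholder payload
                Put put = new Put(Bytes.toBytes("img-00000001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("bytes"), imageBytes);
                table.put(put);

                // Read the same image back by row key.
                Result result = table.get(new Get(Bytes.toBytes("img-00000001")));
                byte[] stored = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("bytes"));
                System.out.println("read back " + stored.length + " bytes");
            }
        }
    }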

Jack

On Friday, December 5, 2014, iain wright <iainw...@gmail.com> wrote:

> Hi Jeremy,
>
> Pinterest is using it for their feeds:
> http://www.slideshare.net/cloudera/case-studies-session-3a
> http://www.slideshare.net/cloudera/operations-session-1
>
> Not sure of their dataset size; they are doing cluster-level replication
> for DR. We based our architecture on their success (a cluster in each
> AZ, multi-master replication between them for DR; Flume and the APIs
> watch ZooKeeper znodes for which cluster to talk to -- we talk to one
> cluster at a time and control flips between them for maintenance/DR).
> Our use case is retrieving social data ingested from Twitter/FB/etc.
> when customer-facing applications hit our social API.
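>
> As a rough illustration of that pattern (the znode path and quorum address
> here are just assumptions, not our actual config), each client process can
> watch a znode that holds the connection string of the currently active
> cluster:
>
>     // Minimal sketch using the plain ZooKeeper Java client.
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
>     import org.apache.zookeeper.ZooKeeper;
>
>     public class ActiveClusterWatcher implements Watcher {
>         // Hypothetical znode holding the quorum of the cluster to talk to.
>         private static final String ACTIVE_ZNODE = "/active-hbase-cluster";
>
>         private final ZooKeeper zk;
>         private volatile String activeQuorum;
>
>         public ActiveClusterWatcher(String zkConnect) throws Exception {
>             this.zk = new ZooKeeper(zkConnect, 30000, this);
>             refresh();
>         }
>
>         // Re-read the znode and re-register the watch whenever it fires.
>         private void refresh() throws Exception {
>             byte[] data = zk.getData(ACTIVE_ZNODE, this, null);
>             activeQuorum = new String(data, "UTF-8");
>         }
>
>         @Override
>         public void process(WatchedEvent event) {
>             try {
>                 refresh();
>             } catch (Exception e) {
>                 // In practice: retry with backoff and alert.
>             }
>         }
>
>         // The Flume/API layer reads this before opening an HBase connection.
>         public String activeQuorum() {
>             return activeQuorum;
>         }
>     }
>
> Flipping clusters for maintenance/DR then amounts to updating that one
> znode and letting the watchers pick up the change.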
>
> In terms of team size, there are many variables:
> - If you are running your own metal there would be more work around
> networking/rack+stack+cabling/provisioning the OS/etc., unless this is
> provided by another dept already
> - Do you have an HBase expert or DBA in house already? Or are your
> developers going to take on learning schema design and tuning the cluster?
> - Do you have sysadmins/devops available to write Puppet/Chef/Ansible for
> provisioning this cluster (and dev/QA environments) and performing
> upgrades/etc. moving forward?
> - Do you have a NOC & monitoring already in place for other pieces of
> infra that will take on monitoring cluster health and responding to
> alerts/failed disks/RegionServers/etc.?
>
> You may want to check out previous HBaseCon and Hadoop Summit videos; lots
> of presentations talk about, or at least mention, their dataset size and
> use case:
> - https://www.youtube.com/user/HadoopSummit
> - http://hbasecon.com/archive.html
>
> All the best,
>
> --
> Iain Wright
>
> On Fri, Dec 5, 2014 at 1:37 PM, jeremy p <athomewithagroove...@gmail.com>
> wrote:
>
> > Hey all,
> >
> > So, I'm currently evaluating HBase as a solution for querying a very
> > large data set (think 60+ TB). We'd like to use it to directly power a
> > customer-facing product. My questions are fourfold:
> >
> > 1) What companies use HBase to serve a customer-facing product? I'm not
> > interested in evaluations, experiments, or POCs. I'm also not interested
> > in offline BI or analytics. I'm specifically interested in cases where
> > HBase serves as the data store for a customer-facing product.
> >
> > 2) Of the companies that use HBase to serve a customer-facing product,
> > which ones use it to query data sets of 60 TB or more?
> >
> > 3) Of the companies that use HBase to query 60+ TB data sets and serve a
> > customer-facing product, how many employees are required to support their
> > HBase installation? In other words, if I were to start a team tomorrow,
> > and its purpose was to maintain a 60+ TB HBase installation for a
> > customer-facing product, how many people should I hire?
> >
> > 4) Of the companies that use HBase to query 60+ TB data sets and serve a
> > customer-facing product, what kind of measures do they take for disaster
> > recovery?
> >
> > If you can, please point me to articles, videos, and other materials.
> > Obviously, the larger the company, the stronger the case it makes for
> > HBase.
> >
> > Thank you!
> >
>
