We at ImageShack use HBase to store all of our images, currently around 2 billion rows and 350+ TB.
Jack

On Friday, December 5, 2014, iain wright <iainw...@gmail.com> wrote:

> Hi Jeremy,
>
> Pinterest is using it for their feeds:
> http://www.slideshare.net/cloudera/case-studies-session-3a
> http://www.slideshare.net/cloudera/operations-session-1
>
> Not sure on their dataset size; they are doing cluster-level replication
> for DR. We based our architecture on their success (a cluster in each AZ,
> multi-master replication between them for DR; Flume and the APIs watch
> ZooKeeper znodes for which cluster to talk to -- we talk to one cluster at
> a time and control flips between them for maintenance/DR). Our use case is
> retrieving social data ingested from Twitter/FB/etc. when customer-facing
> applications hit our social API.
>
> In terms of team size there are many variables:
> - If you are running your own metal there would be more work around
> networking/rack+stack+cabling/provisioning OS/etc., unless this is
> provided by another dept already.
> - Do you have an HBase expert or DBA in house already? Or are your
> developers going to take on learning schema design and tuning the cluster?
> - Do you have sysadmins/devops available to write Puppet/Chef/Ansible for
> provisioning this cluster (and dev/QA environments) and performing
> upgrades/etc. moving forward?
> - Do you have a NOC and monitoring already in place for other pieces of
> infra that will take on monitoring cluster health and responding to
> alerts/failed disks/regionservers/etc.?
>
> You may want to check out previous HBaseCon and Hadoop Summit videos; lots
> of presentations will talk about, or at least mention, their dataset size
> and use case:
> - https://www.youtube.com/user/HadoopSummit
> - http://hbasecon.com/archive.html
>
> All the best,
>
> --
> Iain Wright
>
> On Fri, Dec 5, 2014 at 1:37 PM, jeremy p <athomewithagroove...@gmail.com>
> wrote:
>
> > Hey all,
> >
> > So, I'm currently evaluating HBase as a solution for querying a very
> > large data set (think 60+ TB). We'd like to use it to directly power a
> > customer-facing product. My question is fourfold:
> >
> > 1) What companies use HBase to serve a customer-facing product? I'm not
> > interested in evaluations, experiments, or POCs. I'm also not interested
> > in offline BI or analytics. I'm specifically interested in cases where
> > HBase serves as the data store for a customer-facing product.
> >
> > 2) Of the companies that use HBase to serve a customer-facing product,
> > which ones use it to query data sets of 60 TB or more?
> >
> > 3) Of the companies that use HBase to query 60+ TB data sets and serve a
> > customer-facing product, how many employees are required to support
> > their HBase installation? In other words, if I were to start a team
> > tomorrow, and their purpose was to maintain a 60+ TB HBase installation
> > for a customer-facing product, how many people should I hire?
> >
> > 4) Of the companies that use HBase to query 60+ TB data sets and serve a
> > customer-facing product, what kind of measures do they take for disaster
> > recovery?
> >
> > If you can, please point me to articles, videos, and other materials.
> > Obviously, the larger the company, the better the case it makes for
> > HBase.
> >
> > Thank you!
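
For anyone trying to picture the znode-based flip Iain describes, here is a minimal sketch using the plain ZooKeeper Java client. The znode path, the payload format (a quorum string for the active cluster), and the error handling are my own placeholders, not anything from Pinterest's or Iain's actual setup:

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ActiveClusterWatcher implements Watcher {

    // Hypothetical znode; its data is assumed to hold the ZK quorum of the
    // currently-active HBase cluster (e.g. "zk-east-1,zk-east-2,zk-east-3").
    private static final String ACTIVE_CLUSTER_ZNODE = "/failover/active-hbase-cluster";

    private final ZooKeeper zk;
    private volatile String activeCluster;

    public ActiveClusterWatcher(String coordinationQuorum) throws Exception {
        // 30s session timeout; this object also receives connection events.
        this.zk = new ZooKeeper(coordinationQuorum, 30000, this);
        refresh();
    }

    // Re-read the znode and re-register the watch (ZK data watches are one-shot).
    private void refresh() throws KeeperException, InterruptedException {
        byte[] data = zk.getData(ACTIVE_CLUSTER_ZNODE, this, null);
        activeCluster = new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                refresh();  // a flip happened; pick up the new active cluster
            } catch (Exception e) {
                // A real deployment would retry/alert here instead of swallowing.
            }
        }
    }

    // Callers check this before opening an HBase Connection to decide which
    // cluster's quorum to point hbase.zookeeper.quorum at.
    public String getActiveCluster() {
        return activeCluster;
    }
}

A client would call getActiveCluster() before opening its HBase connection and pick up flips via the NodeDataChanged watch; a real deployment would add retries, session re-establishment, and alerting around the refresh.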