I am not with StumbleUpon, but I can tell you how one of my clients does it.  

We serve the website from HBase.  We used to run M/R jobs on the same 
cluster, but quickly found that this was a bad idea.  We now run only very 
minimal Hadoop M/R apps on the web cluster.  Every time we run an M/R job 
there, we see increased latency in the web app.  That makes sense: the more 
work the cluster does, the less quickly it can respond.  This isn't a new 
idea; traditionally, data warehousing and deep analytics tasks are kept 
separate from OLTP processing.  

We ended up splitting the cluster into two clusters: a web cluster and a 
compute cluster.  The longer jobs that run on the compute cluster have quite a 
few steps.  The first step pulls data from the HBase cluster, and the final 
step puts the results back to the HBase cluster.   
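
A rough sketch of that job shape, in Python, with plain dicts standing in for 
the two clusters.  The function names, row keys, and the click-count analysis 
are all made up for illustration; a real job would use HBase scans and puts 
in place of the dict operations.

```python
# Illustrative only: plain dicts stand in for the web and compute clusters.
# None of these names come from HBase or Hadoop.

def extract(web_store, prefix):
    """First step: copy the needed rows off the serving (web) cluster."""
    return {key: dict(row) for key, row in web_store.items()
            if key.startswith(prefix)}

def analyze(snapshot):
    """Middle steps: work only on the compute-side copy of the data."""
    totals = {}
    for row in snapshot.values():
        totals[row["user"]] = totals.get(row["user"], 0) + row["clicks"]
    return totals

def load(web_store, totals):
    """Final step: write only the small result set back to the web cluster."""
    for user, clicks in totals.items():
        web_store["summary:" + user] = {"user": user, "clicks": clicks}

web_store = {
    "event:1": {"user": "a", "clicks": 2},
    "event:2": {"user": "b", "clicks": 5},
    "event:3": {"user": "a", "clicks": 1},
}
snapshot = extract(web_store, "event:")   # one read pass against the web side
totals = analyze(snapshot)                # heavy work stays off the web side
load(web_store, totals)                   # one short write pass at the end
```

The point of the shape is that the web cluster only sees one read pass and 
one short write pass; everything in between touches the copy.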

We manage our own indexes.   The only jobs that run on the HBase cluster are 
indexing jobs.  Anything that does any sort of analytics runs on the compute 
cluster. 

-Matthew


On Sep 23, 2010, at 10:11 PM, Bishal Acharya wrote:

> I am running a 20-node cluster with Hadoop/HBase. Currently, I run MR jobs 
> on the cluster while also serving my web application directly from HBase on 
> the same cluster. When no MR jobs are running, the application works 
> perfectly well, but when I run MR jobs while browsing my application, I see 
> increased latency. How can I manage my cluster so that I avoid the added 
> latency caused by MR jobs saturating it? Specifically, I would like to know 
> how companies that use HBase for front-end serving, such as StumbleUpon, 
> handle this issue.
> 
> 
> -- 
> 
> Sincerely,
> 
> 
> *Bishal Acharya*
> 
> Software Engineer | D2HawkeyeServices Pvt. Ltd. | Subsidiary of Verisk 
> Health, USA
