Hi,

I suppose you have read about HayStack as well? This gives an explanation on
how Facebook stores its photos.

Needle in a haystack: efficient storage of billions of photos
http://www.facebook.com/note.php?note_id=76191543919

Regards,
Leen

On Thu, Sep 23, 2010 at 8:57 AM, Jack Levin <[email protected]> wrote:

> On Wed, Sep 22, 2010 at 10:39 AM, Sujee Maniyam <[email protected]> wrote:
> > Jack,
> > sounds like a cool project indeed.  A few questions for you...
> >
> >
> > 1) How do you set up 50+ servers?  What I mean is, installing the OS,
> > installing all software, setting up user accounts, setting up SSH
> > keys, etc.
> >     Do you use any software for this?
>
> Just ssh keys, using cfengine, and other things.
>
>
> >
> > 2) Is Hbase going to be the 'primary' storage for images?  Meaning,
> > your front-end reads/writes to Hbase?
> > Do you also maintain a 'file storage' as a backup / alternative?
>
> There will be front end servers that will cache and serve files off
> their disks, while hbase is going to keep the images highly available
> for some products as well as keeping them safe.
>
>
> > 3) Do you only store the 'main image' in Hbase?  How about
> > thumbnails, medium size, large size cousins?
>
> Those will be generated dynamically.
>
> >
> > 4) Is this a dedicated Hbase cluster, or are you building this on top
> > of your existing Hadoop cluster?  Will this be sharing resources with
> > MR jobs that you already run?
>
> It's dedicated.
>
> >
> > 5) I notice you have 8G RAM for region servers.  Hbase is very memory
> > hungry, and especially when dealing with large data sizes, I'd imagine
> > you'd need 24G-32G (as was previously mentioned).
>
> This is not going to happen for two reasons: a) it's too expensive and
> the motherboard does not support it, and b) we are aiming for a large
> dataset with 5 to 10% concentrated hits.
>
> > 6) how long does it take for all regions to be available after a 'cold
> > start' of Hbase?
>
> 6-7 mins with dual core, 1 minute with 8 core.
>
> > 7) I'd be interested to know how do you do 'standby servers' for
> > HMaster and Hadoop Namenode
>
> Just two 8 core boxes, running both namenode, secondary namenode and
> masters.
>
> -Jack
>
>
> > have fun
> >
> > regards
> > Sujee
> >
> > http://sujee.net
> >
>
