Hi,

I suppose you have read about Haystack as well? The note below explains how Facebook stores its photos.

Needle in a haystack: efficient storage of billions of photos
http://www.facebook.com/note.php?note_id=76191543919

Regards,
Leen

On Thu, Sep 23, 2010 at 8:57 AM, Jack Levin <[email protected]> wrote:

> On Wed, Sep 22, 2010 at 10:39 AM, Sujee Maniyam <[email protected]> wrote:
> > Jack,
> > sounds like a cool project indeed. Few questions for you...
> >
> > 1) how do you setup 50+ servers. What I mean is, installing OS,
> > installing all software. Setting up user accounts. Setting up SSH
> > keys ..etc
> > DO you use any software for this?
>
> Just ssh keys, using cfengine, and other things.
>
> >
> > 2) Is Hbase going to be the 'primary' storage for images? Meaning,
> > your front-end reads/writes to Hbase?
> > Do you also maintain a 'file storage' as a backup / alternative?
>
> There will be front end servers that will cache and serve files off
> their disks, while hbase is going to keep the images highly available
> for some products as well as keeping them safe.
>
> >
> > 3) Do you only store the 'main image' in Hbase? How about
> > thumbnails, medium size, large size cousins?
>
> Those will be generated dynamically.
>
> >
> > 4) Is this a dedicated Hbase cluster, or you are building this on top
> > of your existing Hadoop cluster. Will this be sharing resources with
> > MR jobs that you already run?
>
> Its dedicated.
>
> >
> > 5) I notice you have 8G RAM for region servers. Hbase is very memory
> > hungry and specially dealing with large data sizes, I'd imagine you'd
> > need 24G-32G (as it was previously mentioned)
>
> This is not going to happen for two reasons, a) its too expensive and
> motherboard does not support it b) We are aiming for large dataset
> with 5 to 10% concentrated hits.
>
> > 6) how long does it take for all regions to be available after a 'cold
> > start' of Hbase?
>
> 6-7 mins with dual core, 1 minute with 8 core.
>
> >
> > 7) I'd be interested to know how do you do 'standby servers' for
> > HMaster and Hadoop Namenode
>
> Just two 8 core boxes, running both namenode, secondary namenode and
> masters.
>
> -Jack
>
> >
> > have fun
> >
> > regards
> > Sujee
> >
> > http://sujee.net
