Hey Jack: Thanks for writing.
See below for some comments.

On Mon, Sep 20, 2010 at 11:00 AM, Jack Levin <[email protected]> wrote:
> Image-Shack gets close to two million image uploads per day, which are
> usually stored on regular servers (we have about 700) as regular
> files, and each server has its own host name, such as (img55). I've
> been researching how to improve our backend design in terms of data
> safety and stumbled onto the HBase project.

Any other requirements other than data safety? (Latency, etc.)

> Now, I think HBase is the most beautiful thing that has happened to
> the distributed DB world :). The idea is to store image files (about
> 400KB on average) in HBase.

I'd guess some images are much bigger than this. Do you ever limit the
size of images folks can upload to your service?

> The setup will include the following configuration:
>
> 50 servers total (2 datacenters), with 8 GB RAM, dual-core CPU, 6 x
> 2TB disks each
> 3 to 5 ZooKeepers
> 2 Masters (one in each datacenter)
> 10 to 20 Stargate REST instances (one per server, hash loadbalanced)

What's your frontend? Why REST? It might be more efficient if you could
run with Thrift, given REST base64s its payload IIRC (check the src
yourself).

> 40 to 50 RegionServers (will probably keep masters separate on
> dedicated boxes)
> 2 Namenode servers (one backup, highly available, will do fsimage and
> edits snapshots also)
>
> So far I've got about 13 servers running, doing about 20 insertions
> per second (file sizes ranging from a few KB to 2-3MB, avg. 400KB),
> via the Stargate API. Our frontend servers receive files, and I just
> fork-insert them into Stargate via HTTP (curl).
> The inserts are humming along nicely, without any noticeable load on
> the regionservers; so far I've inserted about 2 TB worth of images.
> I have adjusted the region file size to be 512MB, and the table block
> size to about 400KB, trying to match the average access block to
> limit HDFS trips.

As Todd suggests, I'd go up from 512MB... 1G at least.
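On the REST-vs-Thrift point above, the base64 cost is easy to eyeball: base64 turns every 3 input bytes into 4 output characters, roughly a third more bytes on the wire. A minimal Python sketch (the 400KB figure is just the average quoted above):

```python
import base64

# base64 expands every 3 input bytes into 4 output characters,
# so a payload grows by roughly a third when shipped over REST.
payload = b"\x00" * (400 * 1024)      # a ~400KB "image", the thread's average
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload)
print(round(overhead, 2))             # -> 1.33
```

At 2 TB of images already inserted, that third is real network and CPU, which is why a binary transport like Thrift can pay off.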
You'll probably want to up your flush size from 64MB to 128MB or maybe
192MB.

> So far the read performance has been more than adequate, and of
> course write performance is nowhere near capacity.
> So right now, all newly uploaded images go to HBase. But we do plan
> to insert about 170 million images (about 100 days' worth), which is
> only about 64 TB, or 10% of the planned cluster size of 600TB.
> The end goal is to have a storage system that provides data safety,
> i.e. the system may go down but data cannot be lost. Our front-end
> servers will continue to serve images from their own file systems (we
> are serving about 16 Gbit at peak); however, should we need to bring
> any of those down for maintenance, we will redirect all traffic to
> HBase (should be no more than a few hundred Mbps) while the front-end
> server is repaired (for example, having its disk replaced). After the
> repairs, we quickly repopulate it with missing files, while serving
> the remaining missing ones off HBase.
> All in all it should be a very interesting project, and I am hoping
> not to run into any snags; however, should that happen, I am pleased
> to know that such a great and vibrant tech group exists that supports
> and uses HBase :).

We're definitely interested in how your project progresses. If you are
ever up in the city, you should drop by for a chat.

St.Ack

P.S. I'm also w/ Todd that you should move to 0.89 and blooms.

P.P.S. I updated the wiki on Stargate REST:
http://wiki.apache.org/hadoop/Hbase/Stargate
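For reference, the two sizing suggestions above (1G regions, 128MB flushes) map to two properties in hbase-site.xml. A sketch using the stock property names, with the sizes spelled out in bytes:

```xml
<!-- hbase-site.xml: region split threshold, up from 512MB to 1GB -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value>
</property>
<!-- memstore flush size, up from the 64MB default to 128MB -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
```

Bigger regions mean fewer regions per regionserver for the same 600TB of data, and bigger flushes mean fewer, larger HDFS files to compact.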
