> Am getting back to using Java after a long time, guys.. So, give me a
> little more time to ramp up to '10 :)
Welcome back!

> One more question then: What are the implications of running really large
> regions (like around 4-8 gigs per region)? One implication I can think of
> is coarser-grained control over load (since a split will happen less
> frequently).. But with a large number of nodes, this isn't that
> coarse-grained I guess?

I don't know anybody who runs regions that big; here we run at 1 GB on our
table that holds a few TBs. But yeah, at scale that stuff won't matter as
much, although with 8 GB regions you could blow out your memory.

> We are trying to load 100's of terabytes eventually.. And running even
> 100s of regions per RS seems like a big hit on the memory.

From what I saw in your metrics dump, storing the actual regions was costing
you almost nothing. But you will run into problems when the global size of
all memstores gets very big (a true random-write pattern will always get you
there). IMO, the biggest issue with the number of regions served per RS is
more about how much data is actually stored and retrieved relative to the
performance each of your nodes can deliver (capacity planning).

J-D
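For concreteness, the two knobs discussed above map roughly to the following
hbase-site.xml properties on the region server side. This is only an
illustrative sketch; exact property names, defaults, and sensible values
depend on your HBase version, so check your release's hbase-default.xml
before copying anything:

  <!-- hbase-site.xml (server side) - illustrative values, not recommendations -->
  <property>
    <!-- store size at which a region is split; roughly the 1 GB mentioned above -->
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
  </property>
  <property>
    <!-- fraction of the region server heap that all memstores combined may
         occupy before updates are blocked and flushes are forced -->
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
  </property>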