Me too, Abhishek -- you are not alone. Still, it is good to discuss this here and learn about the various design choices.
regards,
Lin

On Fri, Aug 24, 2012 at 1:06 AM, Pamecha, Abhishek <apame...@x.com> wrote:

> I too thought there are multiple meta regions where as just one ROOT. May
> be I am mixing b/w Big Table and Hbase.
>
> Thanks,
> Abhishek
>
>
> -----Original Message-----
> From: Lin Ma [mailto:lin...@gmail.com]
> Sent: Thursday, August 23, 2012 9:41 AM
> To: user@hbase.apache.org; ha...@cloudera.com
> Cc: doug.m...@explorysmedical.com
> Subject: Re: how client location a region/tablet?
>
> Thanks, Harsh!
>
> - "HBase currently keeps a single META region (Doesn't split it). " --
> does it mean there is only one row in ROOT table, which points the only one
> META region?
> - In Big Table, it seems they have multiple META regions (tablets), is it
> an advantage over HBase? :-)
>
> regards,
> Lin
>
> On Thu, Aug 23, 2012 at 11:48 PM, Harsh J <ha...@cloudera.com> wrote:
>
> > HBase currently keeps a single META region (Doesn't split it). ROOT
> > holds META region location, and META has a few rows in it, a few of
> > them for each table. See also the class MetaScanner.
> >
> > On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma <lin...@gmail.com> wrote:
> > > Dong,
> > >
> > > Some more thoughts, after reading data structure for HRegionInfo =>
> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html,
> > > start key and end key looks informative which we could leverage,
> > >
> > > - I am not sure if we could leverage this information (stored as
> > > part of value in table ROOT) to find which META region may contains
> > > region server information for row-key 123 of data table ABC;
> > > - But I think unfortunately the information is stored in value of
> > > table ROOT, other than key field of table ROOT, so that we have to
> > > iterate each row in ROOT table one by one to figure out which META
> > > region server to access.
> > >
> > > Not sure if I get the points. Please feel free to correct me.
> > >
> > > regards,
> > > Lin
> > >
> > > On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma <lin...@gmail.com> wrote:
> > >
> > >> Doug, very informative document. Thanks a lot!
> > >>
> > >> I read through it and have some thoughts,
> > >>
> > >> - Supposing at the beginning, client side cache for region
> > >> information is empty, and the client wants to GET row-key 123 from
> > >> table ABC;
> > >> - The client will read from ROOT table at first. But unfortunately,
> > >> ROOT table only contains region information for META table (please
> > >> correct me if I am wrong), but not region information for real data
> > >> table (e.g. table ABC);
> > >> - Does the client have to call each META region server one by one,
> > >> in order to find which META region contains information for region
> > >> owner of row-key 123 of data table ABC?
> > >>
> > >> BTW: I think if there is a way to expose information about what
> > >> range of table/region each META region contains from .META. region
> > >> key, it will be better to save time to iterate META region server
> > >> one by one. Please feel free to correct me if I am wrong.
> > >>
> > >> regards,
> > >> Lin
> > >>
> > >>
> > >> On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil <doug.m...@explorysmedical.com> wrote:
> > >>
> > >>>
> > >>> For further information about the catalog tables and
> > >>> region-regionserver assignment, see this...
> > >>>
> > >>> http://hbase.apache.org/book.html#arch.catalog
> > >>>
> > >>>
> > >>> On 8/19/12 7:36 AM, "Lin Ma" <lin...@gmail.com> wrote:
> > >>>
> > >>> >Thank you Stack, especially for the smart 6 round trip guess for
> > >>> >the puzzle. :-)
> > >>> >
> > >>> >1. "Yeah, we client cache's locations, not the data." -- does it
> > >>> >mean for each client, it will cache all location information of a
> > >>> >HBase cluster, i.e. which physical server owns which region?
> > >>> >Supposing each region has 128M bytes, for a big cluster (P-bytes
> > >>> >level), total data size / 128M is not a trivial number, not sure
> > >>> >if any overhead to client?
> > >>> >2. A bit confused by what do you mean "not the data"? For the
> > >>> >client cached location information, it should be the data in
> > >>> >table METADATA, which is region / physical server mapping data.
> > >>> >Why you say not data (do you mean real content in each region)?
> > >>> >
> > >>> >regards,
> > >>> >Lin
> > >>> >
> > >>> >On Sun, Aug 19, 2012 at 12:40 PM, Stack <st...@duboce.net> wrote:
> > >>> >
> > >>> >> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma <lin...@gmail.com> wrote:
> > >>> >> > Hello guys,
> > >>> >> >
> > >>> >> > I am referencing the Big Table paper about how a client
> > >>> >> > locates a tablet. In section 5.1 Tablet location, it is
> > >>> >> > mentioned that client will cache all tablet locations, I think
> > >>> >> > it means client will cache root tablet in METADATA table, and
> > >>> >> > all other tablets in METADATA table (which means client cache
> > >>> >> > the whole METADATA table?). My question is, whether HBase
> > >>> >> > implements in the same or similar way? My concern or confusion
> > >>> >> > is, supposing each tablet or region file is 128M bytes, it
> > >>> >> > will be very huge space (i.e. memory footprint) for each
> > >>> >> > client to cache all tablets or region files of METADATA table.
> > >>> >> > Is it doable or feasible in real HBase clusters? Thanks.
> > >>> >> >
> > >>> >>
> > >>> >> Yeah, we client cache's locations, not the data.
> > >>> >>
> > >>> >>
> > >>> >> > BTW: another confusion from me is in the paper of Big Table
> > >>> >> > section 5.1 Tablet location, it is mentioned that "If the
> > >>> >> > client's cache is stale, the location algorithm could take up
> > >>> >> > to six round-trips, because stale cache entries are only
> > >>> >> > discovered upon misses (assuming that METADATA tablets do not
> > >>> >> > move very frequently).", I do not know how the 6 times round
> > >>> >> > trip time is calculated, if anyone could answer this puzzle,
> > >>> >> > it will be great. :-)
> > >>> >> >
> > >>> >>
> > >>> >> I'm not sure what the 6 is about either. Here is a guesstimate:
> > >>> >>
> > >>> >> 1. Go to cached location for a server for a particular user
> > >>> >> region, but server says that it does not have a region, the
> > >>> >> client location is stale
> > >>> >> 2. Go back to client cached meta region that holds user region
> > >>> >> w/ row we want, but its location is stale.
> > >>> >> 3. Go to root location, to find new location of meta, but the
> > >>> >> root location has moved.... what the client has is stale
> > >>> >> 4. Find new root location and do lookup of meta region location
> > >>> >> 5. Go to meta region location to find new user region
> > >>> >> 6. Go to server w/ user region
> > >>> >>
> > >>> >> St.Ack
> > >>> >>
> > >>>
> > >>
> > >
> >
> > --
> > Harsh J
>
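
For anyone reading this thread later: the six-step guesstimate quoted above can be turned into a small counting sketch. This is only a toy model of that guess, not HBase or Bigtable client code. The names Locations and count_round_trips are made up for illustration, each contact with a server is counted as one round trip, and step 4 (finding the new root location and asking it for the meta location) is counted as a single trip, as in the guess.

```python
from dataclasses import dataclass


@dataclass
class Locations:
    """Hypothetical record of which server hosts each region."""
    root: str  # server hosting the ROOT region
    meta: str  # server hosting the META region
    user: str  # server hosting the user region we want


def count_round_trips(cached: Locations, actual: Locations) -> int:
    """Count round trips to reach the user region, following the six-step guess.

    Assumption: a stale entry is only discovered by contacting the cached
    server and getting a miss, so every miss still costs one trip.
    """
    trips = 1  # step 1: try the cached user-region server
    if cached.user == actual.user:
        return trips

    trips += 1  # step 2: ask the cached META server for the user region
    if cached.meta == actual.meta:
        return trips + 1  # META answered; go to the user-region server

    trips += 1  # step 3: ask the cached ROOT server for META's location
    if cached.root != actual.root:
        trips += 1  # step 4: find the new ROOT and look up META there

    trips += 1  # step 5: ask META for the user region's location
    trips += 1  # step 6: go to the server with the user region
    return trips


if __name__ == "__main__":
    actual = Locations(root="rs1", meta="rs2", user="rs3")
    fully_stale = Locations(root="old1", meta="old2", user="old3")
    print(count_round_trips(fully_stale, actual))
```

With every cached entry stale the count comes out at six, and with a fully fresh cache it is a single trip, which is why caching locations (not the data) pays off.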