Not necessarily consecutive, unless the request itself is for a consecutive range. It only returns rows that match the user's request, up to 500 at a time.
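To make the "not necessarily consecutive" point concrete, here is a small Python model of a filtered scan served in batches. It is not HBase client code. The functions `region_server_scan`, `fetch_batches` and `even_key`, the integer row keys, and the even-key filter are all hypothetical stand-ins. In the real client, the batch size corresponds to the scanner caching value, which is set per scan with `Scan.setCaching(int)` or globally through `hbase.client.scanner.caching`.

```python
import itertools
from typing import Callable, Iterator, List, Tuple

# Hypothetical row type: (row key, value). Integer keys stand in for
# HBase's sorted byte-array row keys.
Row = Tuple[int, str]


def region_server_scan(rows: List[Row], start: int, stop: int,
                       row_filter: Callable[[Row], bool]) -> Iterator[Row]:
    """Server side of the model: walk [start, stop) in key order and
    yield only rows that pass the filter carried in the scan request."""
    for row in sorted(rows):
        if start <= row[0] < stop and row_filter(row):
            yield row


def fetch_batches(rows: List[Row], start: int, stop: int,
                  row_filter: Callable[[Row], bool],
                  caching: int) -> List[List[Row]]:
    """Client side of the model: each round trip pulls up to `caching`
    matching rows from the same open server-side scan."""
    scanner = region_server_scan(rows, start, stop, row_filter)
    batches: List[List[Row]] = []
    while True:
        batch = list(itertools.islice(scanner, caching))
        if not batch:  # scan exhausted, nothing more to transfer
            break
        batches.append(batch)
    return batches


def even_key(row: Row) -> bool:
    """Hypothetical filter: keep rows whose key is even."""
    return row[0] % 2 == 0


if __name__ == "__main__":
    table = [(k, f"value-{k}") for k in range(2000)]

    batches = fetch_batches(table, 100, 1600, even_key, caching=500)
    print([len(b) for b in batches])       # [500, 250]
    print([r[0] for r in batches[0][:3]])  # [100, 102, 104]

    per_row = fetch_batches(table, 100, 1600, even_key, caching=1)
    print(len(per_row))                    # 750
```

With caching at 500, the 750 matching rows arrive in two batches. The keys inside each batch skip every row the filter rejected. With caching at 1, the same scan needs 750 round trips. This is the per-record call-back cost described in the HBase book passage Lin quotes.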
The user's requested row range and filters are usually embedded in the Scan object that is sent to the RS (RegionServer). Whatever the Scan operation produces server-side is accumulated in batches of 500 rows (with caching set to 500), and each batch is returned to the client in a single Scanner.next() call.

Does this clear it up, Lin?

On Mon, Aug 27, 2012 at 8:40 PM, Lin Ma <lin...@gmail.com> wrote:
> Hi Harsh,
>
> I read through the document you referred to, and I am confused by the
> comment below. My main confusion is: does it mean HBase will transfer
> 500 consecutive rows to the client (supposing the client mapper wants the
> row with row-key 100, HBase will return row-keys 100 to 600 at one time
> to the client, similar to a batch read)? If so, how does it ensure that
> all 500 rows are desired input for the client mapper job (e.g. how does
> HBase know the client mapper job wants row-keys 101 to 600)?
>
> "Using the default value means that the map-task will make call back to the
> region-server for every record processed. Setting this value to 500, for
> example, will transfer 500 rows at a time to the client to be processed."
>
> regards,
> Lin
>
> On Thu, Aug 23, 2012 at 11:37 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Lin,
>>
>> On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma <lin...@gmail.com> wrote:
>> > Harsh, thanks for the detailed information.
>> >
>> > Two more comments:
>> >
>> > 1. I want to confirm my understanding is correct. At the beginning the
>> > client cache has nothing. When it issues a request for a table and the
>> > region server location is not known, it will request it from the root
>> > META region to get the region server information step by step, then
>> > cache the region server information. If the cache already contains the
>> > requested region information, it will use it directly from the cache.
>> > In this way, the cache grows on each cache miss for requested region
>> > information.
>>
>> You have it correct now. Region locations are fetched and cached only if
>> they are not already available in the cache.
>> And they are cached on a need basis, not all at once.
>>
>> > 2. "far outweighs the other items it caches (scan results, etc.)": do you
>> > mean the GET API of HBase caches results? Sorry, I was not aware of this
>> > feature before. How are the results cached, and can we control it
>> > (supposing a client is doing a random read pattern, we do not want to
>> > cache information, since each read may be a unique row-key access)? I
>> > would appreciate it if you could point me to some more detailed
>> > information.
>>
>> I am speaking of Scanner value caching, not Gets exactly. See more about
>> Scanner (client) caching at
>> http://hbase.apache.org/book.html#perf.hbase.client.caching
>>
>> > regards,
>> > Lin
>> >
>> > On Thu, Aug 23, 2012 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> Hi Lin,
>> >>
>> >> On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma <lin...@gmail.com> wrote:
>> >> > Thank you Abhishek,
>> >> >
>> >> > Two more comments:
>> >> >
>> >> > -- "Client only caches information as needed for its queries and not
>> >> > necessarily for 'all' region servers." -- how does the client know
>> >> > which region server information needs to be cached in the current
>> >> > HBase implementation?
>> >>
>> >> What Abhishek meant here is that it caches only the needed table's
>> >> rows from META. It also only caches the specific region required for
>> >> the row you're looking up/operating on, AFAICT.
>> >>
>> >> > -- When does the client load region server information for the first
>> >> > time? Does the client persist cached region server information on the
>> >> > client side?
>> >>
>> >> The client loads regionserver information for a table when it is
>> >> requested to perform an operation on that table (on a specific row or
>> >> the whole table). It does not cache the whole of META's contents
>> >> immediately upon initialization.
>> >>
>> >> Your question makes sense, though: a client *may* use quite a bit of
>> >> memory trying to cache the META entries locally, but in practice this
>> >> has not caused issues for users yet. The amount of memory cached for
>> >> META far outweighs the other items it caches (scan results, etc.). At
>> >> least, I have not seen any reports of excessive client memory usage
>> >> due just to region locations of tables being cached.
>> >>
>> >> I think there are more benefits to storing/caching it than not doing
>> >> so, and so far we've not needed the extra complexity of persisting the
>> >> cache to local or non-RAM storage instead of keeping it in memory.
>> >>
>> >> --
>> >> Harsh J
>>
>> --
>> Harsh J
>
--
Harsh J
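The quoted replies describe lazy region-location caching: look up META on a miss, reuse the cached location on a hit, and never load all of META up front. That behavior can be modeled in a few lines. This is a sketch under stated assumptions. `RegionLocationCache`, the in-memory `meta` dictionary, and integer row keys are hypothetical. The -ROOT-/META lookup hierarchy is collapsed into a single step. One region is cached per miss, as described in the thread, though the real client's exact policy may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical region descriptor: (start key inclusive, end key exclusive, server).
Region = Tuple[int, int, str]


class RegionLocationCache:
    """Model of on-demand client-side caching of region locations (not HBase code)."""

    def __init__(self, meta: Dict[str, List[Region]]):
        self._meta = meta                          # stands in for the META table
        self._cache: Dict[str, List[Region]] = {}  # table -> regions located so far
        self.meta_lookups = 0                      # counts simulated META round trips

    def _find_cached(self, table: str, row: int) -> Optional[Region]:
        for region in self._cache.get(table, []):
            if region[0] <= row < region[1]:
                return region
        return None

    def _lookup_meta(self, table: str, row: int) -> Region:
        self.meta_lookups += 1
        for region in self._meta[table]:
            if region[0] <= row < region[1]:
                return region
        raise KeyError(f"no region of {table!r} holds row {row}")

    def locate(self, table: str, row: int) -> str:
        region = self._find_cached(table, row)
        if region is None:  # cache miss: ask META, then remember only this region
            region = self._lookup_meta(table, row)
            self._cache.setdefault(table, []).append(region)
        return region[2]

    def cached_region_count(self) -> int:
        return sum(len(regions) for regions in self._cache.values())


if __name__ == "__main__":
    meta = {
        "users": [(0, 100, "rs1"), (100, 200, "rs2"), (200, 300, "rs3")],
        "orders": [(0, 1000, "rs2")],
    }
    cache = RegionLocationCache(meta)
    print(cache.locate("users", 42))   # rs1 (miss, one META lookup)
    print(cache.locate("users", 99))   # rs1 (hit, same region)
    print(cache.locate("users", 150))  # rs2 (miss)
    print(cache.meta_lookups, cache.cached_region_count())  # 2 2
```

The "orders" table is never touched, so none of its regions are cached. The cache only grows as rows in new regions are requested.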