Re: client cache for all region server information?

Harsh J Mon, 27 Aug 2012 08:56:02 -0700

Not necessarily consecutive, unless the request itself asks for a
consecutive range. It returns up to 500 rows at a time, and only rows
that match the user's request.


The user's request for a specific row range, along with any filters, is
usually embedded in the Scan object that is sent to the RS (region
server). Whatever the Scan operation matches on the server side is
accumulated in batches of 500 rows, and each batch is returned to the
client in one Scanner.next() call.
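
To make that concrete, here is a minimal Python sketch of the idea. It is
a toy model under stated assumptions, not HBase code: ToyRegionServer,
next_batch and scan are made-up names for illustration. In the real Java
client the batch size is set with Scan.setCaching(500).

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[int, str]  # (row key, value); toy stand-in for an HBase Result


class ToyRegionServer:
    """Hypothetical in-memory stand-in for a region server (not the HBase API)."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self.rows: List[Row] = sorted(rows.items())
        self.rpc_count = 0  # number of client round trips served

    def next_batch(self, position: int, start: int, stop: int,
                   row_filter: Callable[[int, str], bool],
                   caching: int) -> Tuple[List[Row], int]:
        """Serve one round trip: up to `caching` matching rows from `position` on."""
        self.rpc_count += 1
        batch: List[Row] = []
        i = position
        while i < len(self.rows) and len(batch) < caching:
            key, value = self.rows[i]
            i += 1
            if key >= stop:            # past the requested range: scan is done
                i = len(self.rows)
                break
            if key >= start and row_filter(key, value):
                batch.append((key, value))  # only matching rows are shipped
        return batch, i


def scan(server: ToyRegionServer, start: int, stop: int,
         row_filter: Callable[[int, str], bool], caching: int) -> Iterator[Row]:
    """Client side: the range and filter travel with every request."""
    position = 0
    while True:
        batch, position = server.next_batch(position, start, stop,
                                            row_filter, caching)
        yield from batch
        if len(batch) < caching:       # short batch means nothing is left
            return


server = ToyRegionServer({k: f"v{k}" for k in range(10_000)})
keys = [k for k, _ in scan(server, 100, 10_000,
                           lambda k, v: k % 2 == 0, caching=500)]
print(keys[:3], len(keys), server.rpc_count)  # [100, 102, 104] 4950 10
```

The returned keys are not consecutive (100, 102, 104, ...) because the
filter is applied on the server before rows are batched. With caching=1
the same scan needs one round trip per row instead of ten in total.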

Does this clear it up, Lin?

On Mon, Aug 27, 2012 at 8:40 PM, Lin Ma <lin...@gmail.com> wrote:
> Hi Harsh,
>
> I read through the document you referred to, and I am confused by the
> comment below. My main confusion is: does it mean HBase will transfer 500
> consecutive rows to the client (supposing the client mapper wants the row
> with row-key 100, HBase will return row-keys 100 to 600 to the client at
> one time, similar to a batch read)? How does it ensure that all 500 rows
> are desired input for the client mapper job (e.g. how does HBase know the
> client mapper job wants row-keys 101 to 600)?
>
> "Using the default value means that the map-task will make call back to the
> region-server for every record processed. Setting this value to 500, for
> example, will transfer 500 rows at a time to the client to be processed."
>
> regards,
> Lin
>
>
> On Thu, Aug 23, 2012 at 11:37 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Lin,
>>
>> On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma <lin...@gmail.com> wrote:
>> > Harsh, thanks for the detailed information.
>> >
>> > Two more comments,
>> >
>> > 1. I want to confirm my understanding is correct. At the beginning the
>> > client cache has nothing. When it issues a request for a table, if the
>> > region server location is not known, it will query the ROOT and META
>> > regions step by step to get the region server information, then cache
>> > it. If the cache already contains the requested region information, it
>> > will use it directly from the cache. In this way, the cache grows
>> > whenever there is a cache miss for requested region information;
>>
>> You have it correct now. Region locations are looked up and cached only
>> when they are not already in the cache. They are cached on a need basis,
>> not all at once.
>>
>> > 2. "far outweighs the other items it caches (scan results, etc.)": do
>> > you mean the GET API of HBase caches results? Sorry, I was not aware of
>> > this feature before. How are the results cached, and can we control it
>> > (supposing a client has a random read pattern, we do not want to cache
>> > anything, since each read may access a unique row-key)? I would
>> > appreciate it if you could point me to some more detailed information.
>>
>> Am speaking of Scanner value caching, not Gets exactly. See more about
>> Scanner (client) caching at
>> http://hbase.apache.org/book.html#perf.hbase.client.caching
>>
>> > regards,
>> > Lin
>> >
>> >
>> > On Thu, Aug 23, 2012 at 9:35 PM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> Hi Lin,
>> >>
>> >> On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma <lin...@gmail.com> wrote:
>> >> > Thank you Abhishek,
>> >> >
>> >> > Two more comments,
>> >> >
>> >> > -- "Client only caches information as needed for its queries and not
>> >> > necessarily for 'all' region servers." -- how does the client know
>> >> > which region server information needs to be cached in the current
>> >> > HBase implementation?
>> >>
>> >> What Abhishek meant here is that it caches only the needed table's
>> >> rows from META. It also only caches the specific region required for
>> >> the row you're looking up/operating on, AFAICT.
>> >>
>> >> > -- When does the client load region server information for the first
>> >> > time? Does the client persist cached region server information on
>> >> > the client side?
>> >>
>> >> The client loads up regionserver information for a table, when it is
>> >> requested to perform an operation on that table (on a specific row or
>> >> the whole). It does not immediately, upon initialization, cache the
>> >> whole of META's contents.
>> >>
>> >> Your question makes sense though: a client *may* use quite a bit of
>> >> memory in trying to cache the META entries locally, but practically
>> >> we've not had this cause issues
>> >> for users yet. The amount of memory cached for META far outweighs the
>> >> other items it caches (scan results, etc.). At least I have not seen
>> >> any reports of excessive client memory usage just due to region
>> >> locations of tables being cached.
>> >>
>> >> I think there are more benefits to caching it than not doing so, and
>> >> so far we've not needed the extra complexity of persisting the cache
>> >> to local or non-RAM storage rather than keeping it in memory.
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
