Got it. This must be the reason. It is a laugh check, and I do see 6 regions for the 40 rows, so the scan may well span them, although I can't confirm it for sure. It may be due to how the table was set up, or to running the tests for some time and rotating some data through it. The keys are uniformly distributed hashes, so it is entirely plausible that 40 rows end up in 6 different regions.
Ok, I'll take that as the working theory for now. Is there a way to set a max number of regions per table? I guess the method in the manual is to set the max region size, which means I probably need to re-create the table to get it back to one region? Or maybe there's a way to get it back to one region without recreating it, such as a major compaction? Thanks.
-d

On Wed, Apr 20, 2011 at 9:55 AM, Stack <st...@duboce.net> wrote:
> On Wed, Apr 20, 2011 at 9:49 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> Ok. Let me ask a question.
>>
>> When a scan is performed and it covers several regions, are the
>> per-region calls done in synchronous succession, or are they done in
>> parallel?
>>
>
> The former.
>
>
>> Assuming a scan returns 40 results but for some weird reason it goes
>> to 6 regions and caching is set to 100 (so it can take all of them),
>> are the individual region request latencies summed, or is it
>> max(region request latency)?
>>
>
> Summed.
>
> The 40 rows are not contiguous in the same region? If not, the cost
> of the client setting up a new scanner against the next region will be
> inline w/ your read timing (at least an rpc per region).
>
> St.Ack
>
>> Thank you very much.
>> -D
>>
>> On Tue, Apr 19, 2011 at 6:28 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
>>> For a tiny test like this, everything should be in memory and latency
>>> should be very low.
>>>
>>> On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>> PS: so what should read latency be in 0.90, assuming moderate throughput?
>>>>
>>>> On Tue, Apr 19, 2011 at 5:39 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>> wrote:
>>>>> For this test, there are no more than 40 rows in any given table.
>>>>> This is just a laugh check.
>>>>>
>>>>> So I think it's safe to assume it all goes to the same region server.
>>>>>
>>>>> But latency would not depend on which server the call is going to,
>>>>> would it? Only throughput would, assuming we are not overloading.
>>>>>
>>>>> And we clearly are not, since my single-node local setup gets quite
>>>>> OK response times at the same throughput.
>>>>>
>>>>> It's something with either client connections or network latency or
>>>>> ... I don't know what it is. I did not set up the cluster, but I've
>>>>> got to troubleshoot it now :)
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 19, 2011 at 5:23 PM, Ted Dunning <tdunn...@maprtech.com>
>>>>> wrote:
>>>>>> How many regions? How are they distributed?
>>>>>>
>>>>>> Typically it is good to fill the table somewhat and then drive some
>>>>>> splits and balance operations via the shell. One more split to make
>>>>>> the regions be local and you should be good to go. Make sure you
>>>>>> have enough keys in the table to support these splits, of course.
>>>>>>
>>>>>> Under load, you can look at the hbase home page to see how
>>>>>> transactions are spread around your cluster. Without splits and
>>>>>> local region files, you aren't going to see what you want in terms
>>>>>> of performance.
>>>>>>
>>>>>
>>>>
>>>
>>
>
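For completeness, here is a minimal sketch of the kind of scan under discussion, written against the 0.90-era Java client API; the table name "mytable" and family "f" are placeholders, not anything from the thread. Even with setCaching(100), so that a single RPC can carry all 40 rows from one region, the client still walks the regions sequentially and opens a scanner against each one, which is why the per-region round trips show up summed in the observed latency.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanLatencyCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // placeholder table name

        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("f"));           // placeholder column family
        scan.setCaching(100);                         // up to 100 rows per RPC

        long start = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        int rows = 0;
        for (Result r : scanner) {                    // regions are visited one after another
          rows++;
        }
        scanner.close();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(rows + " rows in " + elapsed + " ms");
        table.close();
      }
    }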
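And a sketch of the admin side under the same assumptions (0.90 Java API, placeholder names): it checks how many regions the table currently spans, and shows setting a large max region size when re-creating the table so it won't split until it actually grows that big. Note that a major compaction only rewrites the store files within each region; it does not merge regions back together, so getting back to a single region means re-creating the table.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // How many regions does the table span right now?
        HTable table = new HTable(conf, "mytable");   // placeholder table name
        int regions = table.getRegionsInfo().size();
        System.out.println("mytable spans " + regions + " region(s)");
        table.close();

        HBaseAdmin admin = new HBaseAdmin(conf);

        // A major compaction rewrites store files within each region but
        // does not change the number of regions.
        admin.majorCompact("mytable");

        // If re-creating the table instead: a large max store file size
        // (4 GB here) keeps it from splitting until it really grows that big.
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.setMaxFileSize(4L * 1024 * 1024 * 1024);
        desc.addFamily(new HColumnDescriptor("f"));   // placeholder family
        // admin.createTable(desc);                   // after dropping the old table
      }
    }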