bq. RegionStates: THIS SHOULD NOT HAPPEN: unexpected { ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW
Looks like the above wouldn't have happened if you were using 0.98.11+. See HBASE-12958.

On Wed, Feb 24, 2016 at 6:39 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:

> I picked up some logs from master.log about one region,
> "ad283942aff2bba6c0b94ff98a904d1a":
>
> 2016-02-24 16:24:35,610 INFO [AM.ZK.Worker-pool2-t3491] master.RegionStates: Transition null to {ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}
> 2016-02-24 16:25:40,472 WARN [MASTER_SERVER_OPERATIONS-dx-common-hmaster1-online:60000-0] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}
> 2016-02-24 16:34:24,769 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 16:39:24,768 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 16:44:24,768 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 16:45:37,749 DEBUG [FifoRpcScheduler.handler1-thread-10] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 16:49:24,769 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 16:54:24,768 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 16:59:24,768 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 17:04:24,769 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
> 2016-02-24 17:09:24,768 DEBUG [dx-common-hmaster1-online,60000,1433937470611-BalancerChore] master.HMaster: Not running balancer because 2 region(s) in transition: {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610, server=dx-common-regionserver1-online,60020,1456302268068}, ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef state=SPLITTING_NEW...
>
> 2016-02-25 10:05 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
> > bq. two regions were in transition
> >
> > Can you pastebin the related server logs w.r.t. these two regions so that
> > we can get more clues?
> >
> > For #2, please see http://hbase.apache.org/book.html#big.cluster.config
> >
> > For #3, please see
> > http://hbase.apache.org/book.html#_running_multiple_workloads_on_a_single_cluster
> >
> > On Wed, Feb 24, 2016 at 3:31 PM, Heng Chen <heng.chen.1...@gmail.com> wrote:
> >
> > > The story is that I ran one MR job on my production cluster (0.98.6);
> > > it needed to scan one table during the map phase.
> > > Because of the heavy load from the job, all my RSes crashed due to OOM.
> > >
> > > After I restarted all the RSes, I found a problem.
> > >
> > > All regions were reopened on one RS, and the balancer could not run
> > > because two regions were in transition. The cluster was stuck for a long
> > > time until I restarted the master.
> > >
> > > 1. Why did this happen?
> > >
> > > 2. If the cluster has a lot of regions, how should the cluster be
> > > restarted after all the RSes crash? If the RSes are restarted one by one,
> > > OOM may happen again, because one RS has to hold all the regions, and it
> > > will take a long time.
> > >
> > > 3. Is it possible to give each table a request quota, so that when one
> > > table is requested heavily it has no impact on the other tables in the
> > > cluster?
> > >
> > > Thanks
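(For anyone searching the archives with the same symptoms: the usual operational workarounds look roughly like the sketch below. This is a sketch, not a prescription; flag and command availability vary by HBase version, so verify each one against the reference guide for your release before running it.)

```shell
# Inspect regions stuck in transition (RIT) and other inconsistencies:
hbase hbck

# On 0.98/1.x, hbck can force-reassign regions whose state the master has
# lost track of (read its report carefully before applying any -fix option):
hbase hbck -fixAssignments

# Once the RITs are cleared, re-enable and trigger the balancer from the
# hbase shell:
echo "balance_switch true" | hbase shell
echo "balancer" | hbase shell

# Re: question 3 -- per-table request throttling was added in HBase 1.1
# (HBASE-11598), so it is not available on 0.98; in the hbase shell there:
#   set_quota TYPE => THROTTLE, TABLE => 't1', LIMIT => '1000req/sec'
```

Re: question 2, the `bin/rolling-restart.sh` and `bin/graceful_stop.sh` scripts shipped with HBase handle staged restarts, moving regions off a server before stopping it so that no single RS is left holding everything.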