On Wed, May 20, 2015 at 1:46 AM, David chen <c77...@163.com> wrote:

> Thanks Ted,
> For scenario #1, I cannot see any clue in the regionserver log file
> indicating that the "kill -9" command was executed. Meanwhile, I think that
> when the JVM detects an OOME in the regionserver process, it creates a new
> thread to execute "kill -9 %p", and that new thread should not write to the
> regionserver log, so the absence of any clue in the regionserver log is
> normal. Right?
> For scenario #2, dmesg also did not provide any clues. But some clues were
> seen in /var/log/messages:
> ......
> May 14 12:00:38 localhost kernel: Out of memory: Kill process 22827 (java) score 497 or sacrifice child
> May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java) total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB
> ......
> The 22827 above is the regionserver PID.
> It looks like the regionserver itself ran out of memory
> (total-vm:17569220kB, anon-rss:16296276kB, and the max-heap-size is set to
> 15G), so it was killed. Right?
>
Yes.

> But hbase has no heavy load in the cluster,

Doesn't matter. You allocated it a heap of 15G. The OS is looking for memory and is at an extreme (swapping totally disabled?), so it starts killing random processes. This is not an hbase issue. It is an oversubscription problem. Google how to address it.

> so I don't think it was killed because of its own OOME; instead I think it
> was because of a lack of memory for other applications, so the OS killed the
> regionserver to run more applications.
> I currently have no evidence to prove my idea, so I hope for more help. Thanks.
>

You quote all the necessary evidence above.

St.Ack

> At 2015-05-20 10:04:19, "Ted Yu" <yuzhih...@gmail.com> wrote:
> >For scenario #1, you would see in the regionserver.out file that the
> >"kill -9" command was applied due to OOME.
> >
> >For scenario #2, can you see if dmesg provides some clue?
> >
> >Cheers
> >
> >On Tue, May 19, 2015 at 6:32 PM, David chen <c77...@163.com> wrote:
> >
> >> Thanks for the replies, guys; they indeed helped me.
> >> Another question: I think there are two possible ways the RegionServer
> >> process can be killed:
> >> 1. When the JVM detects that the memory occupied by the RegionServer
> >> exceeds the max-heap-size, the JVM itself invokes the command configured
> >> by the option "-XX:OnOutOfMemoryError=kill -9 %p" to kill the
> >> RegionServer process.
> >> 2. The RegionServer process has not reached the max-heap-size, but a new
> >> application needs to allocate memory; if memory is short, the OS will
> >> choose some processes to kill, the RegionServer unfortunately becomes
> >> the first choice, and so it is killed by the OS.
> >> Is my understanding right? If so, how can I tell which case applies to
> >> my situation?
> >> Any ideas would be appreciated!
> >>
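
For anyone who wants to tell the two scenarios apart from the logs, here is a minimal Python sketch that pulls the kernel's "Killed process" entries out of log text such as /var/log/messages and compares the resident size with the configured heap. The regular expression is written against the two lines quoted in this thread only; other kernel versions may format the line differently. The function name find_oom_kills, the SAMPLE list, and HEAP_KB are all made up for illustration.

```python
import re

# Matches the kernel's "Killed process" line as quoted in this thread.
# Assumption: other kernel versions may word or order these fields differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+),? .*?\((?P<comm>[^)]+)\)\s+"
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per kernel OOM-killer 'Killed process' line found."""
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            kills.append({
                "pid": int(m["pid"]),
                "comm": m["comm"],
                "total_vm_kb": int(m["total_vm"]),
                "anon_rss_kb": int(m["anon_rss"]),
                "file_rss_kb": int(m["file_rss"]),
            })
    return kills


# The two log lines quoted earlier in the thread, used as sample input.
SAMPLE = [
    "May 14 12:00:38 localhost kernel: Out of memory: Kill process 22827 (java) "
    "score 497 or sacrifice child",
    "May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java) "
    "total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB",
]

# Hypothetical value: the 15G max heap mentioned in the thread, in kB.
HEAP_KB = 15 * 1024 * 1024

if __name__ == "__main__":
    for kill in find_oom_kills(SAMPLE):
        rss_gib = kill["anon_rss_kb"] / (1024 * 1024)
        over_heap_mib = (kill["anon_rss_kb"] - HEAP_KB) / 1024
        print(f"pid {kill['pid']} ({kill['comm']}): anon-rss {rss_gib:.2f} GiB, "
              f"{over_heap_mib:.0f} MiB above the configured heap")
```

To scan a real log, pass an open file instead of SAMPLE, for example find_oom_kills(open("/var/log/messages")). A match for the regionserver PID means the kernel killed it (scenario #2). For the lines above, the resident size comes out at roughly 15.5 GiB, a little over the 15G heap; that is expected, since a JVM's resident memory also includes non-heap areas such as thread stacks and direct buffers.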