Thanks Ted and Stack,
@Ted, I also reproduced the OOM for scenario #1 and found the hints in the
out file (it is
/var/run/cloudera-scm-agent/process/*-hbase-REGIONSERVER/logs/stdout.log in
CDH).
@Stack, it is indeed not an HBase issue; another application occupied
more memory, then
Thanks Ted,
Sorry, the log extension is .log.out, so I think the out file you mentioned
is the log file.
My version is HBase 0.98.6-cdh5.2.0; where is the regionserver.out file?
BTW, I should clarify that my scenario is #2, so I expect to get your snippet
from the .out file
For scenario #1, please check the regionserver.out file - not the log file.
I was able to reproduce scenario #1 by giving the regionserver a 124 MB heap.
As soon as I put load on the server, it was killed by the kill -9 command.
I can send you a snippet from the .out file in the morning.
Cheers
On May 20,
Thanks Ted,
For scenario #1, I cannot see any clues in the regionserver log file
indicating that the kill -9 command was executed. Meanwhile, I think that when
the JVM detects an OOME in the regionserver process, it creates a new thread to
execute kill -9 %p; that new thread would not write to the regionserver log, so
the
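To illustrate that behavior with a minimal, hypothetical sketch (mine, not
code from this thread): the class below exhausts a small heap, and when run
with the same flag the RegionServer uses, the JVM executes the handler and
announces it on its console output - which is what lands in the .out file,
not in the log4j log.

// Run with: java -Xmx16m -XX:OnOutOfMemoryError="kill -9 %p" OomDemo
// On OOME, HotSpot prints a notice along the lines of
// "Executing /bin/sh -c ..." to the process's console output (captured in
// the .out file), not to any application log file.
import java.util.ArrayList;
import java.util.List;

public class OomDemo {
    public static void main(String[] args) {
        List<byte[]> hog = new ArrayList<>();
        while (true) {
            hog.add(new byte[1024 * 1024]); // allocate 1 MB per loop until OOME
        }
    }
}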
Thanks for your replies, guys; they indeed helped me.
Another question: I think there are two possibilities for killing the
RegionServer process:
1. When the JVM detects that the memory the RegionServer occupies exceeds the
max heap size, the JVM itself invokes the command configured by the option
For scenario #1, you would see in the regionserver.out file that the kill -9
command was applied due to OOME.
For scenario #2, can you see if dmesg provides some clue?
Cheers
On Tue, May 19, 2015 at 6:32 PM, David chen c77...@163.com wrote:
On Mon, May 18, 2015 at 11:47 AM, Andrew Purtell apurt...@apache.org
wrote:
You need to not overcommit memory on servers running JVMs for HDFS and
HBase (and YARN, and containers, if colocating Hadoop MR). Sum the -Xmx
parameter, the maximum heap size, for all JVMs that will be concurrently
executing on the server. The total should be less than the total amount of
RAM
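To make that concrete with hypothetical numbers (mine, not Andrew's): on a
64 GB node running a RegionServer with -Xmx16g, a DataNode with -Xmx4g, and
YARN containers capped at 32 GB in aggregate, the committed maximums sum to
52 GB, leaving 12 GB of headroom for off-heap JVM overhead (thread stacks,
code cache, direct buffers), the OS page cache, and other daemons. If the sum
of -Xmx values approached or exceeded the 64 GB of physical RAM, the kernel's
OOM killer could start killing processes - the scenario #2 discussed earlier
in this thread.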
The snippet in /var/log/messages is as follows; I am sure that the killed
process (22827) is the RegionServer.
..
May 14 12:00:38 localhost kernel: Mem-Info:
May 14 12:00:38 localhost kernel: Node 0 DMA per-cpu:
May 14 12:00:38 localhost kernel: CPU0: hi:0, btch: 1 usd: 0
..
May 14
Hi Ted,
I read the code snippet you provided (HRegionServer#scan) in version 0.98.5,
and it looks like the partial row is returned.
If so, and the partial row was already fixed in 0.98.5, why is the fix version
1.1.0 in HBASE-11544?
At 2015-05-14 01:04:35, Ted Yu yuzhih...@gmail.com wrote:
I got '502 Bad Gateway' trying to access the post David mentioned.
Here is the same article in case you get a 502 error:
http://java.dzone.com/articles/OOM-relation-to-swappiness
FYI
On Thu, May 14, 2015 at 2:40 AM, David chen c77...@163.com wrote:
What log is this seen in? Can you paste the log line? Do you mean
/var/log/messages?
On May 12, 2015 7:44 PM, David chen c77...@163.com wrote:
I should have mentioned in my previous email that I was looking at code in
branch-1.
bq. why the fix version is 1.1.0 in HBASE-11544?
See release note:
Incompatible Change: The return type of InternalScanners#next and
RegionScanners#nextRaw has been changed to NextState from boolean
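To illustrate what that change enables on the client side, here is a hedged
sketch (my own, not code from this thread); the table name "t" and the size
cap are hypothetical. On 1.1.0 and later, a client can opt in to partial rows
and detect them explicitly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class PartialScanDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t"))) {
            Scan scan = new Scan();
            scan.setAllowPartialResults(true);       // opt in to partial rows
            scan.setMaxResultSize(2L * 1024 * 1024); // cap each response at ~2 MB
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    if (r.isPartial()) {
                        // this Result carries only part of a row; the remaining
                        // cells for the same row arrive in subsequent Results
                    }
                }
            }
        }
    }
}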
Cheers
On Fri,
Thanks for your help, guys.
Maybe the root cause is that swap is turned off.
The cluster contains seven RegionServers. Although all of them set
vm.swappiness to 0, two of them have always had swap turned off while the
others have it turned on; meanwhile, the OOM is also always encountered on
those two machines.
I plan to turn on swap and
On Wed, May 13, 2015 at 12:59 AM, David chen c77...@163.com wrote:
-XX:MaxGCPauseMillis=6000
With this line you're basically telling Java to never garbage collect. Can
you try lowering that to something closer to the JVM default and see if you
have better stability?
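For reference (a detail I am adding, not from the thread): G1's default
pause-time goal is 200 ms, i.e.
-XX:MaxGCPauseMillis=200
A 6000 ms goal gives the collector so much latitude that collection can lag
far behind allocation.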
For #2, a partial row would be returned.
Please take a look at the following method in RSRpcServices around line
2393:
public ScanResponse scan(final RpcController controller, final
ScanRequest request)
Cheers
On Wed, May 13, 2015 at 12:59 AM, David chen c77...@163.com wrote:
After moving to the G1GC we were plagued with random OOMs from time to
time. We always thought it was due to people requesting a big row or group
of rows, but upon investigation noticed that the heap dumps were many GBs
less than the max heap at time of OOM. If you have this symptom, you may
be
Thanks for your reply.
Yes, it indeed appears in the RegionServer command line, as follows:
jps -v|grep Region
HRegionServer -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-Djava.net.preferIPv4Stack=true -Xms16106127360 -Xmx16106127360 -XX:+UseG1GC
-XX:MaxGCPauseMillis=6000
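A side note (my observation, not from the thread): when a flag is repeated on
a HotSpot command line, the last occurrence wins, so with both -Xmx1000m and
-Xmx16106127360 present, the effective maximum heap is 16106127360 bytes
(about 15 GB), not 1000 MB.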
A RegionServer was killed because of OutOfMemory (OOM). Although the killed
process can be seen in the Linux message log, I still have the following two
problems:
1. How can I inspect the root cause of the OOM?
2. When the RegionServer encounters OOM, why can't it free some of the memory
it occupies?
If so,
Does the following appear in the command that launched the region server?
-XX:OnOutOfMemoryError=kill -9 %p
There could be multiple reasons for the region server process to encounter an
OOME.
Please take a look at HBASE-11544, which fixes a common cause. The fix is in
the upcoming 1.1.0 release.
Cheers