The reason Ted asked about ZooKeeper is that it looks like you are experiencing network partitions on a regular basis, just long enough for ZooKeeper clients to time out. Given the regularity of the occurrence, I think you need to look for an external (non-software) cause. What else is going on at midnight UTC? A cron job? A switch or network reload? Are you collecting host and network metrics? Is anything else observed during that window?
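One quick way to rule the cron angle in or out is to scan the system crontabs for anything scheduled in the 00:xx hour. A minimal sketch; the `scan_crontab` helper and the sample entries are illustrative, not taken from your hosts:

```shell
# Flag system crontab entries (minute hour dom mon dow user command)
# whose hour field is 0, i.e. anything firing between 00:00 and 00:59.
# Skips comments; does not expand */N or range syntax in the hour field.
scan_crontab() {
  echo "$1" | awk '$1 !~ /^#/ && NF >= 6 && $2 == 0 { print }'
}

# Illustrative input; on a real host you would feed it the concatenation of
# /etc/crontab, /etc/cron.d/*, and each user's `crontab -l` output.
sample='25 6 * * * root test -x /usr/sbin/anacron
5 0 * * * root /usr/sbin/logrotate /etc/logrotate.conf'

scan_crontab "$sample"
# prints: 5 0 * * * root /usr/sbin/logrotate /etc/logrotate.conf
```

Keep in mind cron fires in the host's local timezone, so if the servers are not running on UTC the interesting hour field may not be 0. Also check what your hosting provider schedules (backups, snapshots) around that time.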
On Tue, Oct 28, 2014 at 7:15 AM, Ron van der Vegt <[email protected]> wrote:

> Thanks for the fast reply!
>
> Well, all three ZooKeeper processes were running, but I guess not properly, based on the logs. The problem is that I cannot figure out why this is happening. Are you asking because you think that ZooKeeper is triggering all the problems?
>
> Thanks in advance,
>
> Ron
>
> On Tue, Oct 28, 2014 at 2:51, Ted Yu <[email protected]> wrote:
>
>> 2014-10-27 23:06:36,844 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
>>
>> Was the ZooKeeper quorum running properly from 23:06:00 to 00:07:18?
>>
>> Cheers
>>
>> On Tue, Oct 28, 2014 at 5:14 AM, Ron van der Vegt <[email protected]> wrote:
>>
>>> My setup is the following:
>>>
>>> server1:
>>> - master
>>> - zookeeper
>>>
>>> server2:
>>> - zookeeper
>>>
>>> server3:
>>> - zookeeper
>>>
>>> servers 4 through 8:
>>> - regionserver
>>>
>>> HBase: hbase-0.98.7-hadoop1
>>> ZooKeeper: zookeeper-3.4.6
>>> Hadoop: 1.2.1
>>>
>>> Every night at approx. 00:07 UTC the master and all the regionservers shut down. When I monitor server1 with munin, I don't see any extra load (io, memory, cpu) at that time. gc.log showed me that garbage collection is also fast. See below for the condensed logs. I'm a little bit out of options as to what is causing this. Does anyone have a clue what is happening?
>>>
>>> Thanks in advance,
>>>
>>> Ron
>>>
>>>
>>> Logs master server1
>>> ===================
>>>
>>> 2014-10-27 23:06:36,503 INFO [master:vps2008:60000-SendThread(141.105.120.103:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1494ed899bd0211, likely server has closed socket, closing socket connection and attempting reconnect
>>> 2014-10-27 23:06:37,035 INFO [master:vps2008:60000-SendThread(141.105.120.102:2181)] zookeeper.ClientCnxn: Opening socket connection to server 141.105.120.102/141.105.120.102:2181. Will not attempt to authenticate using SASL (unknown error)
>>> 2014-10-27 23:06:37,036 INFO [master:vps2008:60000-SendThread(141.105.120.102:2181)] zookeeper.ClientCnxn: Socket connection established to 141.105.120.102/141.105.120.102:2181, initiating session
>>> 2014-10-27 23:06:37,037 INFO [master:vps2008:60000-SendThread(141.105.120.102:2181)] zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1494ed899bd0211, likely server has closed socket, closing socket connection and attempting reconnect
>>> 2014-10-27 23:06:37,583 INFO [master:vps2008:60000-SendThread(141.105.121.127:2181)] zookeeper.ClientCnxn: Opening socket connection to server 141.105.121.127/141.105.121.127:2181. Will not attempt to authenticate using SASL (unknown error)
>>>
>>> ......
>>>
>>> 2014-10-27 23:47:35,881 INFO [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1068ms
>>> No GCs detected
>>>
>>> 5x, probably one per regionserver:
>>>
>>> 2014-10-27 23:47:35,937 WARN [RpcServer.reader=9,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: count of bytes read: 0
>>> java.io.IOException: Connection reset by peer
>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2244)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1423)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:798)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:589)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:564)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> .....
>>>
>>> 2014-10-28 00:01:23,798 WARN [master:vps2008:60000.oldLogCleaner] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=141.105.120.103:2181,141.105.120.102:2181,141.105.121.127:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs
>>> 2014-10-28 00:01:23,798 ERROR [master:vps2008:60000.oldLogCleaner] zookeeper.RecoverableZooKeeper: ZooKeeper getChildren failed after 4 attempts
>>> 2014-10-28 00:01:23,799 WARN [master:vps2008:60000.oldLogCleaner] master.ReplicationLogCleaner: Aborting ReplicationLogCleaner because Failed to get list of replicators
>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>>>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:296)
>>>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:573)
>>>         at org.apache.hadoop.hbase.replication.ReplicationStateZKBase.getListOfReplicators(ReplicationStateZKBase.java:79)
>>>         at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.loadHLogsFromQueues(ReplicationLogCleaner.java:88)
>>>         at org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner.getDeletableFiles(ReplicationLogCleaner.java:67)
>>>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
>>>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
>>>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>>>         at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> .....
>>>
>>> 2014-10-28 00:06:05,997 ERROR [MASTER_META_SERVER_OPERATIONS-vps2008:60000-0] handler.MetaServerShutdownHandler: Caught M_META_SERVER_SHUTDOWN, count=1
>>> java.io.IOException: failed log splitting for vps2060.directvps.nl,60020,1414414327187, will retry
>>>         at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:84)
>>>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://namenode.openindex.io:8020/hbase/WALs/vps2060.directvps.nl,60020,1414414327187-splitting] Task = installed = 1 done = 0 error = 1
>>>         at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:359)
>>>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:416)
>>>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:308)
>>>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:299)
>>>         at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:77)
>>>         ... 4 more
>>> 2014-10-28 00:06:05,997 INFO [MASTER_META_SERVER_OPERATIONS-vps2008:60000-1] handler.MetaServerShutdownHandler: Splitting hbase:meta logs for vps2060.directvps.nl,60020,1414414327187
>>> 2014-10-28 00:06:06,006 INFO [MASTER_META_SERVER_OPERATIONS-vps2008:60000-1] master.SplitLogManager: dead splitlog workers [vps2060.directvps.nl,60020,1414414327187]
>>> 2014-10-28 00:06:06,021 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>>> 2014-10-28 00:06:06,021 INFO [master:vps2008:60000] zookeeper.ZooKeeper: Session: 0x1495400a0370000 closed
>>> 2014-10-28 00:06:06,021 INFO [master:vps2008:60000] master.HMaster: HMaster main thread exiting
>>> 2014-10-28 00:06:06,022 ERROR [main] master.HMasterCommandLine: Master exiting
>>> java.lang.RuntimeException: HMaster Aborted
>>>         at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:194)
>>>         at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:135)
>>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>         at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
>>>
>>>
>>> Logs of zookeeper server1:
>>> ==========================
>>>
>>> 2014-10-27 23:06:36,562 [myid:2] - WARN [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
>>> java.net.SocketTimeoutException: Read timed out
>>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>>         at java.net.SocketInputStream.read(SocketInputStream.java:152)
>>>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>>>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>>>         at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
>>> 2014-10-27 23:06:36,629 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>> java.lang.Exception: shutdown Follower
>>>         at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
>>> 2014-10-27 23:06:36,630 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down
>>> 2014-10-27 23:06:36,630 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441] - shutting down
>>> 2014-10-27 23:06:36,630 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] - Shutting down
>>> 2014-10-27 23:06:36,631 [myid:2] - INFO [FollowerRequestProcessor:2:FollowerRequestProcessor@95] - FollowerRequestProcessor exited loop!
>>> 2014-10-27 23:06:36,632 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting down
>>> 2014-10-27 23:06:36,632 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - shutdown of request processor complete
>>> 2014-10-27 23:06:36,632 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209] - Shutting down
>>> 2014-10-27 23:06:36,633 [myid:2] - INFO [SyncThread:2:SyncRequestProcessor@187] - SyncRequestProcessor exited!
>>> 2014-10-27 23:06:36,633 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
>>> 2014-10-27 23:06:36,635 [myid:2] - INFO [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
>>> 2014-10-27 23:06:36,716 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot /opt/zookeeper/data/version-2/snapshot.ed00000df4
>>> 2014-10-27 23:06:36,774 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0xed00000e01 (n.zxid), 0xf9 (n.round), LOOKING (n.state), 1 (n.sid), 0xed (n.peerEpoch) LOOKING (my state)
>>> 2014-10-27 23:06:36,816 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /141.105.120.150:47142
>>> 2014-10-27 23:06:36,844 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
>>> 2014-10-27 23:06:36,845 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /141.105.120.150:47142 (no session established for client)
>>>
>>> .....
>>>
>>> 2014-10-28 00:07:18,464 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /141.105.125.180:56530
>>> 2014-10-28 00:07:18,468 [myid:2] - INFO [CommitProcessor:2:ZooKeeperServer@617] - Established session 0x24954020e470268 with negotiated timeout 6000 for client /141.105.125.180:56530
>>> 2014-10-28 00:07:18,483 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /141.105.125.180:56530 which had sessionid 0x24954020e470268
>>> 2014-10-28 00:07:18,632 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /178.21.116.224:45958
>>> 2014-10-28 00:07:18,632 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /178.21.116.224:45958; will be dropped if server is in r-o mode
>>> 2014-10-28 00:07:18,632 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /178.21.116.224:45958
>>> 2014-10-28 00:07:18,636 [myid:2] - INFO [CommitProcessor:2:ZooKeeperServer@617] - Established session 0x24954020e470269 with negotiated timeout 6000 for client /178.21.116.224:45958
>>> 2014-10-28 00:07:18,650 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>> EndOfStreamException: Unable to read additional data from client sessionid 0x24954020e470269, likely client has closed socket
>>>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>>         at java.lang.Thread.run(Thread.java:745)
>>> 2014-10-28 00:07:18,651 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /178.21.116.224:45958 which had sessionid 0x24954020e470269
>>>
>>> Gc log server1 (date and time may differ, but the same result is shown at the time of the problem)
>>> ======================================================================
>>>
>>> 2014-10-23T07:39:02.597+0000: 0.334: [GC2014-10-23T07:39:02.597+0000: 0.334: [ParNew: 4288K->512K(4800K), 0.0068790 secs] 4288K->1288K(15424K), 0.0069880 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>> 2014-10-23T07:39:02.676+0000: 0.413: [GC2014-10-23T07:39:02.676+0000: 0.413: [ParNew: 4800K->511K(4800K), 0.0136860 secs] 5576K->2274K(15424K), 0.0137620 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
>>> 2014-10-23T07:39:02.759+0000: 0.496: [GC2014-10-23T07:39:02.759+0000: 0.496: [ParNew: 4790K->511K(4800K), 0.0039330 secs] 6553K->2582K(15424K), 0.0040040 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>> ...
>>>
>>> 2014-10-23T07:39:03.621+0000: 1.358: [GC2014-10-23T07:39:03.621+0000: 1.358: [ParNew: 4799K->512K(4800K), 0.0022680 secs] 9184K->5281K(15424K), 0.0023360 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
>>> 2014-10-23T07:39:03.803+0000: 1.540: [GC2014-10-23T07:39:03.803+0000: 1.540: [ParNew: 4800K->512K(4800K), 0.0018700 secs] 9569K->5586K(15424K), 0.0019370 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:03.915+0000: 1.652: [GC2014-10-23T07:39:03.915+0000: 1.652: [ParNew: 4800K->512K(4800K), 0.0019160 secs] 9874K->6010K(15424K), 0.0019840 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:03.918+0000: 1.654: [GC [1 CMS-initial-mark: 5498K(10624K)] 6010K(15424K), 0.0041480 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
>>> 2014-10-23T07:39:03.922+0000: 1.659: [CMS-concurrent-mark-start]
>>> 2014-10-23T07:39:03.964+0000: 1.701: [CMS-concurrent-mark: 0.038/0.042 secs] [Times: user=0.09 sys=0.00, real=0.04 secs]
>>> 2014-10-23T07:39:03.964+0000: 1.701: [CMS-concurrent-preclean-start]
>>> 2014-10-23T07:39:03.965+0000: 1.702: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:03.971+0000: 1.708: [GC[YG occupancy: 1582 K (4800 K)]2014-10-23T07:39:03.971+0000: 1.708: [Rescan (parallel) , 0.0013900 secs]2014-10-23T07:39:03.972+0000: 1.709: [weak refs processing, 0.0000240 secs]2014-10-23T07:39:03.973+0000: 1.709: [scrub string table, 0.0001890 secs] [1 CMS-remark: 5498K(10624K)] 7081K(15424K), 0.0016880 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:03.973+0000: 1.710: [CMS-concurrent-sweep-start]
>>> 2014-10-23T07:39:03.974+0000: 1.711: [CMS-concurrent-sweep: 0.001/0.001 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:03.974+0000: 1.711: [CMS-concurrent-reset-start]
>>> 2014-10-23T07:39:04.086+0000: 1.823: [GC2014-10-23T07:39:04.086+0000: 1.823: [ParNew: 4800K->512K(4800K), 0.0016370 secs] 7380K->3450K(15424K), 0.0017180 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:04.146+0000: 1.883: [CMS-concurrent-reset: 0.170/0.172 secs] [Times: user=0.11 sys=0.16, real=0.17 secs]
>>> 2014-10-23T07:39:04.194+0000: 1.931: [GC2014-10-23T07:39:04.194+0000: 1.931: [ParNew: 4799K->512K(4800K), 0.0015060 secs] 7737K->3610K(15424K), 0.0015850 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:04.347+0000: 2.084: [GC2014-10-23T07:39:04.347+0000: 2.084: [ParNew: 4800K->511K(4800K), 0.0019550 secs] 7898K->3849K(15424K), 0.0020250 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:39:04.430+0000: 2.167: [GC2014-10-23T07:39:04.431+0000: 2.167: [ParNew: 4799K->495K(4800K), 0.0021720 secs] 8137K->3981K(15424K), 0.0022710 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>>
>>> ....
>>>
>>> 2014-10-23T07:40:05.961+0000: 63.698: [GC2014-10-23T07:40:05.961+0000: 63.698: [ParNew: 4649K->511K(4800K), 0.0028340 secs] 13309K->9479K(15424K), 0.0029510 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:40:14.198+0000: 71.935: [GC2014-10-23T07:40:14.198+0000: 71.935: [ParNew: 4799K->511K(4800K), 0.0029840 secs] 13767K->9895K(15424K), 0.0030750 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:40:14.289+0000: 72.025: [GC2014-10-23T07:40:14.289+0000: 72.026: [ParNew: 4799K->512K(4800K), 0.0023430 secs] 14183K->10375K(15424K), 0.0024230 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
>>> 2014-10-23T07:40:14.291+0000: 72.028: [GC [1 CMS-initial-mark: 9863K(10624K)] 10383K(15424K), 0.0009020 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:40:14.292+0000: 72.029: [CMS-concurrent-mark-start]
>>> 2014-10-23T07:40:14.322+0000: 72.059: [CMS-concurrent-mark: 0.030/0.030 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
>>> 2014-10-23T07:40:14.322+0000: 72.059: [CMS-concurrent-preclean-start]
>>> 2014-10-23T07:40:14.322+0000: 72.059: [CMS-concurrent-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:40:14.322+0000: 72.059: [CMS-concurrent-abortable-preclean-start]
>>> CMS: abort preclean due to time 2014-10-23T07:40:19.353+0000: 77.090: [CMS-concurrent-abortable-preclean: 0.089/5.030 secs] [Times: user=0.11 sys=0.01, real=5.03 secs]
>>> 2014-10-23T07:40:19.353+0000: 77.090: [GC[YG occupancy: 3651 K (4800 K)]2014-10-23T07:40:19.354+0000: 77.091: [Rescan (parallel) , 0.0020850 secs]2014-10-23T07:40:19.356+0000: 77.093: [weak refs processing, 0.0001570 secs]2014-10-23T07:40:19.356+0000: 77.093: [scrub string table, 0.0004000 secs] [1 CMS-remark: 9863K(10624K)] 13514K(15424K), 0.0034860 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:40:19.357+0000: 77.094: [CMS-concurrent-sweep-start]
>>> 2014-10-23T07:40:19.360+0000: 77.097: [CMS-concurrent-sweep: 0.003/0.003 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:40:19.361+0000: 77.097: [CMS-concurrent-reset-start]
>>> 2014-10-23T07:40:19.362+0000: 77.099: [CMS-concurrent-reset: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
>>> 2014-10-23T07:41:56.874+0000: 174.611: [GC2014-10-23T07:41:56.875+0000: 174.611: [ParNew: 4800K->512K(4800K), 0.0021740 secs] 10761K->6828K(15424K), 0.0022740 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T07:47:21.160+0000: 498.897: [GC2014-10-23T07:47:21.160+0000: 498.897: [ParNew: 4800K->286K(4800K), 0.0025590 secs] 11116K->7045K(15424K), 0.0026680 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
>>>
>>> ...
>>>
>>> 2014-10-23T08:39:29.672+0000: 3627.409: [GC2014-10-23T08:39:29.672+0000: 3627.409: [ParNew: 4705K->512K(4800K), 0.0045100 secs] 12027K->8045K(15424K), 0.0046410 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> 2014-10-23T08:45:08.685+0000: 3966.422: [GC2014-10-23T08:45:08.686+0000: 3966.422: [ParNew: 4800K->478K(4800K), 0.0035990 secs] 12333K->8609K(15424K), 0.0036800 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
>>> Heap
>>>  par new generation   total 4800K, used 827K [0x00000000b7200000, 0x00000000b7730000, 0x00000000c1860000)
>>>   eden space 4288K,  8% used [0x00000000b7200000, 0x00000000b72573b0, 0x00000000b7630000)
>>>   from space 512K, 93% used [0x00000000b76b0000, 0x00000000b7727980, 0x00000000b7730000)
>>>   to   space 512K,  0% used [0x00000000b7630000, 0x00000000b7630000, 0x00000000b76b0000)
>>>  concurrent mark-sweep generation total 10624K, used 8130K [0x00000000c1860000, 0x00000000c22c0000, 0x00000000f5a00000)
>>>  concurrent-mark-sweep perm gen total 50636K, used 30752K [0x00000000f5a00000, 0x00000000f8b73000, 0x0000000100000000)
>>>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
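P.S. While you chase the root cause, raising the HBase-side ZooKeeper session timeout can buy some headroom so a brief partition doesn't immediately expire the master's session. Sketch only, values illustrative, not tuned advice for your hardware:

```xml
<!-- hbase-site.xml on the master and all regionservers -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
```

Note the effective timeout is the client's request clamped to the quorum's [minSessionTimeout, maxSessionTimeout] bounds, which default to 2x and 20x tickTime in zoo.cfg; with the default tickTime of 2000 ms, sessions are capped at 40000 ms, so a 120000 ms request also needs maxSessionTimeout raised on all three ZooKeeper servers. And of course this only masks the symptom, not the cause.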
