Shawn, thank you, your help is very much appreciated. We had already changed the OS configuration before the last crash, so I think the problem is not there.

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257683
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

When does Solr try to delete a znode? I am sorry, but I understand nothing about this process, and it is the only point that seems suspicious to me. Do you think it can cause an inconsistency leading to the OOM problem?

> > Just this message bellow, can you help me to understand what does this
> > message means?
> >
> > 2019-12-12 10:00:23,662 [myid:] - INFO [ProcessThread(sid:0
> > cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException
> > when processing sessionid:0x1000071b8ec4adb type:delete cxid:0x10
> > zxid:0xafc6 txntype:-1 reqpath:n/a Error
> > Path:/overseer_elect/election/72058082471721304-192.168.0.61:8983_solr-n_0000000018
> > Error:KeeperErrorCode = NoNode for
> > /overseer_elect/election/72058082471721304-192.168.0.61:8983_solr-n_0000000018
>
> Solr tried to delete a znode from zookeeper and that deletion failed
> because the znode did not exist.
>
> I can't offer much about WHY it didn't exist, but my best guess is that
> it would have been created by the thread that Solr could not start.

Just after the INFO message above, the ZK log starts to record thousands of repetitions of the block of lines below, where it seems that ZK creates and closes thousands of sessions.

""" 2019-12-12 10:00:58,591 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@948] - Client attempting to establish new session at /192.168.0.31:49351 2019-12-12 10:01:48,038 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /192.168.0.31:50118 2019-12-12 10:09:03,370 [myid:] - INFO [SyncThread:0:ZooKeeperServer@693] - Established session 0x1000071b8ec5013 with negotiated timeout 15000 for client /192.168.0.31:52474 2019-12-12 10:09:45,631 [myid:] - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1000071b8ec5013, likely client has closed socket 2019-12-12 10:09:45,631 [myid:] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /192.168.0.31:52474 which had sessionid 0x1000071b8ec5013 2019-12-12 10:09:58,473 [myid:] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1000071b8ec5013, timeout of 15000ms exceeded 2019-12-12 10:09:58,473 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1000071b8ec5013 """ Again, I really dont know the integration about ZK, and Solr and I am trying to follow the logs to get the problem. My application is Python and as far as I inspected it is not the origin or the problem. Thank you, Koji

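P.S. To put a number on the session churn in the ZK log, something along these lines might help. It is only a rough sketch: the two regular expressions are modelled on the "Established session" and "Expiring session" lines quoted in this message, the function name `summarize_sessions` is made up for illustration, and other ZooKeeper versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns modelled on the two ZooKeeper log lines quoted in this message
# (assumption: other ZooKeeper versions may word these lines differently).
ESTABLISHED = re.compile(
    r"Established session (0x[0-9a-f]+) with negotiated timeout (\d+) "
    r"for client /([\d.]+):\d+"
)
EXPIRED = re.compile(r"Expiring session (0x[0-9a-f]+), timeout of (\d+)ms exceeded")


def summarize_sessions(lines):
    """Count established and expired sessions, plus established sessions per client IP."""
    established = {}      # session id -> client IP
    expired = set()       # session ids that the server expired
    per_client = Counter()
    for line in lines:
        match = ESTABLISHED.search(line)
        if match:
            session_id, _timeout, client_ip = match.groups()
            if session_id not in established:
                per_client[client_ip] += 1
            established[session_id] = client_ip
            continue
        match = EXPIRED.search(line)
        if match:
            expired.add(match.group(1))
    return {
        "established": len(established),
        "expired": len(expired),
        "per_client": dict(per_client),
    }
```

It could be called on an open log file, for example `summarize_sessions(open("zookeeper.log"))`, where the file name is just a placeholder. A large "established" count concentrated on one client IP would show which machine keeps opening the sessions.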