Hi,

I'm re-posting this message because my pasted logs did not come through the first time.
I have been getting repeated "Too many open files" exceptions for some time.
================================
[11:26:24,493][SEVERE][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192], super=AbstractNioClientWorker [idx=1, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-1, igniteInstanceName=null, finished=false, hashCode=1611196193, interrupted=false, runner=grid-nio-worker-tcp-rest-1-#57]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.1.14.11:11211, rmtAddr=/10.1.252.184:40680, createTime=1529666783471, closeTime=0, bytesSent=5, bytesRcvd=1074, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1529666783481, lastSndTime=1529666783481, lastRcvTime=1529666783481, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [marsh=JdkMarshaller [clsFilter=o.a.i.i.IgniteKernal$5@331b0c4a], routerClient=false], directMode=false]], accepted=true]]]
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1085)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2339)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2110)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
[11:26:24,493][WARNING][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Closing NIO session because of unhandled exception [cls=class o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[11:26:24,493][WARNING][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Closed client session due to exception [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192], super=AbstractNioClientWorker [idx=1, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-1, igniteInstanceName=null, finished=false, hashCode=1611196193, interrupted=false, runner=grid-nio-worker-tcp-rest-1-#57]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.1.14.11:11211, rmtAddr=/10.1.252.184:40680, createTime=1529666783471, closeTime=1529666784488, bytesSent=5, bytesRcvd=1074, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1529666783481, lastSndTime=1529666783481, lastRcvTime=1529666783481, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [marsh=JdkMarshaller [clsFilter=o.a.i.i.IgniteKernal$5@331b0c4a], routerClient=false], directMode=false]], accepted=true]], msg=Connection reset by peer]
[11:26:24,513][SEVERE][grid-nio-worker-tcp-rest-1-#57][GridTcpRestProtocol] Caught unhandled exception in NIO worker thread (restart the node).
java.lang.NullPointerException
    at sun.nio.ch.EPollArrayWrapper.isEventsHighKilled(EPollArrayWrapper.java:174)
    at sun.nio.ch.EPollArrayWrapper.setUpdateEvents(EPollArrayWrapper.java:190)
    at sun.nio.ch.EPollArrayWrapper.add(EPollArrayWrapper.java:239)
    at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:178)
    at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:132)
    at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:212)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2545)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1934)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
[11:26:30,277][SEVERE][nio-acceptor-#55][GridTcpRestProtocol] Failed to accept remote connection (will wait for 2000ms).
class org.apache.ignite.IgniteCheckedException: Failed to accept connection: GridWorker [name=nio-acceptor, igniteInstanceName=null, finished=false, hashCode=1020662787, interrupted=false, runner=nio-acceptor-#55]
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.accept(GridNioServer.java:2888)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.body(GridNioServer.java:2822)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.processSelectedKeys(GridNioServer.java:2938)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.accept(GridNioServer.java:2872)
    ... 3 more
[11:26:32,284][SEVERE][nio-acceptor-#55][GridTcpRestProtocol] Failed to accept remote connection (will wait for 2000ms).
class org.apache.ignite.IgniteCheckedException: Failed to accept connection: GridWorker [name=nio-acceptor, igniteInstanceName=null, finished=false, hashCode=1020662787, interrupted=false, runner=nio-acceptor-#55]
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.accept(GridNioServer.java:2888)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.body(GridNioServer.java:2822)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.processSelectedKeys(GridNioServer.java:2938)
    at org.apache.ignite.internal.util.nio.GridNioServer$GridNioAcceptWorker.accept(GridNioServer.java:2872)
    ... 3 more
================================
My max open files limit is 32768, and the Ignite process does have 32768 open files.
================================
$ sudo ls -hl /proc/4055/fd/ | wc -l
32768
================================
Most of them look like this:
================================
...
lrwx------ 1 root root 64 Jun 23 12:22 9990 -> socket:[1167798]
lrwx------ 1 root root 64 Jun 23 12:22 9991 -> socket:[1167799]
lrwx------ 1 root root 64 Jun 23 12:22 9992 -> socket:[1166839]
lrwx------ 1 root root 64 Jun 23 12:22 9993 -> socket:[1167800]
lrwx------ 1 root root 64 Jun 23 12:22 9994 -> socket:[1168762]
lrwx------ 1 root root 64 Jun 23 12:22 9995 -> socket:[1168763]
lrwx------ 1 root root 64 Jun 23 12:22 9996 -> socket:[1164109]
lrwx------ 1 root root 64 Jun 23 12:22 9997 -> socket:[1166840]
lrwx------ 1 root root 64 Jun 23 12:22 9998 -> socket:[1164110]
lrwx------ 1 root root 64 Jun 23 12:22 9999 -> socket:[1169810]
================================
I haven't found any documentation about how Ignite uses Unix sockets. It seems Ignite doesn't close them properly. Any help?

Thanks.
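
P.S. In case anyone wants to reproduce the breakdown of descriptor types rather than eyeball the listing, here is a minimal sketch of how I would tally them. It assumes a Linux /proc filesystem and enough privileges to read the target process's fd directory. The helper names (classify_target, tally_targets, read_fd_targets) are my own and hypothetical, not part of Ignite or any tool.

```python
import os
import re
from collections import Counter


def classify_target(target):
    """Map an fd symlink target (e.g. 'socket:[1167798]') to a coarse kind."""
    match = re.match(r"^(socket|pipe|anon_inode):", target)
    if match:
        return match.group(1)
    # Anything else is treated as a regular path (files, devices, etc.).
    return "file"


def tally_targets(targets):
    """Count how many descriptors fall into each kind."""
    return Counter(classify_target(t) for t in targets)


def read_fd_targets(pid):
    """Read the symlink targets under /proc/<pid>/fd (Linux only)."""
    fd_dir = "/proc/{}/fd".format(pid)
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor may have been closed between listdir and readlink.
            continue
    return targets
```

Running something like tally_targets(read_fd_targets(4055)).most_common() as root should show how many of the descriptors are sockets versus pipes or regular files, which would confirm whether sockets alone account for the limit being hit.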