I'll give that tweaking a try.  It's hard to capture a thread dump at the exact
moment it freezes.  Do you think there is any harm in taking a thread dump
every 10 seconds or so?
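On the thread-dump question: on HotSpot, each dump briefly stops all application threads at a safepoint. The cost grows with the number of threads, so a 10-second interval is often tolerable, but it is worth measuring on this setup. Below is a minimal sketch that assumes you can add a small helper class to the client JVM. The class name `PeriodicThreadDump` and its methods are hypothetical, and only standard `java.lang.management` and `java.util.concurrent` APIs are used. If the whole JVM is stalled (for example by a long GC pause), an in-process dumper stalls with it, so running `jstack <pid>` from an external shell loop is a reasonable alternative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicThreadDump {
    // Same HH:mm:ss,SSS layout as the log lines, to make correlation easier.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Returns a plain-text dump of every live thread in this JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only where the JVM supports them (avoids UnsupportedOperationException).
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder("=== Thread dump at ")
                .append(LocalTime.now().format(TS)).append(" ===\n");
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            sb.append('\n');
            // Print the full stack; ThreadInfo.toString() would cut it off after a few frames.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Prints a dump to stderr every periodSeconds; call shutdown() on the result to stop. */
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "periodic-thread-dump");
            t.setDaemon(true); // never keep the JVM alive just for dumping
            return t;
        });
        ses.scheduleAtFixedRate(() -> System.err.print(dumpAllThreads()),
                0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService dumper = start(10);
        Thread.sleep(25_000); // stand-in for the real client workload
        dumper.shutdown();
    }
}
```

The timestamp in each dump header uses the same format as the log lines, which should make it easier to line a dump up with the disconnect errors.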

I tried a new setup with more nodes (from 2 to 4) to test how that affects
this problem.  I saw fewer data streaming errors, but the client now seems to
disconnect fairly frequently.  The client runs on the same machine as the
server it connects to, so I don't think a network issue is likely.  The timing
seems to coincide with checkpoints starting.  Here is the log from the client:

2018-02-21 03:14:19,720 [ERROR] from org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi in tcp-client-disco-sock-writer-#2%cabooseGrid% - Failed to send message: null
java.io.IOException: Failed to get acknowledge for message: TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=7a5d3c5b161-6b443ecd-f658-4782-8f8e-3ed6d6407fc1, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1276)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
2018-02-21 03:14:19,720 [ERROR] from org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi in tcp-client-disco-sock-reader-#3%cabooseGrid% - Failed to read message [sock=Socket[addr=/127.0.0.1,port=47500,localport=59446], locNodeId=6b443ecd-f658-4782-8f8e-3ed6d6407fc1, rmtNodeId=93be80c3-c2b5-498b-a897-265c8bacb648]
org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: sun.misc.Launcher$AppClassLoader@28d93b30
        at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:129)
        at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
        at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9740)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketReader.body(ClientImpl.java:1001)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.ignite.marshaller.jdk.JdkMarshallerInputStreamWrapper.read(JdkMarshallerInputStreamWrapper.java:53)
        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2338)
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
        at org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.<init>(JdkMarshallerObjectInputStream.java:39)
        at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:119)
        ... 4 common frames omitted
2018-02-21 03:14:19,720 [ERROR] from org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi in tcp-client-disco-sock-reader-#3%cabooseGrid% - Connection failed [sock=Socket[addr=/127.0.0.1,port=47500,localport=59446], locNodeId=6b443ecd-f658-4782-8f8e-3ed6d6407fc1]
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at org.apache.ignite.marshaller.jdk.JdkMarshallerInputStreamWrapper.read(JdkMarshallerInputStreamWrapper.java:53)
        at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2338)
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2351)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
        at org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.<init>(JdkMarshallerObjectInputStream.java:39)
        at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:119)
        at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
        at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9740)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketReader.body(ClientImpl.java:1001)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)


On the server:

[03:13:21,146][INFO][db-checkpoint-thread-#77][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=e2648942-961a-45c4-a7f9-41129e76b70f, startPtr=FileWALPointer [idx=2131, fileOffset=1035404, len=60889, forceFlush=true], checkpointLockWait=0ms, checkpointLockHoldTime=362ms, pages=801041, reason='timeout']
[03:13:42,179][INFO][grid-timeout-worker-#47][IgniteKernal] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=93be80c3, uptime=02:46:00.162]
    ^-- H/N/C [hosts=4, nodes=5, CPUs=32]
    ^-- CPU [cur=0.13%, avg=16.01%, GC=0%]
    ^-- PageMemory [pages=1279170]
    ^-- Heap [used=6044MB, free=50.81%, comm=12288MB]
    ^-- Non heap [used=63MB, free=95.82%, comm=68MB]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
    ^-- Outbound messages queue [size=0]
[03:14:36,112][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/127.0.0.1, rmtPort=44204]
[03:14:36,119][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/127.0.0.1, rmtPort=44204]
[03:14:36,119][INFO][tcp-disco-sock-reader-#18][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/127.0.0.1:59446, rmtPort=59446
[03:14:36,119][INFO][tcp-disco-sock-reader-#20][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/127.0.0.1:44204, rmtPort=44204]
[03:14:36,596][INFO][db-checkpoint-thread-#77][GridCacheDatabaseSharedManager] Checkpoint finished [cpId=e2648942-961a-45c4-a7f9-41129e76b70f, pages=801041, markPos=FileWALPointer [idx=2131, fileOffset=1035404, len=60889, forceFlush=true], walSegmentsCleared=50, markDuration=913ms, pagesWrite=3798ms, fsync=71652ms, total=76363ms]
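To check the overlap against the timestamps rather than by eye, here is a small sketch that uses only the numbers in the two logs above. Because the client and server run on the same machine, their wall-clock timestamps should be directly comparable. It pulls the `name=NNNms` fields out of the "Checkpoint finished" line and tests whether the first client ERROR falls between the "Checkpoint started" and "Checkpoint finished" lines. The class and method names (`CheckpointTiming`, `phases`, `within`, `time`) are hypothetical, and only `java.time` and `java.util.regex` are used.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckpointTiming {
    // Both logs print wall-clock time as HH:mm:ss,SSS.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");
    // Matches fields such as "fsync=71652ms"; fields without an "ms" suffix are ignored.
    private static final Pattern MS_FIELD = Pattern.compile("(\\w+)=(\\d+)ms\\b");

    /** Parses a log timestamp such as "03:14:19,720". */
    static LocalTime time(String s) {
        return LocalTime.parse(s, TS);
    }

    /** Collects every "name=NNNms" field of a log line, keeping their order. */
    static Map<String, Long> phases(String line) {
        Map<String, Long> out = new LinkedHashMap<>();
        Matcher m = MS_FIELD.matcher(line);
        while (m.find()) {
            out.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return out;
    }

    /** True if t lies in [start, end]; assumes all three times are on the same day. */
    static boolean within(LocalTime t, LocalTime start, LocalTime end) {
        return !t.isBefore(start) && !t.isAfter(end);
    }

    public static void main(String[] args) {
        // "Checkpoint finished" message from the server log, joined onto one line.
        String finished = "Checkpoint finished [cpId=e2648942-961a-45c4-a7f9-41129e76b70f, "
                + "pages=801041, markPos=FileWALPointer [idx=2131, fileOffset=1035404, "
                + "len=60889, forceFlush=true], walSegmentsCleared=50, markDuration=913ms, "
                + "pagesWrite=3798ms, fsync=71652ms, total=76363ms]";
        Map<String, Long> p = phases(finished);
        long sum = p.get("markDuration") + p.get("pagesWrite") + p.get("fsync");
        System.out.printf(Locale.ROOT, "phases sum=%dms, total=%dms, fsync share=%.1f%%%n",
                sum, p.get("total"), 100.0 * p.get("fsync") / p.get("total"));

        LocalTime cpStart = time("03:13:21,146");     // server: "Checkpoint started"
        LocalTime cpEnd = time("03:14:36,596");       // server: "Checkpoint finished"
        LocalTime clientError = time("03:14:19,720"); // client: first [ERROR] entry
        System.out.println("started -> finished: " + Duration.between(cpStart, cpEnd).toMillis() + "ms");
        System.out.println("client error inside checkpoint window: " + within(clientError, cpStart, cpEnd));
    }
}
```

With these values the three phases add up exactly to `total` (913 + 3798 + 71652 = 76363 ms), fsync accounts for roughly 94% of the checkpoint, and pagesWrite + fsync (75450 ms) matches the 75.45 s between the two checkpoint log lines. The client ERROR at 03:14:19,720 falls inside that window, and the server accepts a new discovery connection from 127.0.0.1 (rmtPort=44204) at 03:14:36,112, about half a second before the checkpoint finishes.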



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/