Hi guys, it seems that every couple of weeks we lose a node. Here are the logs: https://www.dropbox.com/sh/8cv2v8q5lcsju53/AAAU6ZSFkfiZPaMwHgIh5GAfa?dl=0
And some extra details. Maybe I need to do more tuning than what is already mentioned below, or maybe set a higher timeout?

The cluster has 3 server nodes and 9 clients (client = true). Performance-wise, the cluster is not handling any kind of high volume: on average it does about 15-20 puts/gets/queries (in any combination) per 30-60 seconds.

The biggest cache we have holds 3 million records, distributed with 1 backup, using the following template:

    <bean id="cache-template-bean" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
        <!-- when you create a template via XML configuration, you must add an asterisk to the name of the template -->
        <property name="name" value="partitionedTpl*"/>
        <property name="cacheMode" value="PARTITIONED" />
        <property name="backups" value="1" />
        <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
    </bean>

Persistence is configured:

    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <!-- Redefining the default region's settings -->
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="persistenceEnabled" value="true"/>
                    <property name="name" value="Default_Region"/>
                    <property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/>
                </bean>
            </property>
        </bean>
    </property>

We also followed the tuning instructions for GC and I/O:

    if [ -z "$JVM_OPTS" ] ; then
        JVM_OPTS="-Xms6g -Xmx6g -server -XX:MaxMetaspaceSize=256m"
    fi

    #
    # Uncomment the following GC settings if you see spikes in your throughput due to Garbage Collection.
    #
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"

    sysctl -w vm.dirty_writeback_centisecs=500
    sysctl -w vm.dirty_expire_centisecs=500
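To make the timeout question concrete: the settings I have in mind are presumably `failureDetectionTimeout` (and `clientFailureDetectionTimeout` for the client nodes) on `IgniteConfiguration`. Below is a rough sketch of what raising them might look like. The values are placeholders picked purely for illustration, not something we have tested, and the rest of the bean (data storage, cache templates, discovery) is left out.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Illustrative values only (an assumption, not a recommendation). -->
    <!-- As far as I know the defaults are 10000 ms for servers and 30000 ms for clients. -->
    <property name="failureDetectionTimeout" value="30000"/>
    <property name="clientFailureDetectionTimeout" value="60000"/>
</bean>
```

Whether a longer timeout would only hide the underlying cause (long GC pauses, I/O stalls) is exactly what I am unsure about.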