Question on Ignite Persistence: On a deployed 3-node Ignite cluster, I see one node being dropped from the cluster because it encounters GC pauses. Worse, when this node leaves the cluster, a rebalance is initiated (and re-initiated when the node joins back). Note: the data the Ignite cluster holds is fully transactional; we cannot tolerate data loss. From the logs:
[14:32:01,643][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager] Copied file [src=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000006.wal, dst=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000306.wal]
[14:32:02,830][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager] Starting to copy WAL segment [absIdx=307, segIdx=7, origFile=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000007.wal, dstFile=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000307.wal]
[14:32:17,999][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible too long JVM pause: 15044 milliseconds.

WAL writes (FSYNC mode in this case) consistently precede the GC pauses. Question: the only advantage of FSYNC over LOG_ONLY seems to be surviving OS-level crashes. With a journaled filesystem like ext4, do I really need FSYNC? Can't I get away with LOG_ONLY? If not, how do I minimise the performance bottleneck of FSYNC?
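For context, the WAL mode in question is set via DataStorageConfiguration. Below is a minimal sketch of what switching modes looks like, assuming the Ignite 2.x Java API (the FsyncModeFileWriteAheadLogManager in the logs suggests Ignite 2.x); this is illustrative, not a recommendation:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

public class WalModeConfig {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // LOG_ONLY flushes WAL records to the OS page cache on commit: a process
        // crash loses nothing, but an OS/power failure can lose the unsynced tail
        // of the WAL. Note ext4 journaling protects filesystem metadata, not the
        // contents of unsynced application writes.
        storageCfg.setWalMode(WALMode.LOG_ONLY);

        // If FSYNC must be kept, a small fsync delay lets concurrent commits
        // share one sync (value below is a hypothetical example, in nanoseconds):
        // storageCfg.setWalFsyncDelayNanos(1_000_000);

        // Enable native persistence on the default data region.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}
```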