Hello,

Please consider applying the generic optimization techniques for
persistence:
https://apacheignite.readme.io/docs/durable-memory-tuning

In the meantime:

   - Keep investigating the cause of the GC pause unless you are 100%
   sure it's caused by rebalancing
   - Increase IgniteConfiguration.failureDetectionTimeout to 20 seconds to
   prevent nodes from being shut down on long GC pauses
   - Have you tuned up your JVM settings?
   https://apacheignite.readme.io/docs/jvm-and-system-tuning
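
For illustration, the timeout change from the list above could look roughly
like this when the node is configured programmatically in Java. This is only
a sketch: the class name is hypothetical, and it assumes you are not already
setting the timeout in a Spring XML file.

```java
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical helper class, used here only to show the setter.
public class FailureDetectionSketch {
    /** Builds a node configuration with a 20-second failure detection timeout. */
    public static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // The value is in milliseconds: 20_000 ms = 20 seconds.
        cfg.setFailureDetectionTimeout(20_000L);

        return cfg;
    }
}
```

Note that a longer timeout only hides the pause from failure detection; a
15-second stop-the-world pause still needs to be tracked down.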

As for FSYNC vs LOG_ONLY, the former protects you from global cluster
outages when all the nodes go down at the same time. If that is an unlikely
event in your situation, it's OK to relax the mode to LOG_ONLY as long as
you have backup copies on other nodes.
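
If you do relax the mode, a minimal sketch of the relevant settings might
look like the following. The class name, the cache name "txCache", the
key/value types and the choice of one backup are assumptions made for the
example, not values taken from your deployment.

```java
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.WALMode;

// Hypothetical helper class, used here only to show the setters.
public class WalModeSketch {
    public static IgniteConfiguration configure() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Keep native persistence enabled for the default data region.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Relax the WAL mode from FSYNC to LOG_ONLY.
        storageCfg.setWalMode(WALMode.LOG_ONLY);

        // "txCache" and the key/value types are placeholders (assumption).
        CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("txCache");
        cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        // At least one backup copy on another node; 1 is an assumed value.
        cacheCfg.setBackups(1);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        cfg.setCacheConfiguration(cacheCfg);

        return cfg;
    }
}
```

With backups in place, losing a single node does not lose data; what
LOG_ONLY gives up is protection in the case where every node holding a copy
goes down at the OS or power level at once.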

--
Denis

On Tue, Jan 1, 2019 at 8:23 PM Ignite Enthusiast <ignite_en...@yahoo.com>
wrote:

> Question on Ignite Persistence:
>
> On a deployed Ignite (3 node) cluster, I see one node being taken out
> of the cluster because it encounters GC Pauses. Worse, when this node
> leaves the cluster, a Rebalance is initiated (and re-initiated when the
> node joins back).
>
> Note: Data that Ignite Cluster holds is fully transactional. We cannot put
> up with Data Loss.
>
> From the logs :
>
> [14:32:01,643][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager]
> Copied file
> [src=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000006.wal,
> dst=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000306.wal]
>
> [14:32:02,830][INFO][wal-file-archiver%null-#44][FsyncModeFileWriteAheadLogManager]
> Starting to copy WAL segment [absIdx=307, segIdx=7,
> origFile=/data2/data/wal/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000007.wal,
> dstFile=/data2/data/wal/archive/node00-8d707f27-d022-4237-85cf-28c36828a0a3/0000000000000307.wal]
>
> [14:32:17,999][WARNING][jvm-pause-detector-worker][IgniteKernal] Possible
> too long JVM pause: 15044 milliseconds.
>
> It is clear that WAL writes (FSYNC in this case) always precede GC Pauses.
>
> Question:
>
> The only advantage of FSYNC Vs LOG_ONLY seems to be surviving OS Level
> Crashes. With a Journaled filesystem like Ext4FS, do I really need FSYNC?
> Can't I get by with LOG_ONLY?
>
> If not, how do I minimise the perf bottlenecks using FSYNC ?
>
