
If you are seeing any exceptions, please provide logs.

Yes, if you remove the node from the baseline and have 1 backup, then the
data will be rebalanced between the remaining nodes.

1K messages per second means roughly 4 MB/s of page writes just for
checkpoints, given a 4 KB page size (each update dirties at least one page);
then add WAL to the mix.
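
To make that estimate concrete, here is a back-of-the-envelope sketch in
Java. It assumes each message dirties exactly one page, which is a lower
bound (with backups=1 the backup copy is updated too, and index pages add
more), and that the WAL has to log at least the ~1.5 KB payload of each
record. The class and method names are made up for illustration and are not
part of Ignite.

```java
public class CheckpointWriteEstimate {
    /**
     * Lower bound on checkpoint write volume in bytes per second, assuming
     * every message dirties pagesPerMessage full pages (an assumption).
     */
    public static long checkpointBytesPerSecond(long messagesPerSecond,
                                                int pageSizeBytes,
                                                int pagesPerMessage) {
        return messagesPerSecond * pageSizeBytes * pagesPerMessage;
    }

    /**
     * Lower bound on WAL write volume in bytes per second, assuming the WAL
     * logs at least the record payload itself (an assumption).
     */
    public static long walBytesPerSecond(long messagesPerSecond,
                                         int avgRecordBytes) {
        return messagesPerSecond * avgRecordBytes;
    }

    public static void main(String[] args) {
        long rate = 1_000;      // messages per second, from the test
        int pageSize = 4_096;   // 4 KB page size
        int recordSize = 1_536; // ~1.5 KB average record

        long checkpoint = checkpointBytesPerSecond(rate, pageSize, 1);
        long wal = walBytesPerSecond(rate, recordSize);

        System.out.printf("checkpoint: %.2f MB/s%n", checkpoint / 1_000_000.0);
        System.out.printf("wal (min):  %.2f MB/s%n", wal / 1_000_000.0);
    }
}
```

Under those assumptions the checkpoint figure alone comes to about 4 MB/s
across the cluster, before counting backups, indexes or WAL traffic.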

Ilya Kasnacheev

Wed, 7 Apr 2021 at 22:26, facundo.maldonado

> Hi everyone, kind of frustrated/disappointed here.
> I have a small cluster in a test environment where I'm trying to take some
> measurements so I can size the cluster I will need in production and
> estimate some costs.
> The use case is simple: consume from a Kafka topic and populate the
> database so other components can start querying (key-value access only).
> The cluster is described below:
> AWS/K8S environment
> 4 data nodes and 4 'streamer' nodes.
> Data nodes:
> - 12 GB memory requested
> - 4 GB for JVM -Xms and -Xmx
> - 5 GB DataRegion maxSize
> - persistence Enabled
> - writeThrottling Enabled
> - walSegmentSize 256 MB
> - 10 GB volume attached for storage /opt/work/storage
> - 3 GB volume attached for WAL /opt/work/wal  (~10*walSegmentSize)
> - WalArchive disabled (walArchivePath==walPath)
> - 1 cache
> - partitionLossPolicy READ_ONLY_SAFE
> - cacheMode PARTITIONED
> - writeSynchronizationMode PRIMARY_SYNC
> - rebalanceMode ASYNC
> - backups 1
> - expiryPolicyFactory AccessedExpiryPolicy 20 min
> Streamer nodes (Kafka streamer as grid service - node singleton)
> - 2 GB memory requested
> - allowOverwrite false
> - autoflushFrequency 200ms
> - 16 consumers (64 partitions in topic)
> The streamer is configured with a stream receiver, a StreamTransformer that
> checks a special case where I have to choose which record I will keep.
> Records are about 1.5 KB each (avg).
> They are deserialized and converted into domain objects that are streamed
> as BinaryObjects to the cache.
> Use case:
> Started with a clean environment. No data in cache, no data in WAL/storage
> volumes, no data in the topic.
> Input data is generated at a constant rate of 1K messages per second.
> For the first 20 minutes, cache size grows linearly. After that it stays
> almost flat.
> That's expected since ExpiryPolicy was set to 20 min.
> Around the one-hour mark, the lag in the consumers started to grow.
> After that, everything goes wrong.
> WAL size grew beyond the limit and had exactly doubled by the time
> Kubernetes killed the pod.
> Around the same moment, memory usage started to grow to near the limit
> (12 GB).
> Throttling times and checkpointing duration stayed almost constant during
> the test. The latter is really high (2 min avg), but I don't know whether
> that is expected since I have nothing to compare it with.
> After 2 nodes were killed, they never joined the cluster again.
> I increased the size of the WAL volume, but they still didn't join.
> The control.sh utility lists both nodes as offline.
> The logs output a message like this:
> Blocked system-critical thread has been detected. This can lead to
> cluster-wide undefined behaviour [workerName=sys-stripe-6,
> threadName=sys-stripe-6-#7, blockedFor=74s]
> After restarting them again, one joined the cluster but not the other.
> The control.sh utility displayed the node as offline.
> By mistake I deleted the content of the WAL folder. Shame on me.
> Now the node doesn't even start.
> The node log displays:
> JVM will be halted immediately due to the failure:
> [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> o.a.i.i.processors.cache.persistence.StorageException: Failed to read
> checkpoint record from WAL, persistence consistency cannot be guaranteed.
> Make sure configuration points to correct WAL folders and WAL folder is
> properly mounted [ptr=WALPointer [idx=179, fileOff=236972130, len=15006],
> walPath=/opt/work/wal, walArchive=/opt/work/wal]]]
> Which I think is expected.
> Now the node is completely unusable.
> Finally my questions are:
> - How can I reuse that node? Can I reuse it? Is there a way to clean the
> data and rejoin the node?
> - Did I lose the data of that node? It should be recovered from backups once
> I remove the node from baseline, is that correct?
> - If I increase the input rate to 2K, the lag generated at the consumers
> becomes unmanageable. Adding more consumers will not help since they are
> already matched with topic partitions.
> - 1K messages per second is really really really slow.
> - How exactly does the WAL work? Why am I constantly running out of space
> here?
> - Any clue as to what I'm doing wrong?
> Hope someone can shed some light here.
> Thanks
