>>> Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote on 11.10.2021 at 11:57 in
>>> message <20211011115737.7cc99e69@firost>:
> Hi,
>
> I kept the full answer in history to keep the list informed of your full
> answer.
>
> My answer down below.
>
> On Mon, 11 Oct 2021 11:33:12 +0200
> damiano giuliani <damianogiulian...@gmail.com> wrote:
>
>> ehy guys, sorry for being late, was busy during the WE
>>
>> here i am:
>>
>>> Did you see the swap activity (in/out, not just swap occupation) happen at
>>> the same time the member was lost on the corosync side?
>>> Did you check whether corosync or some of its libs were indeed in swap?
>>
>> no, and i don't know how to do it. i just noticed the swap occupation, which
>> suggested to me (and my colleague) to find out if it could cause some
>> trouble.
>>
>>> First, corosync now sits on a lot of memory because of knet. Did you try to
>>> switch back to udpu, which uses far less memory?
>>
>> No, i haven't moved to udpu; can't stop the processes at all.
>>
>> "Could not lock memory of service to avoid page faults"
>>
>> grep -rn 'Could not lock memory of service to avoid page faults' /var/log/*
>> returns nothing
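[Editor's note: the question "was corosync actually in swap?" can be answered
directly from /proc. A minimal sketch, demonstrated on the current shell's own
PID since a cluster node isn't assumed here; on a real node you would
substitute the corosync PID:]

```shell
# Check how much of a process's memory is swapped out right now.
# Demo uses the current shell ($$); on a cluster node use e.g.
# pid=$(pidof corosync)   (assumption: corosync is running there)
pid=$$

# VmSwap in /proc/<pid>/status is the amount of this process in swap.
grep VmSwap "/proc/$pid/status"

# For swap *activity* (pages in/out per second, not just occupation),
# watch the si/so columns:
#   vmstat 1 5
```

A nonzero VmSwap at the moment of the membership loss would support the
swap theory far better than total swap occupation does.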
Maybe the expression is too specific (try just "lock memory"), or syslog is in
the journal only (journalctl -b | grep "lock memory").

> This message should appear on corosync startup. Make sure the logs hadn't
> been rotated to a blackhole in the meantime...
>
>>> On my side, mlock is unlimited in the ulimit settings. Check the values in
>>> /proc/$(corosync PID)/limits (be careful with the ulimit command; check the
>>> process itself).
>>
>> cat /proc/101350/limits
>> Limit                     Soft Limit           Hard Limit           Units
>> Max cpu time              unlimited            unlimited            seconds
>> Max file size             unlimited            unlimited            bytes
>> Max data size             unlimited            unlimited            bytes
>> Max stack size            8388608              unlimited            bytes
>> Max core file size        0                    unlimited            bytes
>> Max resident set          unlimited            unlimited            bytes
>> Max processes             770868               770868               processes
>> Max open files            1024                 4096                 files
>> Max locked memory         unlimited            unlimited            bytes
>> Max address space         unlimited            unlimited            bytes
>> Max file locks            unlimited            unlimited            locks
>> Max pending signals       770868               770868               signals
>> Max msgqueue size         819200               819200               bytes
>> Max nice priority         0                    0
>> Max realtime priority     0                    0
>> Max realtime timeout      unlimited            unlimited            us

>> Ah... That's the first thing I change.
>>> In SLES, that is defaulted to 10s, and so far I have never seen an
>>> environment that is stable enough for the default 1s timeout.
>>
>> old versions have a 10s default.
>> you are not going to fix the problem this way; a 1s timeout for a bonded
>> network and overkill hardware is an enormous amount of time.
>>
>> hostnamectl | grep Kernel
>>     Kernel: Linux 3.10.0-1160.6.1.el7.x86_64
>> [root@ltaoperdbs03 ~]# cat /etc/os-release
>> NAME="CentOS Linux"
>> VERSION="7 (Core)"
>>
>>> Indeed. But it's an arbitrage between swapping process mem or freeing mem
>>> by removing data from cache. For database servers, it is advised to use a
>>> lower value for swappiness anyway, around 5-10, as a swapped process means
>>> longer queries, longer data in caches, piling sessions, etc.
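[Editor's note: the advice above, checking the daemon's own limits rather than
the shell's ulimit, can be scripted. A minimal sketch, demonstrated on the
current shell's PID since no corosync process is assumed here:]

```shell
# Verify a daemon's mlock limit from /proc/<pid>/limits, as suggested in
# the thread. ulimit in your shell may differ from what the daemon got.
# Demo on the current shell; on a node use pid=$(pidof corosync).
pid=$$

grep "Max locked memory" "/proc/$pid/limits"

# The startup warning, if it was ever emitted, would live in the journal:
#   journalctl -u corosync -b | grep -i "lock memory"
```

If "Max locked memory" is not unlimited for the corosync process, mlockall()
can fail and corosync's memory becomes swappable, which matches the
"Could not lock memory of service to avoid page faults" message being hunted.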
>>
>> totally agree, for a db server swappiness should be 5-10.
>>
>> kernel?
>>> What are your settings for vm.dirty_* ?
>>
>> hostnamectl | grep Kernel
>>     Kernel: Linux 3.10.0-1160.6.1.el7.x86_64
>> [root@ltaoperdbs03 ~]# cat /etc/os-release
>> NAME="CentOS Linux"
>> VERSION="7 (Core)"
>>
>> sysctl -a | grep dirty
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>
> Considering your 256GB of physical memory, this means you can dirty up to
> 25GB of pages in cache before the kernel starts to write them to storage.
>
> You might want to trigger these lighter background syncs well before hitting
> this limit.
>
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>
> This is 20% of your 256GB physical memory. Above this limit, writes have to
> go directly to disk. Considering the time to write to SSD compared to
> memory, and the amount of data to sync in the background as well (52GB),
> this could be very painful.

However (unless doing really large commits), databases should flush buffers
rather frequently, so I doubt database operations would fill the dirty buffer
rate. "watch cat /proc/meminfo" could be your friend.

>
>> vm.dirty_writeback_centisecs = 500
>>
>>> Do you have proof that swap was the problem?
>>
>> not at all, but after switching swappiness to 10, the cluster hasn't
>> suddenly swapped anymore for a month.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
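[Editor's note: the 25GB/20% arithmetic above can be checked, and the
suggested "trigger background syncs earlier" change made concrete. A sketch;
the byte values at the end are illustrative, not a tuning recommendation:]

```shell
# Translate the ratio-based dirty limits into bytes for a 256 GB host.
mem_gb=256
echo "background writeback starts at: $((mem_gb * 10 / 100)) GB"  # ratio 10
echo "writes go synchronous at:       $((mem_gb * 20 / 100)) GB"  # ratio 20

# To start background writeback much earlier on a large-memory box, set
# absolute limits instead; *_bytes, when nonzero, overrides *_ratio:
#   sysctl -w vm.dirty_background_bytes=$((256 * 1024 * 1024))  # 256 MB
#   sysctl -w vm.dirty_bytes=$((2 * 1024 * 1024 * 1024))        # 2 GB
#   sysctl -w vm.swappiness=10    # the value the thread settled on
```

Ratio-based limits scale with RAM, which is why a default tuned for small
machines can allow tens of gigabytes of dirty pages on a 256GB server.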