Thank you for highlighting this, it looks like I need to refresh my knowledge about IO schedulers :-)
Cheers, Dmitry On Thu, 22 Feb 2024 at 22:18, Bowen Song via user <user@cassandra.apache.org> wrote: > On the IO scheduler point, cfq WAS the only scheduler supporting IO > priorities (such as ionice) shipped by default with the Linux kernel, but > that has changed since bfq and mq-deadline were added to the Linux kernel. > Both bfq and mq-deadline supports IO priority, as documented here: > https://docs.kernel.org/block/ioprio.html > > > On 22/02/2024 19:39, Dmitry Konstantinov wrote: > > Hi all, > > I was not participating in the changes but I analyzed the question some > time ago from another side. > There were also changes related to -XX:ThreadPriorityPolicy JVM option. > When you set a thread priority for a Java thread it does not mean it is > always propagated as a native OS thread priority. To propagate the priority > you should use -XX:ThreadPriorityPolicy and -XX:JavaPriorityN_To_OSPriority > JVM options, but there is an issue with them because JVM wants to be > executed under root to set -XX:ThreadPriorityPolicy=1 which enables the > priorities usage. A hack was invented long time ago to workaround it by > setting -XX:ThreadPriorityPolicy=42 value (any value not equal to 0 or 1) > and bypass the not so needed and annoying grants validation logic (see > http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html > for more details about). > It worked for Java 8 but then there was a change in Java 9 about adding > extra validation for JVM option values (JEP 245: Validate JVM Command-Line > Flag Arguments - https://bugs.openjdk.org/browse/JDK-8059557) and the > hack stopped to work and started to cause JVM failure with a validation > error. As a reaction to it - the flag was removed from Cassandra JVM > configuration files in > https://issues.apache.org/jira/browse/CASSANDRA-13107. After it the lower > priority value for compaction threads have not had any actual effect. > The interesting story is that the JVM logic has been changed to support > the ability to set -XX:ThreadPriorityPolicy=1 for non-root users in Java 13 > (https://bugs.openjdk.org/browse/JDK-8215962) and the change was > backported to Java 11 as well (https://bugs.openjdk.org/browse/JDK-8217494 > ). > So, from this point of view I think it would be nice to return back the > ability to set thread priority for compaction threads. At the same time I > would not expect too much improvement by enabling it. > > P.S. There was also an idea about using ionice ( > https://issues.apache.org/jira/browse/CASSANDRA-9946) but the current > Linux IO schedulers do not take that into account anymore. It looks like > the only scheduler that supported ionice was CFQ ( > https://issues.apache.org/jira/browse/CASSANDRA-9946?focusedCommentId=14648616&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14648616) > and it was deprecated and removed since Linux kernel 5.x ( > https://github.com/torvalds/linux/commit/f382fb0bcef4c37dc049e9f6963e3baf204d815c > ). > > Regards, > Dmitry > > > On Thu, 22 Feb 2024 at 15:30, Bowen Song via user < > user@cassandra.apache.org> wrote: > >> Hi Pierre, >> >> Is there anything stopping you from using the compaction_throughput >> <https://github.com/apache/cassandra/blob/f9e033f519c14596da4dc954875756a69aea4e78/conf/cassandra.yaml#L989> >> option in the cassandra.yaml file to manage the performance impact of >> compaction operations? >> >> With thread priority, there's a failure scenario on busy nodes when the >> read operations uses too much CPU. If the compaction thread has lower >> priority, it does not get enough CPU time to run, and SSTable files will >> build up, causing read to become slower and more expensive, which in turn >> result in compaction gets even less CPU time. At the end, one of the >> following three will happen: >> >> - the node becomes too slow and most queries time out >> - the Java process crashes due to too many open files or OOM because >> JVM GC can't keep up >> - the filesystem run out of free space or inodes >> >> However, I'm unsure whether the compaction thread priority was >> intentionally removed from 4.1.0. Someone familiar with this matter may be >> able to answer that. >> >> Cheers, >> Bowen >> >> >> On 22/02/2024 13:26, Pierre Fersing wrote: >> >> Hello all, >> >> I've recently upgraded to Cassandra 4.1 and see a change in compaction >> behavior that seems unwanted: >> >> * With Cassandra 3.11 compaction was run by thread in low priority and >> thus using CPU nice (visible using top) (I believe Cassandra 4.0 also had >> this behavior) >> >> * With Cassandra 4.1, compactions are no longer run as low priority >> thread (compaction now use "normal" CPU). >> >> This means that when the server had limited CPU, Cassandra compaction now >> compete for the CPU with other process (probably including Cassandra >> itself) that need CPU. When it was using CPU nice, the compaction only >> competed for CPU with other lower priority process which was great as it >> leaves CPU available for processes that need to kept small response time >> (like an API used by human). >> >> Is it wanted to lose this feature in Cassandra 4.1 or was it just a >> forget during re-write of compaction executor ? Should I open a bug to >> re-introduce this feature in Cassandra ? >> >> >> I've done few searches, and: >> >> * I believe compaction used CPU nice because the compactor executor was >> created with minimal priority: >> https://github.com/apache/cassandra/blob/cassandra-3.11.16/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1906 >> >> * I think it was dropped by commit >> https://github.com/apache/cassandra/commit/be1f050bc8c0cd695a42952e3fc84625ad48d83a >> >> * It looks doable to set the thread priority with new executor, I think >> adding ".withThreadPriority(Thread.MIN_PRIORITY)" when using >> executorFactory in >> https://github.com/apache/cassandra/blob/77a3e0e818df3cce45a974ecc977ad61bdcace47/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L2028 >> should do it. >> >> >> Did I miss a reason to no longer use low priority threads for compaction >> ? Should I open a bug for re-adding this feature / submit a PR ? >> >> Regards, >> >> Pierre Fersing >> >> > > -- > Dmitry Konstantinov > > -- Dmitry Konstantinov