Hi Edi,

The description "most errors" sounds suspicious. If there are outliers, it may indicate that you are looking at the wrong thing, or that multiple factors are contributing to the result.

In Cassandra, "small messages" is an inter-node communication class for messages that are smaller than a configurable threshold (64 KiB by default) and are neither of the highest priority nor streaming. Many (most?) inter-node communications fall into this class, including reads and writes, authentication, repairs and many more.
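
As a rough illustration of that classification, here is a minimal Python sketch. It is not Cassandra's actual code: the function name, the `urgent` and `streaming` flags, and the labels other than SMALL_MESSAGES are made up for this example, and treating a message of exactly the threshold size as small is an assumption.

```python
# Illustrative only: mimics the rule described above, not Cassandra's source.
LARGE_MESSAGE_THRESHOLD = 64 * 1024  # bytes; the 64 KiB default


def connection_class(size_bytes, urgent=False, streaming=False,
                     threshold=LARGE_MESSAGE_THRESHOLD):
    """Return a label for the inter-node connection class of a message."""
    if streaming:
        return "STREAMING"
    if urgent:
        # Highest-priority messages get their own class regardless of size.
        return "URGENT_MESSAGES"
    if size_bytes > threshold:
        return "LARGE_MESSAGES"
    # Assumption: a message of exactly `threshold` bytes counts as small.
    return "SMALL_MESSAGES"
```

For example, `connection_class(1024)` gives "SMALL_MESSAGES", which is why an ordinary read or write between nodes would show up under that label.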

If the network for the inter-node connection is congested and unstable, there isn't much you can do about it other than improving the quality and bandwidth of the network and/or reducing congestion. This may mean vertical scaling (e.g. more bandwidth per node) or horizontal scaling (more nodes). But before you do any of that, you should first confirm that the cause of the issue is indeed network congestion, not something else.
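
One way to start confirming this is to count the errors per hour on each node and compare the counts against your network graphs. Below is a minimal Python sketch. It assumes the log format shown in your original email and that each log entry sits on a single line in the log file (the email client wrapped them). The function name is made up for this example.

```python
import re
from collections import Counter

# Matches entries like the ones quoted in this thread, e.g.
# ERROR [Messaging-EventLoop-3-6] 2024-08-27 01:31:33,741 InboundMessageHandler.java:300 - ...-SMALL_MESSAGES-fbe3b1a9 ...
# Group 1 captures the date and hour ("2024-08-27 01").
LINE_RE = re.compile(
    r"^ERROR \[[^\]]+\] (\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} "
    r"InboundMessageHandler\.java:\d+ - \S+-SMALL_MESSAGES-\S+ "
)


def errors_per_hour(lines):
    """Count SMALL_MESSAGES inbound handler errors per 'YYYY-MM-DD HH' bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file to `errors_per_hour` and printing the result sorted by key gives an hourly timeline per node. If the busy hours do not line up with the periods of heavy network load, congestion is probably not the whole story.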

Cheers,
Bowen


On 29/08/2024 10:47, edi mari wrote:

Thank you for your insights, Bowen.

The occurrence is inconsistent: some nodes report seven errors in ten minutes, while others show just one error every 24 hours at random times. We noticed that most errors tend to occur during periods of heavy network load.

I agree that addressing the root cause is essential, and we are actively working to reduce the network's pressure. Is there any tuning or configuration in Cassandra that could help prevent these errors? Where can I find more information about these errors, and under what circumstances do these messages appear? Additionally, what does the term "SMALL_MESSAGES" mean in the error message?


Edi

On Tue, Aug 27, 2024 at 8:04 PM Bowen Song via user <user@cassandra.apache.org> wrote:

    Hello Edi,

    Before attempting to optimise prematurely, let's try to understand the
    situation a bit better.

    * What's the bandwidth available? (think: total bandwidth and the
    typical usage)
    * What's causing the heavy network load?
    * How much bandwidth is consumed by the heavy network load?
    * How long do the periods of heavy load typically last?
    * How frequently does that happen?
    * Is the thing causing the load flexible enough to run at a slower rate
    or at a different time of the day/week?

    It's usually better to address the problem at source, instead of
    tweaking the victims and hoping that they will better survive it.

    Cheers,
    Bowen


    On 27/08/2024 12:57, edi mari wrote:
    > Hello,
    > Recently, we've noticed errors appearing in the Cassandra logs, which
    > coincide with periods of heavy network load. We investigated and
    > confirmed that the network was under significant stress during these
    > times.
    > Is there any configuration or tuning in Cassandra that could help
    > eliminate these errors?
    > Perhaps increasing the inbound connection timeout might help?
    >
    > Cassandra V4.0.4
    >
    > ERROR [Messaging-EventLoop-3-6] 2024-08-27 01:31:33,741
    > InboundMessageHandler.java:300 -
    > /xx.xx.xx.xx:7000->/xx.xx.xx.xx:7000-SMALL_MESSAGES-fbe3b1a9
    > unexpected exception caught while processing inbound messages;
    > terminating connection
    > ERROR [Messaging-EventLoop-3-4] 2024-08-27 11:10:16,390
    > InboundMessageHandler.java:300 -
    > /xx.xx.xx.xx:7000->/xx.xx.xx.xx:7000-SMALL_MESSAGES-5d216061
    > unexpected exception caught while processing inbound messages;
    > terminating connection
    >
