Hi Edi,
The description "most errors" sounds suspicious. If there are some
outliers, it may indicate that you are looking at the wrong thing or
that there are multiple factors contributing to the result.
In Cassandra, "small messages" (the SMALL_MESSAGES in your log) is a
class of inter-node communication for messages that are smaller than a
configurable threshold (64 KiB by default) and are neither of the
highest priority nor streaming. Many (most?) inter-node communications
fall into this class, including reads and writes, authentication,
repairs and many more.
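As a rough illustration of the classification described above, here is a simplified Java sketch. It is not Cassandra's actual routing code: the URGENT_MESSAGES and LARGE_MESSAGES names are assumed counterparts of the SMALL_MESSAGES label seen in the log, the 64 KiB constant is the default mentioned above, and treating a message exactly at the threshold as "large" is an assumption based on "smaller than". Streaming uses a separate mechanism and is left out.

```java
// Simplified, hypothetical sketch of priority/size-based message classes.
// This is NOT Cassandra's implementation; names and boundary are assumptions.
final class MessageClassifier {
    enum Kind { URGENT_MESSAGES, SMALL_MESSAGES, LARGE_MESSAGES }

    // Assumed default threshold of 64 KiB (configurable in Cassandra).
    static final int LARGE_MESSAGE_THRESHOLD = 64 * 1024;

    static Kind classify(boolean highestPriority, int serializedSizeBytes) {
        // Highest-priority messages get their own class regardless of size.
        if (highestPriority) {
            return Kind.URGENT_MESSAGES;
        }
        // Everything else is split by size: below the threshold is "small".
        return serializedSizeBytes < LARGE_MESSAGE_THRESHOLD
                ? Kind.SMALL_MESSAGES
                : Kind.LARGE_MESSAGES;
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 512));        // SMALL_MESSAGES
        System.out.println(classify(false, 128 * 1024)); // LARGE_MESSAGES
        System.out.println(classify(true, 512));         // URGENT_MESSAGES
    }
}
```

Since ordinary reads and writes are usually well under 64 KiB, it is expected that errors on a congested link show up mostly on the small-messages connection.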
If the network for the inter-node connection is congested and unstable,
there isn't much you can do about it other than improving the quality
and bandwidth of the network and/or reducing congestion. This may mean
vertical scaling (e.g. more bandwidth per node) or horizontal scaling
(more nodes). But before you do any of that, you should first confirm
that the cause of the issue is indeed network congestion, not something
else.
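One way to check whether the errors really line up with the periods of network congestion is to bucket them by hour and compare the counts against your network monitoring for the same windows. Below is a minimal Java sketch, assuming each ERROR entry sits on a single line in the log file in the format quoted in this thread (the email client wrapped them) and that the "unexpected exception caught while processing inbound messages" text is a reliable marker. InboundErrorHistogram and countPerHour are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts InboundMessageHandler connection errors per hour.
final class InboundErrorHistogram {
    // Matches the single-line form of the ERROR entries quoted in this thread
    // and captures "yyyy-MM-dd HH" so errors can be bucketed per hour.
    private static final Pattern ERROR_LINE = Pattern.compile(
            "^ERROR \\[[^\\]]+\\] (\\d{4}-\\d{2}-\\d{2} \\d{2}):\\d{2}:\\d{2},\\d{3} "
            + "InboundMessageHandler\\.java:\\d+ - .*unexpected exception caught "
            + "while processing inbound messages");

    static Map<String, Integer> countPerHour(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by hour
        for (String line : logLines) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + ":00", 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "ERROR [Messaging-EventLoop-3-6] 2024-08-27 01:31:33,741 InboundMessageHandler.java:300 - "
                        + "/xx.xx.xx.xx:7000->/xx.xx.xx.xx:7000-SMALL_MESSAGES-fbe3b1a9 unexpected exception "
                        + "caught while processing inbound messages; terminating connection",
                "INFO  [main] 2024-08-27 01:40:00,000 SomeOtherClass.java:1 - unrelated line");
        System.out.println(countPerHour(lines)); // {2024-08-27 01:00=1}
    }
}
```

If the busy hours in this histogram do not match the congested hours in your network graphs, that points towards another cause.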
Cheers,
Bowen
On 29/08/2024 10:47, edi mari wrote:
Thank you for your insights, Bowen.
The occurrence is inconsistent: some nodes report seven errors in ten
minutes, while others show just one error every 24 hours at random times.
We noticed that most errors tend to occur during periods of heavy
network load.
I agree that addressing the root cause is essential, and we are
actively working to reduce the pressure on the network.
Is there any tuning or configuration in Cassandra that could help
prevent these errors?
Where can I find more information about these errors, and under what
circumstances do these messages appear?
Additionally, what does the term "SMALL_MESSAGES" mean in the error
message?
Edi
On Tue, Aug 27, 2024 at 8:04 PM Bowen Song via user
<user@cassandra.apache.org> wrote:
Hello Edi,
Before attempting any premature optimisation, let's try to understand
the situation a bit better.
* What's the bandwidth available? (think: total bandwidth and the
typical usage)
* What's causing the heavy network load?
* How much bandwidth is consumed by the heavy network load?
* How long do they typically last?
* How frequently does that happen?
* Is the thing causing the load flexible enough to run at a slower
rate or at a different time of the day/week?
It's usually better to address the problem at source, instead of
tweaking the victims and hoping that they will better survive it.
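If the job causing the load can run at a slower rate, throttling it at the source is one way to do that. Many bulk tools have their own rate-limit options; where one doesn't, a token bucket is a common technique. The Java sketch below is hypothetical (TokenBucket and its parameters are illustrative names, not part of Cassandra or any specific tool) and assumes the job can ask the bucket for permission before sending each chunk of data.

```java
// Hypothetical token-bucket throttle for a bulk job that competes with
// Cassandra for network bandwidth. Not a Cassandra feature.
final class TokenBucket {
    private final double bytesPerSecond; // sustained rate allowed
    private final double burstBytes;     // maximum bucket size
    private double availableBytes;
    private long lastRefillNanos;

    TokenBucket(double bytesPerSecond, double burstBytes, long nowNanos) {
        this.bytesPerSecond = bytesPerSecond;
        this.burstBytes = burstBytes;
        this.availableBytes = burstBytes; // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    // Returns true if `bytes` may be sent now; otherwise the caller waits and retries.
    // Time is passed in explicitly (e.g. System.nanoTime()) to keep it testable.
    boolean tryAcquire(long bytes, long nowNanos) {
        refill(nowNanos);
        if (bytes <= availableBytes) {
            availableBytes -= bytes;
            return true;
        }
        return false;
    }

    // Adds tokens for the elapsed time, never exceeding the burst size.
    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            availableBytes = Math.min(burstBytes,
                    availableBytes + elapsed * bytesPerSecond / 1_000_000_000.0);
            lastRefillNanos = nowNanos;
        }
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(1000, 1000, 0);
        System.out.println(bucket.tryAcquire(1000, 0));           // true: full burst
        System.out.println(bucket.tryAcquire(1, 0));              // false: bucket empty
        System.out.println(bucket.tryAcquire(500, 500_000_000L)); // true: 0.5 s refilled 500 bytes
        System.out.println(bucket.tryAcquire(1, 500_000_000L));   // false
    }
}
```

The sustained rate would be chosen from the bandwidth figures asked about above, leaving enough headroom for Cassandra's own inter-node traffic.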
Cheers,
Bowen
On 27/08/2024 12:57, edi mari wrote:
> Hello,
> Recently, we've noticed errors appearing in the Cassandra logs, which
> coincide with periods of heavy network load. We investigated and
> confirmed that the network was under significant stress during these
> times.
> Is there any configuration or tuning in Cassandra that could help
> eliminate these errors?
> Perhaps increasing the inbound connection timeout might help?
>
> Cassandra V4.0.4
>
> ERROR [Messaging-EventLoop-3-6] 2024-08-27 01:31:33,741
> InboundMessageHandler.java:300 -
> /xx.xx.xx.xx:7000->/xx.xx.xx.xx:7000-SMALL_MESSAGES-fbe3b1a9
> unexpected exception caught while processing inbound messages;
> terminating connection
> ERROR [Messaging-EventLoop-3-4] 2024-08-27 11:10:16,390
> InboundMessageHandler.java:300 -
> /xx.xx.xx.xx:7000->/xx.xx.xx.xx:7000-SMALL_MESSAGES-5d216061
> unexpected exception caught while processing inbound messages;
> terminating connection
>