On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg <t...@iconara.net> wrote:
> I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
> my application is talking to it over the binary protocol (I'm using JRuby
> and the cql-rb driver). I get this error quite frequently: "Too many in
> flight hints: 2411" (the exact number varies)
>
> Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
> with writes (but no reads at all).

The code that produces this message (below) sets the bound based on
the number of available processors. It is a bound on the number of
in-progress hints. An in-progress hint (for some reason redundantly
referred to as "in flight") is a hint which has been submitted to the
executor which will ultimately write it to local disk. If you get
OverloadedException, this means that you were trying to write hints to
this executor so fast that you risked OOM, so Cassandra refused to
submit your hint to the hint executor and therefore (partially) failed
your write.

"
private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
[... snip ...]
for (InetAddress destination : targets)
{
    // avoid OOMing due to excess hints.  we need to do this check even for "live" nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
    // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
    // a small number of nodes causing problems, so we should avoid shutting down writes completely to
    // healthy nodes.  Any node with no hintsInProgress is considered healthy.
    if (totalHintsInProgress.get() > maxHintsInProgress
        && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
    {
        throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
    }
"
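
To make the check concrete, here is a minimal standalone sketch of the
same admission logic. It is not Cassandra's code: the class and method
names are made up for illustration, destinations are plain strings, and
the shouldHint() condition is left out (treated as always true). If the
JVM sees two processors, as I would expect on an m1.large, the real
bound works out to 1024 * 2 = 2048, which fits the 2411 in the error
you quoted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the quoted check; not part of Cassandra.
class HintAdmissionSketch {
    // In Cassandra this is 1024 * available processors; here it is a parameter.
    final int maxHintsInProgress;
    final AtomicInteger totalHintsInProgress = new AtomicInteger();
    final Map<String, AtomicInteger> hintsInProgress = new ConcurrentHashMap<>();

    HintAdmissionSketch(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    // True if a write to this destination would be refused: the global
    // count is over the bound AND this node already has hints pending.
    // (Simplification: shouldHint(destination) is assumed to be true.)
    boolean wouldReject(String destination) {
        AtomicInteger perNode = hintsInProgress.get(destination);
        int pending = (perNode == null) ? 0 : perNode.get();
        return totalHintsInProgress.get() > maxHintsInProgress && pending > 0;
    }

    // A hint was handed to the executor for this destination.
    void hintSubmitted(String destination) {
        totalHintsInProgress.incrementAndGet();
        hintsInProgress.computeIfAbsent(destination, d -> new AtomicInteger())
                       .incrementAndGet();
    }

    // The executor finished writing a hint for this destination.
    void hintWritten(String destination) {
        totalHintsInProgress.decrementAndGet();
        hintsInProgress.get(destination).decrementAndGet();
    }
}
```

Note that a destination with no pending hints is never rejected, even
when the global count is over the bound. That is the "healthy nodes"
exemption described in the source comment.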

If Cassandra didn't return this exception, it might OOM while
enqueueing your hints to be stored. It chooses instead to give up on
enqueueing a hint for the failed write. The solution is to reduce
your write rate, ideally by enough that you don't even queue hints in
the first place.
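
One way a client could slow down when it sees this error is to retry
with exponential backoff. The following is only a sketch in plain Java:
OverloadedError is a hypothetical stand-in for whatever your driver
raises (I don't know how cql-rb surfaces it), and withBackoff and its
parameters are made-up names.

```java
import java.util.function.Supplier;

// Hypothetical client-side helper; not part of any Cassandra driver.
class WriteBackoffSketch {
    // Stand-in for the driver's overload error (assumption).
    static class OverloadedError extends RuntimeException {
        OverloadedError(String message) { super(message); }
    }

    // Runs the write, sleeping and doubling the delay after each
    // overload error, and rethrows once maxAttempts is exhausted.
    static <T> T withBackoff(Supplier<T> write, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return write.get();
            } catch (OverloadedError e) {
                if (attempt >= maxAttempts) throw e;
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
                delay *= 2; // back off harder each time
            }
        }
    }
}
```

This only smooths over bursts. If the error keeps coming back, the
sustained write rate itself is too high for the cluster.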

=Rob
