On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg <t...@iconara.net> wrote:
> I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
> my application is talking to it over the binary protocol (I'm using JRuby
> and the cql-rb driver). I get this error quite frequently: "Too many in
> flight hints: 2411" (the exact number varies)
>
> Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
> with writes (but no reads at all).

The code that produces this message (below) sets the bound based on the
number of available processors: 1024 hints per processor. It bounds the
number of in-progress hints. An in-progress hint (for some reason redundantly
referred to as "in flight") is a hint which has been submitted to the
executor which will ultimately write it to local disk. If you get
OverloadedException, this means that you were trying to write hints to this
executor so fast that you risked OOM, so Cassandra refused to submit your
hint to the hint executor and therefore (partially) failed your write.

"
private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
[... snip ...]
for (InetAddress destination : targets)
{
    // avoid OOMing due to excess hints.  we need to do this check even for "live" nodes, since we can
    // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
    // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
    // a small number of nodes causing problems, so we should avoid shutting down writes completely to
    // healthy nodes.  Any node with no hintsInProgress is considered healthy.
    if (totalHintsInProgress.get() > maxHintsInProgress
        && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
    {
        throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
    }
"

If Cassandra didn't return this exception, it might OOM while enqueueing your
hints to be stored. Instead, it gives up on trying to enqueue a hint for the
failed write.

The solution is to reduce your write rate, ideally by enough that you don't
even queue hints in the first place.

=Rob
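
P.S. To make the condition in the quoted check easier to follow, here is a minimal standalone sketch of the same logic. It is not Cassandra's code: the class HintAdmissionSketch, its method names, and the string node names are all hypothetical, and it assumes shouldHint(destination) is always true. It only shows that a write is rejected when the total is over the bound and the target already has hints in progress, while a node with no hints in progress is still accepted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of the quoted check; not Cassandra's actual classes.
class HintAdmissionSketch {
    private final int maxHintsInProgress;
    private final AtomicInteger totalHintsInProgress = new AtomicInteger();
    private final Map<String, AtomicInteger> hintsInProgress = new ConcurrentHashMap<>();

    HintAdmissionSketch(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    private AtomicInteger counterFor(String destination) {
        return hintsInProgress.computeIfAbsent(destination, d -> new AtomicInteger());
    }

    // Mirrors the quoted condition, assuming shouldHint(destination) is always true.
    boolean wouldReject(String destination) {
        return totalHintsInProgress.get() > maxHintsInProgress
            && counterFor(destination).get() > 0;
    }

    // Record a hint handed to the (imaginary) hint executor.
    void hintSubmitted(String destination) {
        totalHintsInProgress.incrementAndGet();
        counterFor(destination).incrementAndGet();
    }

    // Record that the executor finished writing a hint to disk.
    void hintWritten(String destination) {
        totalHintsInProgress.decrementAndGet();
        counterFor(destination).decrementAndGet();
    }

    public static void main(String[] args) {
        HintAdmissionSketch sketch = new HintAdmissionSketch(2); // tiny bound for the demo
        for (int i = 0; i < 3; i++) {
            sketch.hintSubmitted("slow-node");
        }
        System.out.println(sketch.wouldReject("slow-node"));    // true: over the bound, node has hints
        System.out.println(sketch.wouldReject("healthy-node")); // false: node has no hints in progress
        sketch.hintWritten("slow-node");
        System.out.println(sketch.wouldReject("slow-node"));    // false: total is back at the bound
    }
}
```

Note that the comparison is strictly greater-than, so in this sketch writes are accepted again as soon as the total drops back to the bound.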