I suspect a few possibilities:

1. I have not checked, but what happens (in terms of hint delivery) if a node tries to write something and the write times out even though the target node is marked as up?

2. I would assume there can be ever so slight variations in how different nodes in the cluster see the state of the rest of the cluster. These will of course typically be short-lived (unless some sort of long-term split-brain situation occurs), but if you are writing data while, for instance, a node is restarting, I would not be surprised if there are race conditions where A sees B as down and sends a hint to C, but C already thinks B is up.

3. I have observed situations where a node seems to enter the up state but for some reason takes a while to become really operational. Hint delivery fails, the hint sender gives up, and nothing more happens.
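To make the third case concrete, here is a toy sketch in Python. It is not Cassandra code: `ToyNode`, `on_gossip`, `deliver_hints` and `run_scenario` are names made up for illustration, and the only behaviour it assumes is the one discussed in this thread, namely that hint delivery is attempted only on a down-to-up state change.

```python
class ToyNode:
    """Hypothetical model of a hint sender; not Cassandra's real classes."""

    def __init__(self):
        self.peer_up = {}   # peer name -> last known up/down state
        self.hints = {}     # peer name -> list of undelivered hints

    def write(self, peer, mutation, deliver):
        # Store a hint instead of writing when the peer is believed down.
        if not self.peer_up.get(peer, True):
            self.hints.setdefault(peer, []).append(mutation)
        else:
            deliver(mutation)

    def on_gossip(self, peer, is_up, deliver):
        was_up = self.peer_up.get(peer, True)
        self.peer_up[peer] = is_up
        # Assumption: delivery is attempted only on the down -> up transition.
        if is_up and not was_up:
            self.deliver_hints(peer, deliver)

    def deliver_hints(self, peer, deliver):
        pending = self.hints.get(peer, [])
        try:
            while pending:
                deliver(pending[0])
                pending.pop(0)
        except TimeoutError:
            # The sender gives up; remaining hints wait for another transition.
            pass


def run_scenario():
    node = ToyNode()
    received = []
    ready = {"value": False}

    def deliver(mutation):
        # Peer B is marked up but not yet able to accept writes.
        if not ready["value"]:
            raise TimeoutError("peer B is up but not yet operational")
        received.append(mutation)

    node.on_gossip("B", False, deliver)  # B is seen as down
    node.write("B", "m1", deliver)       # a hint is stored for B
    node.on_gossip("B", True, deliver)   # down -> up: delivery attempted, times out
    ready["value"] = True                # B becomes really operational
    node.on_gossip("B", True, deliver)   # still up: no transition, nothing triggered
    return node.hints["B"], received
```

In this model `run_scenario()` leaves the hint for B undelivered even after B becomes operational, because no further state change occurs to trigger another attempt.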
Maybe an idea would be to let a node check whether it has hints on heartbeats (potentially not on all of them, but at a regular interval)?

Terje

On Thu, Jun 16, 2011 at 2:08 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Wed, Jun 15, 2011 at 10:53 AM, Terje Marthinussen
> <tmarthinus...@gmail.com> wrote:
> > I was looking quickly at source code tonight.
> > As far as I could see from a quick code scan, hint delivery is only
> > triggered as a state change from a node is down to when it enters up
> > state?
>
> Right.
>
> > If this is indeed the case, it would potentially explain why we sometimes
> > have hints on machines which does not seem to get played back
>
> Why is that? Hints don't get created in the first place unless a node
> is in the down state.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com