See https://issues.apache.org/jira/browse/CASSANDRA-9753
On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <tom.vandenbe...@gmail.com> wrote:

> I've been bugging you a few times, but now I've got trace data for a query
> with LOCAL_QUORUM that is being sent to a remote data center.
>
> The setup is as follows:
> NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
> Both DC1 and DC2 have 2 nodes.
> In DC2, one node is currently being rebuilt, and therefore does not
> contain all data (yet).
>
> The client app connects to a node in DC1, and sends a SELECT query with CL
> LOCAL_QUORUM, which in this case means (1/2)+1 = 1.
> If all is ok, the query always produces a result, because the requested
> rows are guaranteed to be available in DC1.
>
> However, the query sometimes produces no result. I've been able to record
> the traces of these queries, and it turns out that the coordinator node in
> DC1 sometimes sends the query to DC2, to the node that is being rebuilt
> and does not have the requested rows. I've included an example trace below.
>
> The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node
> is in DC2.
> I've verified that the CL is LOCAL_QUORUM by printing it when the query is
> sent (I'm using the DataStax Java driver).
>
> activity                                                                   | source       | source_elapsed | thread
> ---------------------------------------------------------------------------+--------------+----------------+-----------------------------------------
> Message received from /10.55.156.67                                        | 10.88.4.194  |             48 | MessagingService-Incoming-/10.55.156.67
> Executing single-partition query on aggregate                              | 10.88.4.194  |            286 | SharedPool-Worker-2
> Acquiring sstable references                                               | 10.88.4.194  |            306 | SharedPool-Worker-2
> Merging memtable tombstones                                                | 10.88.4.194  |            321 | SharedPool-Worker-2
> Partition index lookup allows skipping sstable 107                         | 10.88.4.194  |            458 | SharedPool-Worker-2
> Bloom filter allows skipping sstable 1                                     | 10.88.4.194  |            489 | SharedPool-Worker-2
> Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones  | 10.88.4.194  |            496 | SharedPool-Worker-2
> Merging data from memtables and 0 sstables                                 | 10.88.4.194  |            500 | SharedPool-Worker-2
> Read 0 live and 0 tombstone cells                                          | 10.88.4.194  |            513 | SharedPool-Worker-2
> Enqueuing response to /10.55.156.67                                        | 10.88.4.194  |            613 | SharedPool-Worker-2
> Sending message to /10.55.156.67                                           | 10.88.4.194  |            672 | MessagingService-Outgoing-/10.55.156.67
> Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?;                 | 10.55.156.67 |             10 | SharedPool-Worker-4
> Sending message to /10.88.4.194                                            | 10.55.156.67 |           4335 | MessagingService-Outgoing-/10.88.4.194
> Message received from /10.88.4.194                                         | 10.55.156.67 |           6328 | MessagingService-Incoming-/10.88.4.194
> Seeking to partition beginning in data file                                | 10.55.156.67 |          10417 | SharedPool-Worker-3
> Key cache hit for sstable 389                                              | 10.55.156.67 |          10586 | SharedPool-Worker-3
>
> My question is: how is it possible that the query is sent to a node in
> DC2?
> Since DC1 has 2 nodes and RF 1, the query should always be sent to the
> other node in DC1 if the coordinator does not have a replica, right?
>
> Thanks,
> Tom

--
Tyler Hobbs
DataStax <http://datastax.com/>
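[Editor's note] For readers reproducing the setup described in the quoted mail, below is a minimal sketch, assuming the DataStax Java driver 2.x API of that era. The keyspace name (myks), bind values, and contact point are illustrative assumptions, not taken from the thread; only the replication settings, the query shape, and the LOCAL_QUORUM consistency level come from the post.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class LocalQuorumReadSketch {
    public static void main(String[] args) {
        // Contact point is a DC1 node, as in the thread (address reused for illustration).
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.55.156.67")
                .build();
        Session session = cluster.connect();

        // Keyspace name "myks" is an assumption; replication matches the post:
        // NetworkTopologyStrategy with RF 1 in DC1 and RF 2 in DC2.
        session.execute("CREATE KEYSPACE IF NOT EXISTS myks "
                + "WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 2}");

        // The single-partition read seen in the trace, issued at LOCAL_QUORUM.
        // Bind values are placeholders.
        Statement stmt = new SimpleStatement(
                "SELECT * FROM myks.aggregate WHERE type = ? AND typeId = ?",
                "someType", "someTypeId");
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

        // Print the CL before executing, as the poster describes doing to verify it.
        System.out.println("Consistency level: " + stmt.getConsistencyLevel());
        ResultSet rs = session.execute(stmt);
        System.out.println("Rows fetched: " + rs.getAvailableWithoutFetching());

        cluster.close();
    }
}

With RF 1 in DC1, LOCAL_QUORUM requires only one local replica to respond, which is why the poster expects the read to stay in DC1; the linked ticket discusses why the coordinator can nevertheless route to a remote replica.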