Caleb Rackliffe created CASSANDRA-17424:
-------------------------------------------

             Summary: Performance and Semantic Concerns w/ Metrics for Local 
vs. Remote Requests in StorageProxy
                 Key: CASSANDRA-17424
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17424
             Project: Cassandra
          Issue Type: Bug
          Components: Observability/Metrics
            Reporter: Caleb Rackliffe


In CASSANDRA-10023, we added two new metrics to both {{ClientRequestMetrics}} 
and {{ClientWriteRequestMetrics}} to represent requests where the driver either 
does or does not make a correct token-aware choice of coordinator. (Auditing 
driver behavior is listed as the primary goal of that Jira.)

There are, however, a few concerns we should address before this is released 
in 4.1:

1.) With paging enabled and a LIMIT < fetch size, {{IN}} queries can hit 
{{fetchRows()}} multiple times, so the number of local + remote requests isn’t 
the same as the number of queries marked in {{ClientRequestMetrics}} in 
{{readRegular()}}.

2.) {{IN}} queries will potentially mark a bunch of “remote” requests even if 
one key in the {{IN}} set is “local”.

3.) Something similar happens with mutations. If {{StorageProxy#mutate()}} 
receives multiple mutations, we’ll mark one of these new metrics in 
{{ClientWriteRequestMetrics}} for each mutation, while the rest of 
{{ClientWriteRequestMetrics}} registers the actual client request only once.
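
To make case 3 concrete, here is a minimal, self-contained sketch of per-mutation marking. The class and field names ({{PerMutationMarking}}, {{clientRequests}}, {{localMarks}}, {{remoteMarks}}) are hypothetical stand-ins, not the actual Cassandra metrics classes; the only point is that the local + remote marks can exceed the number of client requests.

```java
import java.util.List;

// Hypothetical stand-in for the counters involved; not Cassandra code.
final class PerMutationMarking
{
    long clientRequests; // bumped once per client request
    long localMarks;     // bumped once per mutation this node replicates
    long remoteMarks;    // bumped once per mutation this node does not replicate

    // Each element says whether the coordinator is a replica for that mutation.
    void mutate(List<Boolean> mutationIsLocal)
    {
        clientRequests++;
        for (boolean isLocal : mutationIsLocal)
        {
            if (isLocal)
                localMarks++;
            else
                remoteMarks++;
        }
    }

    public static void main(String[] args)
    {
        PerMutationMarking metrics = new PerMutationMarking();
        // One client request carrying three mutations, only one of them local.
        metrics.mutate(List.of(true, false, false));
        System.out.println("requests=" + metrics.clientRequests
                           + " local=" + metrics.localMarks
                           + " remote=" + metrics.remoteMarks);
    }
}
```

A single request here is counted once as a request but marked once as local and twice as remote, which is the mismatch described in case 3.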

For cases 2 and 3, we may mark both local and remote requests for the same 
overall client request, which introduces ambiguity if these are intended to 
help audit driver coordinator selection behavior. There are a few options:

a.) We can accept the ambiguity, but then we haven’t really accomplished the 
goal of CASSANDRA-10023 for some request types.

b.) We can simply not record any of these metrics for requests where multiple 
partitions/tokens are involved.

c.) We can be lenient, marking requests as “local” if any of the 
partitions/tokens involved in the client request are, in fact, local.

“c” feels like the option that preserves as much functionality as possible 
without being ambiguous, but problem #2 above is still tricky, given the way IN 
and GROUP BY queries behave w/ paging. (Perhaps ambiguity in that case is 
acceptable?)
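
A rough sketch of what option “c” might look like, assuming the caller can say per partition/token whether the coordinator is a replica. Everything here ({{CoordinatorLocalityCounters}}, {{markLenient()}}, the use of {{LongAdder}}) is a hypothetical illustration rather than the existing metrics code, and it does not address the paging behavior from problem #1.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the two new counters; not Cassandra's metrics classes.
final class CoordinatorLocalityCounters
{
    final LongAdder localRequests = new LongAdder();
    final LongAdder remoteRequests = new LongAdder();

    // Marks exactly one counter per client request: "local" if ANY of the
    // partitions/tokens involved is replicated by this node, else "remote".
    // An empty list marks nothing (an assumption made for this sketch).
    void markLenient(List<Boolean> partitionIsLocal)
    {
        if (partitionIsLocal.isEmpty())
            return;

        boolean anyLocal = false;
        for (boolean isLocal : partitionIsLocal)
        {
            if (isLocal)
            {
                anyLocal = true;
                break;
            }
        }

        if (anyLocal)
            localRequests.increment();
        else
            remoteRequests.increment();
    }

    public static void main(String[] args)
    {
        CoordinatorLocalityCounters counters = new CoordinatorLocalityCounters();
        counters.markLenient(List.of(false, true, false)); // counted once, as local
        counters.markLenient(List.of(false, false));       // counted once, as remote
        System.out.println("local=" + counters.localRequests.sum()
                           + " remote=" + counters.remoteRequests.sum());
    }
}
```

With this shape, local + remote always equals the number of (non-empty) client requests marked, at the cost of treating a mostly-remote multi-partition request as local.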

In addition to the general ambiguity around the above…

4.) There is excessive object creation involved (on a hot path) in our 
determination of whether a request is local or remote. We should be able to 
mitigate this by getting rid of {{AbstractReadExecutor#getContactedReplicas()}} 
and relying on {{ReplicaPlan#lookup()}} rather than creating strings. (Even for 
writes, we should be able to push down marking into {{performWrite()}}, where 
the write {{ReplicaPlan}} is already available.)
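
To illustrate the kind of saving meant in point 4, here is a small sketch contrasting a string-building locality check with a direct lookup. It is not a copy of the existing code: {{LocalityCheck}}, both method names, and the use of {{InetSocketAddress}} and a {{Set}} as a stand-in for the replica plan are assumptions for illustration, and the real {{ReplicaPlan#lookup()}} signature may differ.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; the Set stands in for a replica plan's endpoints.
final class LocalityCheck
{
    // Allocation-heavy pattern: builds a new list and one String per
    // contacted replica on every request, only to test membership.
    static boolean isLocalViaStrings(List<InetSocketAddress> contacted, InetSocketAddress self)
    {
        List<String> names = new ArrayList<>(contacted.size());
        for (InetSocketAddress endpoint : contacted)
            names.add(endpoint.toString());
        return names.contains(self.toString());
    }

    // Lookup-style check: asks the already-built plan directly and
    // allocates nothing per request.
    static boolean isLocalViaLookup(Set<InetSocketAddress> planEndpoints, InetSocketAddress self)
    {
        return planEndpoints.contains(self);
    }

    public static void main(String[] args)
    {
        InetSocketAddress self = InetSocketAddress.createUnresolved("10.0.0.1", 7000);
        InetSocketAddress other = InetSocketAddress.createUnresolved("10.0.0.2", 7000);

        System.out.println(isLocalViaStrings(List.of(self, other), self));
        System.out.println(isLocalViaLookup(Set.of(self, other), self));
    }
}
```

Both checks give the same answer; the second simply avoids the per-request garbage on the hot path.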


