Good afternoon,

I'm looking into some latent issues following a 3.11.x to 4.0.x upgrade. The system uses materialized views, and the core problem is how mutations are propagated from the parent (base) table to two related materialized views.
In 3.11.x, without any tuning (no flag set for -Dcassandra.mv_enable_coordinator_batchlog, no changes to the concurrent MV writers, etc.), the cluster behaved fine if a node went down. After the upgrade, there are tons of CL LOCAL_ONE failures related to acquiring the view lock on every other node that was up, and eventually, on each of those nodes, CPU, network, and memory get saturated until the downed node is brought back up. I've compared cassandra.yaml, jvm.options, etc. and don't see anything especially different.

The two major code paths that use ViewManager.updatesAffectsView to determine next steps are in Keyspace.java and StorageProxy.java. From my review, the StorageProxy path is not being hit. I've tracked the code down to:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Keyspace.java#L528

and:

https://github.com/apache/cassandra/blob/43ec1843918aba9e81d3c2dc1433a1ef4740a51f/src/java/org/apache/cassandra/db/view/ViewManager.java#L71

```
if (!enableCoordinatorBatchlog && coordinatorBatchlog)
    return false;
```

We tried setting the MV coordinator flag to true and it made no difference, which shouldn't be the case. If the property isn't set, it should default to false per Java's Boolean.getBoolean method (see the sketch after my signature). I may try setting it explicitly to false and observing the behavior.

The strategic recommendation I've made is to move away from MVs to self-managed views, and then eventually make use of SAI if it works; there's a rough sketch of what I mean by a self-managed view after my signature as well.

I'm still curious why the behavior would differ so drastically between 3.11.x and 4.0.x. Has anyone else seen something like this? I'm also going to try to recreate this in a vanilla environment and will report back.

rahul.xavier.si...@gmail.com
http://cassandra.link
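
P.S. For reference, a minimal, standalone sketch of the Boolean.getBoolean default behavior I'm relying on above. The property name is the -D flag mentioned earlier; the class name and the rest are only illustrative, not the actual ViewManager code.

```
public class MvFlagCheck
{
    public static void main(String[] args)
    {
        // Boolean.getBoolean(name) is true only if the system property exists
        // and equals "true" (case-insensitive); if the JVM was started without
        // -Dcassandra.mv_enable_coordinator_batchlog, this evaluates to false.
        boolean enableCoordinatorBatchlog =
                Boolean.getBoolean("cassandra.mv_enable_coordinator_batchlog");
        System.out.println("enableCoordinatorBatchlog = " + enableCoordinatorBatchlog);

        // The guard quoted from ViewManager.java then short-circuits any
        // coordinator-batchlog call site while the flag is off:
        boolean coordinatorBatchlog = true; // e.g. the StorageProxy call site
        if (!enableCoordinatorBatchlog && coordinatorBatchlog)
            System.out.println("the check would return false here");
    }
}
```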
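
P.P.S. And a rough sketch of what I mean by a "self-managed view": the application keeps the denormalized table in sync itself, e.g. with a logged batch through the DataStax Java driver 4.x. Keyspace, table, and column names (ks.events, ks.events_by_user) are made up for illustration; contact points and error handling are omitted.

```
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BatchStatement;
import com.datastax.oss.driver.api.core.cql.DefaultBatchType;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class SelfManagedViewWrite
{
    public static void main(String[] args)
    {
        // Connects to localhost:9042 by default; configure contact points as needed.
        try (CqlSession session = CqlSession.builder().build())
        {
            // Base table write and "view" table write, kept in sync by the
            // application instead of by a materialized view.
            PreparedStatement baseInsert = session.prepare(
                "INSERT INTO ks.events (event_id, user_id, payload) VALUES (?, ?, ?)");
            PreparedStatement byUserInsert = session.prepare(
                "INSERT INTO ks.events_by_user (user_id, event_id, payload) VALUES (?, ?, ?)");

            // A logged batch gives atomicity across the two tables, at the cost
            // of the batchlog round trip; roughly what the MV machinery was
            // doing for us implicitly.
            BatchStatement batch = BatchStatement.builder(DefaultBatchType.LOGGED)
                                                 .addStatement(baseInsert.bind("event-42", "user-1", "hello"))
                                                 .addStatement(byUserInsert.bind("user-1", "event-42", "hello"))
                                                 .build();
            session.execute(batch);
        }
    }
}
```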