> a) Interesting... But only in case you do not provide partitioning key right? (so IN() is for partitioning key?)
I think you should ask yourself a different question: why am I using ALLOW FILTERING in the first place, and what happens if I remove it from the query? I prefer to denormalize the data into multiple tables, or at least create an index on the requested column (preferably queried together with a known partition key).

> b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness to external requests" behavior... Even if servers are busy with the request seriously becoming non-responsive...?

I think it can justify the unresponsiveness. When you use ALLOW FILTERING, you are doing something like a full table scan in a relational database. Since the data is spread across all 8 nodes, each of them has to read through its share on disk, which is consistent with the cluster-wide 100% disk IO described in the original message. There is a lot of information on the internet on this subject, such as https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind <attilaw@swf.technology> wrote:
> Hi,
>
> "When you run a query with allow filtering, Cassandra doesn't know where
> the data is located, so it has to go node by node, searching for the
> requested data."
>
> a) Interesting... But only in case you do not provide partitioning key
> right? (so IN() is for partitioning key?)
>
> b) Still does not explain or justify "all 8 nodes to halt and
> unresponsiveness to external requests" behavior... Even if servers are busy
> with the request seriously becoming non-responsive...?
>
> cheers
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +36 31 7811355
>
>
> On 2019. 05. 23. 0:37, shalom sagges wrote:
>
> Hi Vsevolod,
>
> 1) Why such behavior? I thought any given SELECT request is handled by a
> limited subset of C* nodes and not by all of them, as per connection
> consistency/table replication settings, in case.
> When you run a query with allow filtering, Cassandra doesn't know where
> the data is located, so it has to go node by node, searching for the
> requested data.
>
> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
> I'm not familiar with such a flag. In my case, I just try to educate the
> R&D teams.
>
> Regards,
>
> On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> We have an 8 node C* cluster with large volume of unbalanced data. Usual
>> per-partition selects work somewhat fine, and are processed by limited
>> number of nodes, but if user issues SELECT WHERE IN () ALLOW FILTERING,
>> such command stalls all 8 nodes to halt and unresponsiveness to external
>> requests while disk IO jumps to 100% across whole cluster. In several
>> minutes all nodes seem to finish processing the request and cluster goes
>> back to being responsive. Replication level across whole data is 3.
>>
>> 1) Why such behavior? I thought any given SELECT request is handled by a
>> limited subset of C* nodes and not by all of them, as per connection
>> consistency/table replication settings, in case.
>>
>> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
>>
>> Thank you all very much in advance,
>> Vsevolod Filaretov.
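To illustrate the point made at the top of this reply (a partition-key query stays on a few replicas, while ALLOW FILTERING without a partition key behaves like a full scan), here is a toy Python model of an 8-node ring with replication factor 3, matching the cluster described in the original question. It is a simplified sketch under stated assumptions, not Cassandra's implementation: one token per node (no vnodes), SimpleStrategy-style placement, and MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3). The function names and the key "user42" are hypothetical.

```python
import bisect
import hashlib

# Toy model (assumption): 8 nodes, one evenly spaced token each (no vnodes),
# SimpleStrategy-style placement, replication factor 3. MD5 is only a
# stand-in hash; Cassandra's default is Murmur3Partitioner.
RING_SIZE = 2**64
NUM_NODES = 8
RF = 3

# Node tokens, sorted ascending. Node i owns the range (token[i-1], token[i]].
NODE_TOKENS = [i * (RING_SIZE // NUM_NODES) for i in range(NUM_NODES)]


def token_for(partition_key: str) -> int:
    """Map a partition key to a position on the ring (stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for_token(token: int) -> list[int]:
    """Primary owner is the first node whose token is >= the given token
    (wrapping around the ring); the next RF-1 nodes clockwise are replicas."""
    start = bisect.bisect_left(NODE_TOKENS, token) % NUM_NODES
    return [(start + i) % NUM_NODES for i in range(RF)]


def nodes_for_partition_query(partition_keys: list[str]) -> set[int]:
    """WHERE pk IN (...): only the replicas of the listed partitions hold data."""
    nodes: set[int] = set()
    for key in partition_keys:
        nodes.update(replicas_for_token(token_for(key)))
    return nodes


def candidate_replicas_for_scan() -> set[int]:
    """No partition key restriction: every token range has to be read.
    Returns the union of the replica sets of all ranges; the coordinator
    reads each range from as many replicas as the consistency level needs,
    so the load lands across the whole cluster."""
    nodes: set[int] = set()
    for node_token in NODE_TOKENS:
        nodes.update(replicas_for_token(node_token))
    return nodes


if __name__ == "__main__":
    print("partition query:", sorted(nodes_for_partition_query(["user42"])))
    print("filtering scan:", sorted(candidate_replicas_for_scan()))
```

In this model, a query restricted to one partition key touches only the 3 replicas of a single token range, while a query without one has to read all 8 ranges, and the replica sets of those ranges cover every node, so each node carries part of the disk IO.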