a) Interesting... But only in case you do not provide partitioning key
right? (so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I using ALLOW
FILTERING in the first place? What happens if I remove it from the query?
I prefer to denormalize the data to multiple tables or at least create an
index on the requested column (preferably queried together with a known
partition key).

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if servers are busy
with the request seriously becoming non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW FILTERING,
you are doing something like a full table scan in a relational database.
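
To make the difference concrete, here is a toy model in Python (not Cassandra code, only a sketch). It assumes a simplified ring with one token per node and MD5 hashing; the names NODES, RF, replicas_for, nodes_for_partition_query and nodes_for_filtering_query are made up for illustration. Real clusters use the Murmur3 partitioner and vnodes, and the consistency level decides how many replicas are actually contacted.

```python
import hashlib

# Assumption: 8 nodes and replication factor 3, as described in the thread.
NODES = [f"node{i}" for i in range(1, 9)]
RF = 3


def replicas_for(partition_key):
    """Toy placement: hash the key to a start node, then take the next RF nodes on the ring."""
    digest = hashlib.md5(partition_key.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(RF)]


def nodes_for_partition_query(partition_keys):
    """SELECT restricted by partition key(s): only the replicas owning those keys are candidates."""
    touched = set()
    for key in partition_keys:
        touched.update(replicas_for(key))
    return touched


def nodes_for_filtering_query():
    """Filter on a non-key column: there is no key to hash, so every node's data is in scope."""
    return set(NODES)


if __name__ == "__main__":
    print(sorted(nodes_for_partition_query(["user42"])))
    print(sorted(nodes_for_partition_query([f"k{i}" for i in range(100)])))
    print(sorted(nodes_for_filtering_query()))
```

In this model a single-key lookup can only involve 3 of the 8 nodes, a filter without a partition key involves all 8, and an IN() over many partition keys tends to spread across most of the ring as well.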

There is a lot of information online on this subject, for example:
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind <attilaw@swf.technology> wrote:

> Hi,
>
> "When you run a query with allow filtering, Cassandra doesn't know where
> the data is located, so it has to go node by node, searching for the
> requested data."
>
> a) Interesting... But only in case you do not provide partitioning key
> right? (so IN() is for partitioning key?)
>
> b) Still does not explain or justify "all 8 nodes to halt and
> unresponsiveness to external requests" behavior... Even if servers are busy
> with the request seriously becoming non-responsive...?
>
> cheers
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +36 31 7811355
>
>
> On 2019. 05. 23. 0:37, shalom sagges wrote:
>
> Hi Vsevolod,
>
> 1) Why such behavior? I thought any given SELECT request is handled by a
> limited subset of C* nodes and not by all of them, as per connection
> consistency/table replication settings, in case.
>
> When you run a query with allow filtering, Cassandra doesn't know where
> the data is located, so it has to go node by node, searching for the
> requested data.
>
> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
>
> I'm not familiar with such a flag. In my case, I just try to educate the
> R&D teams.
>
> Regards,
>
> On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> We have an 8 node C* cluster with large volume of unbalanced data. Usual
>> per-partition selects work somewhat fine, and are processed by limited
>> number of nodes, but if user issues SELECT WHERE IN () ALLOW FILTERING,
>> such command stalls all 8 nodes to halt and unresponsiveness to external
>> requests while disk IO jumps to 100% across whole cluster. In several
>> minutes all nodes seem to finish processing the request and cluster goes
>> back to being responsive. Replication level across whole data is 3.
>>
>> 1) Why such behavior? I thought any given SELECT request is handled by a
>> limited subset of C* nodes and not by all of them, as per connection
>> consistency/table replication settings, in case.
>>
>> 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
>>
>> Thank you all very much in advance,
>> Vsevolod Filaretov.
>>
>
