Re: Bloom filter false positives high

Stefan Miklosovic Wed, 17 Apr 2019 04:07:36 -0700

If you invoke nodetool, it reads the false positives number from this metric:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578


Your false positive count is high, and this is where the counts are accumulated:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
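
To make the accumulation concrete, here is a rough sketch in Python (not the
Cassandra Java code). SSTableCounters, table_false_positives and
table_false_ratio are names made up for illustration, the numbers are toy
values, and the ratio formula (false / (false + true)) is an assumption about
how the reported false ratio is derived:

```python
from dataclasses import dataclass


@dataclass
class SSTableCounters:
    """Hypothetical per-SSTable bloom filter counters (illustration only)."""
    false_positives: int
    true_positives: int


def table_false_positives(sstables):
    # The table-level number is the sum of the per-SSTable counters.
    return sum(s.false_positives for s in sstables)


def table_false_ratio(sstables):
    # Assumed formula: false positives over all positive filter answers.
    fp = sum(s.false_positives for s in sstables)
    tp = sum(s.true_positives for s in sstables)
    total = fp + tp
    return fp / total if total else 0.0


# Toy values, not taken from any real cluster.
sstables = [SSTableCounters(30, 70), SSTableCounters(10, 90)]
total_fp = table_false_positives(sstables)  # 40
ratio = table_false_ratio(sstables)         # 40 / 200 = 0.2
```

With many SSTables, each one adds its own counter to the total, so the
table-level figure can grow large even if no single SSTable stands out.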

If you follow that call, the number is computed here:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55

For that number to be so high, the difference has to be large, so
lastFalsePositiveCount is, imho, significantly lower than the current count.
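
The "difference" pattern described above can be sketched like this. It is a
simplified Python illustration under my own naming (FalsePositiveTracker and
its methods are hypothetical), not the actual BloomFilterTracker source:

```python
class FalsePositiveTracker:
    """Illustrative counter with a cumulative total and a 'recent' delta."""

    def __init__(self):
        self.false_positive_count = 0       # cumulative, never reset
        self.last_false_positive_count = 0  # total seen at the previous read

    def add_false_positive(self):
        self.false_positive_count += 1

    def recent_false_positive_count(self):
        # Report the growth since the previous call, then remember the total.
        current = self.false_positive_count
        recent = current - self.last_false_positive_count
        self.last_false_positive_count = current
        return recent


tracker = FalsePositiveTracker()
for _ in range(5):
    tracker.add_false_positive()
first = tracker.recent_false_positive_count()   # 5: everything so far

for _ in range(2):
    tracker.add_false_positive()
second = tracker.recent_false_positive_count()  # 2: only the growth since
```

In this sketch a "recent" value is large only when the total has grown a lot
since the last read, which is the situation described above.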

The false positive counter is only ever incremented in BigTableReader, where
things get complicated very quickly, and to be honest I am not sure why it
gets called.

Is everything fine with the database otherwise? Do you run repairs? Does that
number increase or decrease over time? Does repair or compaction have any
effect on it?

On Wed, 17 Apr 2019 at 20:48, Martin Mačura <m.mac...@gmail.com> wrote:
>
> Both tables use the default bloom_filter_fp_chance of 0.01 ...
>
> CREATE TABLE ... (
>    a int,
>    b int,
>    bucket timestamp,
>    ts timeuuid,
>    c int,
> ...
>    PRIMARY KEY ((a, b, bucket), ts, c)
> ) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
>    AND bloom_filter_fp_chance = 0.01
>    AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
> 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> 'false'}
>    AND dclocal_read_repair_chance = 0.0
>    AND default_time_to_live = 63072000
>    AND gc_grace_seconds = 10800
> ...
>    AND read_repair_chance = 0.0
>    AND speculative_retry = 'NONE';
>
>
> CREATE TABLE ... (
>    c int,
>    b int,
>    bucket timestamp,
>    ts timeuuid,
> ...
>    PRIMARY KEY ((c, b, bucket), ts)
> ) WITH CLUSTERING ORDER BY (ts DESC)
>    AND bloom_filter_fp_chance = 0.01
>    AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS', 
> 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> 'false'}
>    AND dclocal_read_repair_chance = 0.0
>    AND default_time_to_live = 63072000
>    AND gc_grace_seconds = 10800
> ...
>    AND read_repair_chance = 0.0
>    AND speculative_retry = 'NONE';
>
> On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic 
> <stefan.mikloso...@instaclustr.com> wrote:
>>
>> What is your bloom_filter_fp_chance for each table? I guess it is
>> higher for the first one: the higher that number is (between 0 and 1), the
>> less memory the filter uses (17 MiB against 54.9 MiB), which means more
>> false positives.
>>
>> On Wed, 17 Apr 2019 at 19:59, Martin Mačura <m.mac...@gmail.com> wrote:
>> >
>> > Hi,
>> > I have a table with a poor bloom filter false ratio:
>> >                SSTable count: 1223
>> >                Space used (live): 726.58 GiB
>> >                Number of partitions (estimate): 8592749
>> >                Bloom filter false positives: 35796352
>> >                Bloom filter false ratio: 0.68472
>> >                Bloom filter space used: 17.82 MiB
>> >                Compacted partition maximum bytes: 386857368
>> >
>> > It's a time series, TWCS compaction, window size 1 day, data partitioned 
>> > in daily buckets, TTL 2 years.
>> >
>> > I have another table with a similar schema, but it is not affected for 
>> > some reason:
>> >                SSTable count: 1114
>> >                Space used (live): 329.87 GiB
>> >                Number of partitions (estimate): 25460768
>> >                Bloom filter false positives: 156942
>> >                Bloom filter false ratio: 0.00010
>> >                Bloom filter space used: 54.9 MiB
>> >                Compacted partition maximum bytes: 20924300
>> >
>> > Thanks for any advice,
>> >
>> > Martin
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
