Re: Bloom filter false positives high

Martin Mačura Wed, 17 Apr 2019 04:40:09 -0700

We cannot run any repairs on these tables.  Whenever we tried it
(incremental or full or partitioner range), it caused a node to run out of
disk space during anticompaction.  We'll try again once Cassandra 4.0 is
released.


On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> If you invoke nodetool, it gets the false positive count from this metric:
>
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
>
> Your false positive count is high, and this is where it is accumulated:
>
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L572
>
> If you follow that, the number is computed here:
>
>
> https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/sstable/BloomFilterTracker.java#L44-L55
>
> For that number to be so high, the difference has to be large, so
> lastFalsePositiveCount is IMHO significantly lower.
>
> The false positive count is only ever incremented in BigTableReader,
> where it gets complicated very quickly, and to be honest I am not sure
> why it is called.
>
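A minimal sketch of the counting scheme described above, written in Python rather than Java for brevity. It assumes each SSTable keeps cumulative false/true positive counters plus a "last seen" baseline, and that the table-level ratio is false / (false + true). The names TrackerSketch and false_ratio are hypothetical and are not Cassandra identifiers.

```python
class TrackerSketch:
    """Hypothetical stand-in for a per-SSTable bloom filter tracker."""

    def __init__(self):
        self.false_positives = 0        # cumulative count
        self.true_positives = 0         # cumulative count
        self._last_false_positives = 0  # baseline for the "recent" delta

    def add_false_positive(self):
        self.false_positives += 1

    def add_true_positive(self):
        self.true_positives += 1

    def recent_false_positives(self):
        # Difference since the previous call; the baseline then moves forward,
        # so a second call right away reports zero.
        delta = self.false_positives - self._last_false_positives
        self._last_false_positives = self.false_positives
        return delta


def false_ratio(trackers):
    """Assumed table-level ratio: false / (false + true) over all SSTables."""
    fp = sum(t.false_positives for t in trackers)
    tp = sum(t.true_positives for t in trackers)
    total = fp + tp
    return fp / total if total else 0.0
```

Under these assumptions the ratio depends on the true positive count as well, so a table whose reads rarely find the requested partition could show a high ratio even with a modest absolute number of false positives.
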
> Is everything else fine with the database? Do you run repairs? Does
> that number increase or decrease over time? Does repair or compaction
> have any effect on it?
>
> On Wed, 17 Apr 2019 at 20:48, Martin Mačura <m.mac...@gmail.com> wrote:
> >
> > Both tables use the default bloom_filter_fp_chance of 0.01 ...
> >
> > CREATE TABLE ... (
> >    a int,
> >    b int,
> >    bucket timestamp,
> >    ts timeuuid,
> >    c int,
> > ...
> >    PRIMARY KEY ((a, b, bucket), ts, c)
> > ) WITH CLUSTERING ORDER BY (ts DESC, c ASC)
> >    AND bloom_filter_fp_chance = 0.01
> >    AND compaction = {'class':
> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> > 'false'}
> >    AND dclocal_read_repair_chance = 0.0
> >    AND default_time_to_live = 63072000
> >    AND gc_grace_seconds = 10800
> > ...
> >    AND read_repair_chance = 0.0
> >    AND speculative_retry = 'NONE';
> >
> >
> > CREATE TABLE ... (
> >    c int,
> >    b int,
> >    bucket timestamp,
> >    ts timeuuid,
> > ...
> >    PRIMARY KEY ((c, b, bucket), ts)
> > ) WITH CLUSTERING ORDER BY (ts DESC)
> >    AND bloom_filter_fp_chance = 0.01
> >    AND compaction = {'class':
> > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
> > 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS',
> > 'tombstone_threshold': '0.9', 'unchecked_tombstone_compaction':
> > 'false'}
> >    AND dclocal_read_repair_chance = 0.0
> >    AND default_time_to_live = 63072000
> >    AND gc_grace_seconds = 10800
> > ...
> >    AND read_repair_chance = 0.0
> >    AND speculative_retry = 'NONE';
> >
> > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic <
> > stefan.mikloso...@instaclustr.com> wrote:
> >>
> >> What is your bloom_filter_fp_chance for each table? I guess it is
> >> higher for the first one. The higher that number is (between 0 and
> >> 1), the less memory the filter uses (17 MiB against 54.9 MiB), which
> >> means more false positives.
> >>
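A rough sketch of the memory trade-off mentioned above, using the textbook formula for an optimally sized bloom filter, bits per key = -ln(p) / (ln 2)^2. This is an assumption for illustration only: Cassandra sizes its filters with its own calculation, builds one filter per SSTable, and the partition count reported by nodetool is an estimate, so the real figures will differ. The function names are hypothetical.

```python
import math


def bits_per_key(fp_chance):
    """Textbook optimum number of filter bits per key for a target false positive chance."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def filter_size_mib(num_keys, fp_chance):
    """Approximate filter size in MiB for num_keys keys (idealized model)."""
    total_bits = num_keys * bits_per_key(fp_chance)
    return total_bits / 8 / (1024 ** 2)


if __name__ == "__main__":
    # Partition estimate taken from the first table's stats quoted in this thread.
    for p in (0.1, 0.01, 0.001):
        print(p, round(filter_size_mib(8592749, p), 2))
```

In this idealized model, a lower fp_chance always costs more memory, and at a fixed fp_chance the filter size grows linearly with the number of keys.
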
> >> On Wed, 17 Apr 2019 at 19:59, Martin Mačura <m.mac...@gmail.com> wrote:
> >> >
> >> > Hi,
> >> > I have a table with a poor bloom filter false ratio:
> >> >                SSTable count: 1223
> >> >                Space used (live): 726.58 GiB
> >> >                Number of partitions (estimate): 8592749
> >> >                Bloom filter false positives: 35796352
> >> >                Bloom filter false ratio: 0.68472
> >> >                Bloom filter space used: 17.82 MiB
> >> >                Compacted partition maximum bytes: 386857368
> >> >
> >> > It's a time series, TWCS compaction, window size 1 day, data
> >> > partitioned in daily buckets, TTL 2 years.
> >> >
> >> > I have another table with a similar schema, but it is not affected
> >> > for some reason:
> >> >                SSTable count: 1114
> >> >                Space used (live): 329.87 GiB
> >> >                Number of partitions (estimate): 25460768
> >> >                Bloom filter false positives: 156942
> >> >                Bloom filter false ratio: 0.00010
> >> >                Bloom filter space used: 54.9 MiB
> >> >                Compacted partition maximum bytes: 20924300
> >> >
> >> > Thanks for any advice,
> >> >
> >> > Martin
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: user-h...@cassandra.apache.org
> >>
>
