Re: Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-10-01 Thread Matthias Pfau
delete those bloom filter files and restart cassandra, they are re-created. You can also run a user defined compaction on that sstable to rewrite the bloom filter file. This is exactly how we upgraded: determine which CFs have bigger bloom filters (cfstats) run upgradesstables individually for those

Re: Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-09-10 Thread Matthias Pfau
NVALID: > Hi there, > we just finished upgrading sstables on a single node after upgrading from  > 2.2.14 to 3.11.4. Since then, we noted a drastic increase of off heap memory > consumption. This is due to increased bloom filter size. > > According to cfstats output "Bloo

Drastic increase of bloom filter sizer after upgrading from 2.2.14 to 3.11.4

2019-09-10 Thread Matthias Pfau
Hi there, we just finished upgrading sstables on a single node after upgrading from 2.2.14 to 3.11.4. Since then, we noted a drastic increase of off heap memory consumption. This is due to increased bloom filter size. According to cfstats output "Bloom filter off heap memory used" in

Re: Bloom filter false positives high

2019-05-16 Thread Martin Mačura
I've decreased bloom_filter_fp_chance from 0.01 to 0.001. The sstableupgrade took 3 days to complete. And this is a result: node1 Bloom filter false positives: 380965 Bloom filter false ratio: 0.46560 Bloom filter space used: 27.

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
an expert in this. > > If you think about this, the whole concept of Bloom filter is to check > if some record is in particular SSTable. False positive mean that, > obviously, filter thought it was there but in fact it is not. So > Cassandra did a look unnecessarily. Why does it thi

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
One thing comes to my mind but my reasoning is questionable as I am not an expert in this. If you think about this, the whole concept of Bloom filter is to check if some record is in particular SSTable. False positive mean that, obviously, filter thought it was there but in fact it is not. So

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
race_seconds = 10800 > > ... > > AND read_repair_chance = 0.0 > >AND speculative_retry = 'NONE'; > > > > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic < > stefan.mikloso...@instaclustr.com> wrote: > >> > >> What is your bloom_

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
AND read_repair_chance = 0.0 >AND speculative_retry = 'NONE'; > > On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic > wrote: >> >> What is your bloom_filter_fp_chance for either table? I guess it is >> bigger for the first one, bigger that number is bet

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
tive_retry = 'NONE'; On Wed, Apr 17, 2019 at 12:25 PM Stefan Miklosovic < stefan.mikloso...@instaclustr.com> wrote: > What is your bloom_filter_fp_chance for either table? I guess it is > bigger for the first one, bigger that number is between 0 and 1, less > memory it will

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
i, > I have a table with poor bloom filter false ratio: >SSTable count: 1223 >Space used (live): 726.58 GiB >Number of partitions (estimate): 8592749 > Bloom filter false positives: 35796352 >Bloom fil

Bloom filter false positives high

2019-04-17 Thread Martin Mačura
Hi, I have a table with poor bloom filter false ratio: SSTable count: 1223 Space used (live): 726.58 GiB Number of partitions (estimate): 8592749 Bloom filter false positives: 35796352 Bloom filter false ratio: 0.68472

Re: Bloom filter memory usage disparity

2016-05-17 Thread Jeff Jirsa
Even with the same data, bloom filter is based on sstables. If your compaction behaves differently on 2 nodes than the third, your bloom filter RAM usage may be different. From: Kai Wang Reply-To: "user@cassandra.apache.org" Date: Tuesday, May 17, 2016 at 8:02 PM

Re: Bloom filter memory usage disparity

2016-05-17 Thread Kai Wang
e > for the 3 nodes to understand. > > It might have been a temporary situation, but in this case you would know > by now. > > C*heers, > > > 2016-05-03 18:47 GMT+02:00 Kai Wang : > >> Hi, >> >> I have a table on 3-node cluster. I notice bloom filter me

Re: Bloom filter memory usage disparity

2016-05-17 Thread Alain RODRIGUEZ
per node? In any case, we need the data size for the 3 nodes to understand. It might have been a temporary situation, but in this case you would know by now. C*heers, 2016-05-03 18:47 GMT+02:00 Kai Wang : > Hi, > > I have a table on 3-node cluster. I notice bloom filter memory usag

Bloom filter memory usage disparity

2016-05-03 Thread Kai Wang
Hi, I have a table on 3-node cluster. I notice bloom filter memory usage are very different on one of the node. For a given table, I checked CassandraMetricsRegistry$JmxGauge.[table]_BloomFilterOffHeapMemoryUsed.Value. 2 of 3 nodes show 1.5GB while the other shows 2.5 GB. What could be the

Re: High Bloom filter false ratio

2016-02-23 Thread Jeff Jirsa
"user@cassandra.apache.org" Date: Tuesday, February 23, 2016 at 12:37 AM To: "user@cassandra.apache.org" Subject: Re: High Bloom filter false ratio Looks like that sstablemetadata is available in 2.2, we are on 2.0.x do you know anything that will work on 2.0.x On Tue, Feb 23, 2016

RE: High Bloom filter false ratio

2016-02-23 Thread SEAN_R_DURITY
I see the sstablemetadata tool as far back as 1.2.19 (in tools/bin). Sean Durity From: Anishek Agarwal [mailto:anis...@gmail.com] Sent: Tuesday, February 23, 2016 3:37 AM To: user@cassandra.apache.org Subject: Re: High Bloom filter false ratio Looks like that sstablemetadata is available in 2.2

Re: High Bloom filter false ratio

2016-02-23 Thread Anishek Agarwal
list of sstables that you >> could feed to forceUserDefinedCompaction to join together to eliminate >> leftover waste. >> >> Your long ParNew times may be fixable by increasing the new gen size of >> your heap – the general guidance in cassandra-env.sh is out of date,

Re: High Bloom filter false ratio

2016-02-23 Thread Anishek Agarwal
by increasing the new gen size of > your heap – the general guidance in cassandra-env.sh is out of date, you > may want to reference CASSANDRA-8150 for “newer” advice ( > http://issues.apache.org/jira/browse/CASSANDRA-8150 ) > > - Jeff > > From: Anishek Agarwal > Reply-To: &

Re: High Bloom filter false ratio

2016-02-22 Thread Jeff Jirsa
-8150 ) - Jeff From: Anishek Agarwal Reply-To: "user@cassandra.apache.org" Date: Monday, February 22, 2016 at 8:33 PM To: "user@cassandra.apache.org" Subject: Re: High Bloom filter false ratio Hey Jeff, Thanks for the clarification, I did not explain

Re: High Bloom filter false ratio

2016-02-22 Thread Anishek Agarwal
> Reply-To: "user@cassandra.apache.org" > Date: Sunday, February 21, 2016 at 11:13 PM > To: "user@cassandra.apache.org" > Subject: Re: High Bloom filter false ratio > > Hey guys, > > Just did some more digging ... looks like DTCS is not removing old data > completely,

Re: High Bloom filter false ratio

2016-02-22 Thread Jeff Jirsa
"user@cassandra.apache.org" Date: Sunday, February 21, 2016 at 11:13 PM To: "user@cassandra.apache.org" Subject: Re: High Bloom filter false ratio Hey guys, Just did some more digging ... looks like DTCS is not removing old data completely, I used sstable2json for one such table

Re: High Bloom filter false ratio

2016-02-22 Thread Christopher Bradford
>> chovatia.jayd...@gmail.com> wrote: >> >>> To me following three looks on higher side: >>> SSTable count: 1289 >>> >>> In order to reduce SSTable count see if you are compacting or not (If >>> using STCS). Is it possible to change t

Re: High Bloom filter false ratio

2016-02-21 Thread Anishek Agarwal
it possible to change this to LCS? >> >> >> Number of keys (estimate): 345137664 (345M partition keys) >> >> I don't have any suggestion about reducing this unless you partition your >> data. >> >> >> Bloom filter space used, bytes: 493777336 (400M

Re: High Bloom filter false ratio

2016-02-21 Thread Anishek Agarwal
have any suggestion about reducing this unless you partition your > data. > > > Bloom filter space used, bytes: 493777336 (400MB is huge) > > If number of keys are reduced then this will automatically reduce bloom > filter size I believe. > > > > Jaydeep > > On

Re: High Bloom filter false ratio

2016-02-19 Thread Jaydeep Chovatia
this unless you partition your data. Bloom filter space used, bytes: 493777336 (400MB is huge) If number of keys are reduced then this will automatically reduce bloom filter size I believe. Jaydeep On Thu, Feb 18, 2016 at 7:52 PM, Anishek Agarwal wrote: > Hey all, > > @Jaydeep here

Re: High Bloom filter false ratio

2016-02-19 Thread Chris Lohfink
> > Memtable data size, bytes: 106558314 > > Memtable switch count: 3266 > > Local read count: 1721134803 > > Local read latency: 0.048 ms > > Local write count: 56743898 > > Local write latency: 0.018 ms > > Pending tasks: 0 > > Bloom filter false positiv

Re: High Bloom filter false ratio

2016-02-18 Thread Anishek Agarwal
read latency: 0.048 ms Local write count: 56743898 Local write latency: 0.018 ms Pending tasks: 0 Bloom filter false positives: 40664437 Bloom filter false ratio: 0.69058 Bloom filter space used, bytes: 493777336 Bloom filter off heap memory used, bytes: 493767024 Index summary off heap

Re: High Bloom filter false ratio

2016-02-18 Thread Jaydeep Chovatia
How many partition keys exists for the table which shows this problem (or provide nodetool cfstats for that table)? On Thu, Feb 18, 2016 at 11:38 AM, daemeon reiydelle wrote: > The bloom filter buckets the values in a small number of buckets. I have > been surprised by how many cases I se

Re: High Bloom filter false ratio

2016-02-18 Thread daemeon reiydelle
The bloom filter buckets the values in a small number of buckets. I have been surprised by how many cases I see with large cardinality where a few values populate a given bloom leaf, resulting in high false positives, and a surprising impact on latencies! Are you seeing 2:1 ranges between mean

Re: High Bloom filter false ratio

2016-02-18 Thread Tyler Hobbs
You can try slightly lowering the bloom_filter_fp_chance on your table. Otherwise, it's possible that you're repeatedly querying one or two partitions that always trigger a bloom filter false positive. You could try manually tracing a few queries on this table (for non-existent part

High Bloom filter false ratio

2016-02-17 Thread Anishek Agarwal
Hello, We have a table with composite partition key with humungous cardinality, its a combination of (long,long). On the table we have bloom_filter_fp_chance=0.01. On doing "nodetool cfstats" on the 5 nodes we have in the cluster we are seeing "Bloom filter false ratio:"

Re: High Bloom Filter FP Ratio

2014-12-19 Thread Chris Hart
Hi Tyler, I tried what you said and false positives look much more reasonable there. Thanks for looking into this. -Chris - Original Message - From: "Tyler Hobbs" To: user@cassandra.apache.org Sent: Friday, December 19, 2014 1:25:29 PM Subject: Re: High Bloom Filter FP Rat

Re: High Bloom Filter FP Ratio

2014-12-19 Thread Tyler Hobbs
I took a look at the code where the bloom filter true/false positive counters are updated and notice that the true-positive count isn't being updated on key cache hits: https://issues.apache.org/jira/browse/CASSANDRA-8525. That may explain your ratios. Can you try querying for a few non-exi
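The effect described in this reply can be illustrated with a small sketch. It assumes the reported false ratio is computed as false positives divided by the sum of false and true positives; that formula, the function name, and all the counts below are assumptions for illustration, not figures from the thread. If reads served from the key cache never increment the true-positive counter, the denominator shrinks and the ratio looks far worse than the filter really is.

```python
def false_ratio(false_positives: int, true_positives: int) -> float:
    """Assumed metric: FP / (FP + TP); returns 0.0 when nothing was counted."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0


# Hypothetical workload: 100 false positives and 9,900 real hits.
fp = 100
tp_actual = 9_900

# Suppose 9,800 of the real hits were answered via the key cache and
# therefore never counted as true positives (the behaviour the reply describes).
tp_counted = tp_actual - 9_800

print(f"ratio with all true positives counted: {false_ratio(fp, tp_actual):.4f}")
print(f"ratio with key-cache hits not counted: {false_ratio(fp, tp_counted):.4f}")
```

Under these made-up numbers the same filter reports 0.01 in one case and 0.5 in the other, which is why querying for keys that do not exist gives a cleaner picture.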

Re: High Bloom Filter FP Ratio

2014-12-19 Thread Mark Greene
Memtable data size: 1299614 Memtable switch count: 2 Local read count: 2458290 Local read latency: 0.853 ms Local write count: 10044 Local write latency: 0.186 ms Pending flushes: 0 Bloom filter false positives: 11096 *Bloo

High Bloom Filter FP Ratio

2014-12-17 Thread Chris Hart
Memtable data size, bytes: 20903960 Memtable switch count: 148 Local read count: 1396402 Local read latency: 0.362 ms Local write count: 2345306 Local write latency: 0.062 ms Pending tasks: 0

Re: why bloom filter is only for row key?

2014-09-17 Thread Philo Yang
Thanks Rob Thanks, Philo Yang 2014-09-16 2:12 GMT+08:00 DuyHai Doan : > Nice catch Rob > > On Mon, Sep 15, 2014 at 8:04 PM, Robert Coli wrote: > >> On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang wrote: >> >>> After reading some docs, I find that bloom filter

Re: why bloom filter is only for row key?

2014-09-15 Thread DuyHai Doan
Nice catch Rob On Mon, Sep 15, 2014 at 8:04 PM, Robert Coli wrote: > On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang wrote: > >> After reading some docs, I find that bloom filter is built on row keys, >> not on column key. Can anyone tell me what is considered for not building

Re: why bloom filter is only for row key?

2014-09-15 Thread Robert Coli
On Sun, Sep 14, 2014 at 11:22 AM, Philo Yang wrote: > After reading some docs, I find that bloom filter is built on row keys, > not on column key. Can anyone tell me what is considered for not building > bloom filter on column key? Is it a good idea to offer a table property > option

Re: why bloom filter is only for row key?

2014-09-15 Thread Philo Yang
Thanks DuyHai, I think the trouble of bloom filter on all row keys & column names is memory usage. However, if a CF has only hundreds of columns per row, the number of total columns will be much fewer, so the bloom filter is possible for this condition, right? Is there a good way to adjust b

Re: why bloom filter is only for row key?

2014-09-14 Thread DuyHai Doan
Hello Philo Building bloom filter for column names (what you call column key) is technically possible but very expensive in term of memory usage. The approximate formula to calculate space required by bloom filter can be found on slide 27 here: http://fr.slideshare.net/quipo/modern-algorithms
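The memory argument can be made concrete with the textbook sizing formula for a Bloom filter, m = -n ln(p) / (ln 2)^2 bits for n elements at false-positive chance p. This is the standard formula, not necessarily the one on the linked slide, and the row and column counts below are hypothetical. The sketch compares a filter over row keys only with one over every (row, column) pair.

```python
import math


def bloom_filter_bits(n_elements: int, fp_chance: float) -> float:
    """Optimal size in bits of a classic Bloom filter: -n * ln(p) / (ln 2)^2."""
    return -n_elements * math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_mib(n_elements: int, fp_chance: float) -> float:
    """Same size expressed in MiB."""
    return bloom_filter_bits(n_elements, fp_chance) / 8 / 2**20


rows = 1_000_000          # hypothetical number of row keys
columns_per_row = 500     # hypothetical "hundreds of columns per row"
fp_chance = 0.01

row_only = bloom_filter_mib(rows, fp_chance)
row_and_column = bloom_filter_mib(rows * columns_per_row, fp_chance)

print(f"row keys only:      {row_only:10.2f} MiB")
print(f"row + column names: {row_and_column:10.2f} MiB")
```

Because the size is linear in the number of elements, filtering on column names multiplies the memory by the average number of columns per row.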

why bloom filter is only for row key?

2014-09-14 Thread Philo Yang
Hi all, After reading some docs, I find that bloom filter is built on row keys, not on column key. Can anyone tell me what is considered for not building bloom filter on column key? Is it a good idea to offer a table property option between row key and primary key for what bloom filter is built

Impact of Bloom filter false positive rate

2014-05-30 Thread Thomas GERBET
Hi, I'm currently working on some properties of Bloom filters and this is the first time I use Cassandra, so I'm sorry if my question seems dumb. Basically, I try to see the impact of the false positive rate of Bloom filter on performance. My test case is: 1. I create a table with: cr

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Mark Reddy
>> >>>>> Bloom filters are built on creation / rebuild of SSTable. If you >>>>> removed the data, but the old SSTables weren't compacted or you didn't >>>>> rebuild them manually, bloom filters will stay the same size. >>>>> >

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
es weren't compacted or you didn't >>>> rebuild them manually, bloom filters will stay the same size. >>>> >>>> M. >>>> >>>> Kind regards, >>>> Michał Michalski, >>>> michal.michal...@boxever.com >>>&

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
't rebuild them >>> manually, bloom filters will stay the same size. >>> >>> M. >>> >>> Kind regards, >>> Michał Michalski, >>> michal.michal...@boxever.com >>> >>> >>> On 14 April 2014 14:44, William Oberman w

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
ill stay the same size. >> >> M. >> >> Kind regards, >> Michał Michalski, >> michal.michal...@boxever.com >> >> >> On 14 April 2014 14:44, William Oberman wrote: >> >>> I had a thread on this forum about clearing junk from a CF. In m

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
lski, > michal.michal...@boxever.com > > > On 14 April 2014 14:44, William Oberman wrote: > >> I had a thread on this forum about clearing junk from a CF. In my case, >> it's ~90% of ~1 billion rows. >> >> One side effect I had hoped for was a reduction i

bloom filter + suddenly smaller CF

2014-04-14 Thread William Oberman
I had a thread on this forum about clearing junk from a CF. In my case, it's ~90% of ~1 billion rows. One side effect I had hoped for was a reduction in the size of the bloom filter. But, according to nodetool cfstats, it's still fairly large (~1.5GB of RAM). Do bloom filters e

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread Michal Michalski
4, William Oberman wrote: > I had a thread on this forum about clearing junk from a CF. In my case, > it's ~90% of ~1 billion rows. > > One side effect I had hoped for was a reduction in the size of the bloom > filter. But, according to nodetool cfstats, it's still fai

Re: bloom filter + suddenly smaller CF

2014-04-14 Thread DuyHai Doan
OAN On Mon, Apr 14, 2014 at 3:44 PM, William Oberman wrote: > I had a thread on this forum about clearing junk from a CF. In my case, > it's ~90% of ~1 billion rows. > > One side effect I had hoped for was a reduction in the size of the bloom > filter. But, according to nod

Re: Fp chance for column level bloom filter

2013-07-18 Thread aaron morton
enori Sato wrote: > Hi, > > I thought memory consumption of column level bloom filter will become a big > concern when a row becomes very wide like more than tens of millions of > columns. > > But I read from source(1.0.7) that fp chance for column level bloom filter is

Fp chance for column level bloom filter

2013-07-17 Thread Takenori Sato
Hi, I thought memory consumption of column level bloom filter will become a big concern when a row becomes very wide like more than tens of millions of columns. But I read from source(1.0.7) that fp chance for column level bloom filter is hard-coded as 0.160, which is very high. So seems not

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-28 Thread Hiller, Dean
ser@cassandra.apache.org>" Date: Thursday, March 28, 2013 3:18 AM To: "user@cassandra.apache.org" Subject: Re: bloom filter fp ratio of 0.98 with fp_chance o

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-28 Thread Alain RODRIGUEZ
Message ----- > From: "Andras Szerdahelyi" > To: user@cassandra.apache.org > Sent: Wednesday, March 27, 2013 1:19:06 AM > Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01 > > > Aaron, > > > > > What version are you using ?

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread aaron morton
em. I guess you have to test it on your system and > see how it performs. > > Attached is the related thread for your reference. > > -Wei > > - Original Message - > From: "Andras Szerdahelyi" > To: user@cassandra.apache.org > Sent: Wednesday, M

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread Wei Zhu
the related thread for your reference. -Wei - Original Message - From: "Andras Szerdahelyi" To: user@cassandra.apache.org Sent: Wednesday, March 27, 2013 1:19:06 AM Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01 Aaron, What version are you using

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread Andras Szerdahelyi
kle.com>> Reply-To: "user@cassandra.apache.org" Date: Tuesday 26 March 2013 21:46 To: "user@cassandra.apache.org"

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-26 Thread aaron morton
; measured FP ratio of .. 0.98 ? Am I reading this wrong or are 98% of the > requests hitting the bloom filter create a false positive while the "target" > false ratio is 0.01? > ( Also key cache hit ratio is around 0.001 and sstables read is in the skies > ( non-exponential

bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-25 Thread Andras Szerdahelyi
Hello list, Could anyone shed some light on how an FP chance of 0.01 coexist with a measured FP ratio of .. 0.98 ? Am I reading this wrong or are 98% of the requests hitting the bloom filter create a false positive while the "target" false ratio is 0.01? ( Also key cache hit ratio

Re: Changing bloom filter false positive ratio

2012-09-14 Thread Peter Schuller
> I have a hunch that the SSTable selection based on the Min and Max keys in > ColumnFamilyStore.markReferenced() means that a higher false positive has > less of an impact. > > it's just a hunch, i've not tested it. For leveled compaction, yes. For non-leveled, I can't see how it would since each

Re: Changing bloom filter false positive ratio

2012-09-14 Thread aaron morton
I have a hunch that the SSTable selection based on the Min and Max keys in ColumnFamilyStore.markReferenced() means that a higher false positive has less of an impact. it's just a hunch, i've not tested it. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.the

Re: Changing bloom filter false positive ratio

2012-09-13 Thread Eric Czech
Thanks Peter. On Thu, Sep 13, 2012 at 12:52 PM, Peter Schuller wrote: >> changing it on some of them. Can I just change that value through the >> cli and restart or are there any concerns I should have before trying >> to tweak that parameter? > > You can change it, you don't have to restart. It

Changing bloom filter false positive ratio

2012-09-12 Thread Eric Czech
Hi everyone, I'm running into heap pressure issues and I seem to have traced the problem to very large bloom filters. The bloom_filter_fp_chance is set to the default value on all my column families but I'd like to try changing it on some of them. Can I just change that value through the cli and

Re: OOM opening bloom filter

2012-03-13 Thread Mick Semb Wever
> How much smaller did the BF get to ? After pending compactions completed today, i'm presuming fp_ratio is applied now to all sstables in the keyspace, it has gone from 20G+ down to 1G. This node is now running comfortably on Xmx4G (used heap ~1.5G). ~mck -- "A Microsoft Certified System

Re: OOM opening bloom filter

2012-03-13 Thread aaron morton
Thanks for the update. How much smaller did the BF get to ? A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote: > > It's my understanding then for this use case that bloom filters are of > l

Re: OOM opening bloom filter

2012-03-12 Thread Mick Semb Wever
> > > > It's my understanding then for this use case that bloom filters are of > > > > little importance and that i can Ok. To summarise our actions to get us out of this situation, in hope that it may help others one day, we did the following actions: 1) upgrade to 1.0.7 2) set fp_ratio=0.99

Re: OOM opening bloom filter

2012-03-12 Thread aaron morton
>>> It's my understanding then for this use case that bloom filters are of >>> little importance and that i can >> Yes. AFAIK there is only one position seek (that will use the bloom filter) at the start of a get_range_slice request. After that the iterators ste

Re: OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote: > Are you doing RF=1? That is correct. So are you calculations then :-) > > very small, <1k. Data from this cf is only read via hadoop jobs in batch > > reads of 16k rows at a time. > [snip] > > It's my understanding then for this use cas

Re: OOM opening bloom filter

2012-03-11 Thread Peter Schuller
> This particular cf has up to ~10 billion rows over 3 nodes. Each row is With default settings, 143 million keys roughly gives you 2^31 bits of bloom filter. Or put another way, you get about 1 GB of bloom filters per 570 million keys, if I'm not mistaken. If you have 10 billion ro
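The figures quoted in this reply can be checked with a few lines of arithmetic. The sketch below derives bits per key from the stated "143 million keys gives roughly 2^31 bits" and then scales it up; the helper name is hypothetical, and the per-key figure is taken from the quote rather than from Cassandra's source.

```python
# Bits per key implied by "143 million keys roughly gives you 2^31 bits".
BITS_PER_KEY = 2**31 / 143_000_000   # about 15 bits per key


def bloom_filter_gib(n_keys: int, bits_per_key: float = BITS_PER_KEY) -> float:
    """Estimated bloom filter size in GiB for n_keys row keys."""
    return n_keys * bits_per_key / 8 / 2**30


print(f"bits per key:         {BITS_PER_KEY:.2f}")
print(f"570 million keys:     {bloom_filter_gib(570_000_000):.2f} GiB")
print(f"10 billion keys:      {bloom_filter_gib(10_000_000_000):.2f} GiB")
```

The second line reproduces the "about 1 GB per 570 million keys" rule of thumb, and the third shows why a column family with around 10 billion rows ends up with bloom filters measured in tens of gigabytes across the cluster.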

Re: OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
On Sun, 2012-03-11 at 15:06 -0700, Peter Schuller wrote: > If it is legitimate use of memory, you *may*, depending on your > workload, want to adjust target bloom filter false positive rates: > >https://issues.apache.org/jira/browse/CASSANDRA-3497 This particular cf has up to

Re: OOM opening bloom filter

2012-03-11 Thread Peter Schuller
> How did this this bloom filter get too big? Bloom filters grow with the amount of row keys you have. It is natural that they grow bigger over time. The question is whether there is something "wrong" with this node (for example, lots of sstables and disk space used due to compactio

OOM opening bloom filter

2012-03-11 Thread Mick Semb Wever
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) This happens with (our normal) -Xmx12g setting. How did this this bloom filter get too big? Is the best option

Re: reported bloom filter FP ratio

2011-12-26 Thread Peter Schuller
>> I don't understand how you reached that conclusion. > > On my nodes most memory is consumed by bloom filters. Also 1.0 creates The point is that just because that's the problem you have, doesn't mean the default is wrong, since it quite clearly depends on use-case. If your relative amounts of r

Re: reported bloom filter FP ratio

2011-12-26 Thread Radim Kolar
ers than 0.8 leading to higher memory consumption, i just checked few sstables for index to bloom filter ratio on same dataset. in 0.8 bloom filters are about 13% of index size and in 1.0, its about 16%. Key used in CF is fixed size 4byte integer. Cassandra does not measure memory used by index

Re: reported bloom filter FP ratio

2011-12-26 Thread Peter Schuller
> but reported ratio is  Bloom Filter False Ratio: 0.00495 which is higher > than my computed ratio 0.000145. If you were true than reported ratio should > be lower then mine computed from CF reads because there are more reads to > sstables then to CF. The ratio is the ratio of false

Re: reported bloom filter FP ratio

2011-12-26 Thread Radim Kolar
On 25.12.2011 20:58, Peter Schuller wrote: Read Count: 68844 [snip] why reported bloom filter FP ratio is not counted like this 10/68844.0 0.00014525594096798558 Because the read count is total amount of reads to the CF, while the bloom filter is per sstable. The number

Re: reported bloom filter FP ratio

2011-12-25 Thread Peter Schuller
>                Read Count: 68844 [snip] > why reported bloom filter FP ratio is not counted like this >>>> 10/68844.0 > 0.00014525594096798558 Because the read count is total amount of reads to the CF, while the bloom filter is per sstable. The number of individual read

reported bloom filter FP ratio

2011-12-25 Thread Radim Kolar
I have following CF Read Count: 68844 Read Latency: 9.942 ms. Write Count: 209712 Write Latency: 0.297 ms. Pending Tasks: 0 Bloom Filter False Postives: 10 Bloom Filter False Ratio

Re: how to reduce disk read? (and bloom filter performance)

2011-10-17 Thread Radim Kolar
Look in jconsole -> org.apache.cassandra.db -> ColumnFamilies. The bloom filter false ratio on this server is 0.0018, and 0.06% of reads hit more than 1 sstable. From Cassandra's point of view, it looks good.

Re: how to reduce disk read? (and bloom filter performance)

2011-10-17 Thread Mohit Anchlia
On Sun, Oct 16, 2011 at 2:20 AM, Radim Kolar wrote: > On 10.10.2011 18:53, Mohit Anchlia wrote: >> >> Does it mean you are not updating a row or deleting them? > > yes. i have 350m rows and only about 100k of them are updated. >> >> Can you look at JMX values of >> >> BloomFilter* ? > > i co

Re: how to reduce disk read? (and bloom filter performance)

2011-10-16 Thread Radim Kolar
On 10.10.2011 18:53, Mohit Anchlia wrote: Does it mean you are not updating a row or deleting them? yes. i have 350m rows and only about 100k of them are updated. Can you look at JMX values of BloomFilter* ? i could not find this in jconsole mbeans or in jmx over http in cassandra 1.0

Re: how to reduce disk read? (and bloom filter performance)

2011-10-10 Thread Mohit Anchlia
Does it mean you are not updating a row or deleting them? Can you look at JMX values of BloomFilter* ? I don't believe bloom filter false positive % value is configurable. Someone else might be able to throw more light on this. I believe if you want to keep disk seeks to 1 ssTable you will

Re: factors on the effectiveness of bloom filter?

2011-10-10 Thread Radim Kolar
On 10.10.2011 18:31, Yang wrote: I noticed that 2 of my CFs are showing very different bloom filter false ratios, one is close to 1.0; the other one is only 0.3 cassandra bloom filters are computed for 1% false positive ratio. is there any measure to increase the effectiveness of bloom

factors on the effectiveness of bloom filter?

2011-10-10 Thread Yang
I noticed that 2 of my CFs are showing very different bloom filter false ratios, one is close to 1.0; the other one is only 0.3 they have roughly the same sizes in SStables and counts, the difference is key construction, the one with 0.3 false ratio has a shorter key. assuming the key can not be

Re: how to reduce disk read? (and bloom filter performance)

2011-10-09 Thread Radim Kolar
323 2857 3 56 it means bloom filter failure ratio over 1%. Cassandra in unit tests expects bloom filter false positive less than 1.05%. HBase has configurable bloom filters. You can choose 1% or 0.5% - it can make difference for large cache. But result is that my poor

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Mohit Anchlia
You'll see output like: Offset SSTables 1 8021 2 783 Which means 783 read operations accessed 2 SSTables On Fri, Oct 7, 2011 at 2:03 PM, Radim Kolar wrote: > On 7.10.2011 15:55, Mohit Anchlia wrote: >> >> Check your disk utilization using iostat. Also
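As a rough illustration of reading the SSTables column quoted in this reply, the sketch below takes the two rows shown (offset 1 with 8021 reads, offset 2 with 783 reads) and computes what share of reads had to touch more than one SSTable. The dictionary layout and function name are hypothetical; only the two counts come from the message.

```python
# Offset (number of SSTables touched) -> number of read operations,
# using the two rows quoted in the message above.
sstables_per_read = {1: 8021, 2: 783}


def share_of_multi_sstable_reads(histogram: dict[int, int]) -> float:
    """Fraction of reads that accessed more than one SSTable."""
    total = sum(histogram.values())
    multi = sum(count for offset, count in histogram.items() if offset > 1)
    return multi / total if total else 0.0


share = share_of_multi_sstable_reads(sstables_per_read)
print(f"{share:.1%} of reads touched more than one SSTable")
```

A low share suggests that bloom filters and compaction are keeping most reads on a single SSTable; a high one points at either many overlapping SSTables or a poor false-positive ratio.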

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Radim Kolar
On 7.10.2011 15:55, Mohit Anchlia wrote: Check your disk utilization using iostat. Also, check if compactions are causing reads to be slow. Check GC too. You can look at cfhistograms output or post it here. i dont know how to interpret cf histograms. can you write it to wiki?

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Mohit Anchlia
op of my head I it's not exposed via nodetool. >> > >> You can get it via HTTP if you install mx4j or if you could try >> http://wiki.cyclopsgroup.org/jmxterm > > i have MX4J/Http but cant find that info in listing. > > i suspect that bloom filter performance

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Radim Kolar
On 7.10.2011 10:04, aaron morton wrote: Off the top of my head, it's not exposed via nodetool. You can get it via HTTP if you install mx4j or if you could try http://wiki.cyclopsgroup.org/jmxterm i have MX4J/Http but cant find that info in listing. i suspect that bloom f

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread aaron morton
09 PM, Radim Kolar wrote: > On 16.9.2011 8:20, Yang wrote: >> I looked at the JMX attributes >> CFS.BloomFilterFalseRatio, it's 1.0 , BloomFilterFalsePositives, it's >> 2810, > its possible to query this bloom filter false ratio from command line?

Re: how to reduce disk read? (and bloom filter performance)

2011-10-07 Thread Radim Kolar
On 16.9.2011 8:20, Yang wrote: I looked at the JMX attributes CFS.BloomFilterFalseRatio, it's 1.0 , BloomFilterFalsePositives, it's 2810, its possible to query this bloom filter false ratio from command line?

how to reduce disk read? (and bloom filter performance)

2011-09-15 Thread Yang
after I put my cassandra cluster on heavy load (1k/s write + 1k/s read ) for 1 day, I accumulated about 30GB of data in sstables. I think the caches have warmed up to their stable state. when I started this, I manually cat all the sstables to /dev/null , so that they are loaded into memory (the

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Aditya Narayan
eQuery operation affected by or "depends >>>> on the length of the row" ?? (For my use case, I would use the column names >>>> list for this SliceQuery operation). >>>> >>>> >>>> Thanks >>>> Aditya >>>> >

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Sylvain Lebresne
, 2011 at 12:37 AM, E S wrote: >>>> > I've gotten myself really confused by >>>> > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping >>>> someone can >>>> > help me understand what the io behavior of this operation

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Aditya Narayan
, Jonathan Ellis wrote: >> >>> On Sun, Feb 13, 2011 at 12:37 AM, E S wrote: >>> > I've gotten myself really confused by >>> > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping >>> someone can >>> > help me understand what the i

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-14 Thread Sylvain Lebresne
eration would be. >> > >> > When I do a get_slice for a column range, will it seek to every SSTable? >> I had >> > thought that it would use the bloom filter on the row key so that it >> would only >> > do a seek to SSTables that have a very high prob

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread aaron morton
be. > > > > When I do a get_slice for a column range, will it seek to every SSTable? I > > had > > thought that it would use the bloom filter on the row key so that it would > > only > > do a seek to SSTables that have a very high probability of containing >

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Aditya Narayan
ernals and am hoping > someone can > > help me understand what the io behavior of this operation would be. > > > > When I do a get_slice for a column range, will it seek to every SSTable? > I had > > thought that it would use the bloom filter on the row key so that it

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Jonathan Ellis
a column range, will it seek to every SSTable?  I > had > thought that it would use the bloom filter on the row key so that it would > only > do a seek to SSTables that have a very high probability of containing columns > for that row. Yes. > In the linked doc above, it seem
