On 10/3/2014 1:57 PM, Yonik Seeley wrote:
> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> Say I have a boolean field named 'hidden', and less than 1% of the
>> documents in the index have hidden=true.
>> Do both these filter queries use the same docset cache size? :
>> fq=hidden:false
>> fq=!hidden:true
>
> Nope... !hidden:true will be smaller in the cache (it will be cached
> as hidden:true and then inverted)
> The downside is that you'll pay the cost of that inversion.

I would think that unless it's using hashDocSet, the cached data for
every filter would always be the same size. The wiki says that
hashDocSet is no longer used for filter caching as of 1.4.0. Is that
actually true? Is my understanding of filterCache completely out of
touch with reality?

https://wiki.apache.org/solr/SolrCaching#The_hashDocSet_Max_Size

This does bring to mind an optimization that might help memory usage in
cases where either a very small or very large percentage of documents
match the filter: do run-length encoding on the bitset. If the RLE
representation is at least N percent smaller than the bitset, use that
representation instead.

I think the first iteration of an RLE option would have it always on or
always off, controlled in solrconfig.xml. A config mode where Solr
attempts RLE on every bitset and periodically reports efficiency
statistics would be pretty nice. That data might be useful to define
default thresholds for a future automatic mode.

Thanks,
Shawn
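
For illustration, the run-length idea described above could be sketched roughly as follows. This is not Solr code: the class name RleDocSetSketch, its methods, and the minSavings parameter (standing in for "N percent") are all hypothetical, and it uses java.util.BitSet rather than any Lucene or Solr bitset class. Size estimates assume 4 bytes per run and 8 bytes per 64 bits of bitset, ignoring object overhead.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch, not part of Solr: run-length encode a filter bitset
// and decide whether the encoded form is enough smaller to be worth keeping.
public class RleDocSetSketch {

    // Encode bits [0, maxDoc) as alternating run lengths. The first run always
    // counts clear bits, so it is 0 when document 0 matches the filter.
    public static int[] encode(BitSet bits, int maxDoc) {
        List<Integer> runs = new ArrayList<>();
        int pos = 0;
        boolean value = false; // value of the bits in the current run
        while (pos < maxDoc) {
            int next = value ? bits.nextClearBit(pos) : bits.nextSetBit(pos);
            if (next < 0 || next > maxDoc) {
                next = maxDoc; // the run extends to the end of the index
            }
            runs.add(next - pos);
            pos = next;
            value = !value;
        }
        return runs.stream().mapToInt(Integer::intValue).toArray();
    }

    // Rebuild the bitset from the alternating run lengths.
    public static BitSet decode(int[] runs) {
        BitSet bits = new BitSet();
        int pos = 0;
        boolean value = false;
        for (int len : runs) {
            if (value) {
                bits.set(pos, pos + len);
            }
            pos += len;
            value = !value;
        }
        return bits;
    }

    // True if the RLE form is at least minSavings (e.g. 0.5 for 50%) smaller
    // than a plain bitset covering maxDoc documents.
    public static boolean worthUsingRle(int[] runs, int maxDoc, double minSavings) {
        long rleBytes = 4L * runs.length;
        long bitsetBytes = ((maxDoc + 63L) / 64L) * 8L;
        return rleBytes <= bitsetBytes * (1.0 - minSavings);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        BitSet hidden = new BitSet(maxDoc);
        hidden.set(100, 200);   // two small clusters of matching documents
        hidden.set(5000, 5010);

        int[] runs = encode(hidden, maxDoc);
        System.out.println("runs: " + runs.length);
        System.out.println("use RLE: " + worthUsingRle(runs, maxDoc, 0.5));
        System.out.println("round trip ok: " + decode(runs).equals(hidden));
    }
}
```

Note that the savings depend on how clustered the matching documents are, not only on how few there are. A filter matching 1% of documents scattered evenly through the index produces many short runs, which is the kind of case the proposed efficiency statistics would expose.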