On 10/3/2014 1:57 PM, Yonik Seeley wrote:
> On Fri, Oct 3, 2014 at 3:42 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>> Say I have a boolean field named 'hidden', and less than 1% of the
>> documents in the index have hidden=true.
>> Do both these filter queries use the same docset cache size? :
>> fq=hidden:false
>> fq=!hidden:true
> 
> Nope... !hidden:true will be smaller in the cache (it will be cached
> as hidden:true and then inverted)
> The downside is that you'll pay the cost of that inversion.

I would think that unless it's using hashDocSet, the cached data for
every filter would always be the same size, because a bitset needs one
bit per document in the index no matter how many documents match.  The
wiki says that hashDocSet is no longer used for filter caching as of
1.4.0.  Is that actually true?  Is my understanding of filterCache
completely out of touch with reality?

https://wiki.apache.org/solr/SolrCaching#The_hashDocSet_Max_Size

This does bring to mind an optimization that might help memory usage in
cases where either a very small or a very large percentage of documents
match the filter: apply run-length encoding (RLE) to the bitset.  If the
RLE representation is at least N percent smaller than the bitset, use
that representation instead.
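
A minimal sketch of that idea in plain Java, for illustration only.  The
class and method names (RleBitsetSketch, encode, decode, shouldUseRle)
are hypothetical and are not Solr APIs, and the assumed storage cost of
4 bytes per run is just one possible encoding:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration; nothing here is part of Solr or Lucene.
final class RleBitsetSketch {

    // Encode the first numDocs bits as alternating run lengths.
    // The first run always counts clear bits, so it may be 0.
    static int[] encode(BitSet bits, int numDocs) {
        List<Integer> runs = new ArrayList<>();
        boolean inSetRun = false;
        int pos = 0;
        while (pos < numDocs) {
            int next = inSetRun ? bits.nextClearBit(pos) : bits.nextSetBit(pos);
            if (next < 0 || next > numDocs) {
                next = numDocs; // the run extends to the end of the doc range
            }
            runs.add(next - pos);
            pos = next;
            inSetRun = !inSetRun;
        }
        return runs.stream().mapToInt(Integer::intValue).toArray();
    }

    // Rebuild the bitset from the alternating run lengths.
    static BitSet decode(int[] runs) {
        BitSet bits = new BitSet();
        boolean inSetRun = false;
        int pos = 0;
        for (int run : runs) {
            if (inSetRun) {
                bits.set(pos, pos + run);
            }
            pos += run;
            inSetRun = !inSetRun;
        }
        return bits;
    }

    // True if RLE is at least minSavingsPercent smaller than a plain bitset.
    // Assumes one int (4 bytes) per run and one bit per document.
    static boolean shouldUseRle(int[] runs, int numDocs, double minSavingsPercent) {
        long bitsetBytes = (numDocs + 7L) / 8;
        long rleBytes = 4L * runs.length;
        return rleBytes <= bitsetBytes * (1.0 - minSavingsPercent / 100.0);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;
        BitSet hidden = new BitSet(numDocs);
        for (int doc = 0; doc < numDocs; doc += 100) {
            hidden.set(doc); // roughly 1% of documents match
        }
        int[] runs = encode(hidden, numDocs);
        System.out.println("runs: " + runs.length
                + ", use RLE at 25% threshold: " + shouldUseRle(runs, numDocs, 25.0));
    }
}
```

Note that with this encoding a filter and its inverse need almost the
same number of runs, so the size depends on how clustered the matches
are rather than on which side of the filter is cached.  Matches that are
scattered evenly through the index produce many short runs and compress
poorly, which is why the threshold check matters.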

I think the first iteration of an RLE option would have it always on or
always off, controlled in solrconfig.xml.  A config mode where Solr
attempts RLE on every bitset and periodically reports efficiency
statistics would be pretty nice.  That data might be useful for defining
default thresholds for a future automatic mode.

Thanks,
Shawn
