Re: Filter cache pollution during sharded edismax queries

Alan Woodward Tue, 30 Sep 2014 04:00:00 -0700

A bit of digging shows that the extra entries in the filter cache are added when 
getting facets from a distributed search.  Once all the facets have been 
gathered, the co-ordinating node then asks the subnodes for an exact count for 
the final top-N facets, and the path for executing this goes through:
        SimpleFacets.getListedTermCounts()
-->     SolrIndexSearcher.numDocs()
-->     SolrIndexSearcher.getPositiveDocSet()
and this last method caches results in the filter cache.
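
To illustrate the effect described above, here is a toy model. It is not Solr
code: the LruCache class, the simulate() method, the "type:product" filter and
the "category:valueN" keys are all made up for the example. It assumes a plain
LRU cache shared by one repeated filter query and by one-off per-term entries
of the kind the refinement step adds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: this is NOT Solr's filterCache, just a small LRU map with counters.
class FilterCachePollutionDemo {

    /** Access-ordered LRU cache that records lookups and hits. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;
        long lookups;
        long hits;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // true = keep entries in access order (LRU)
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize; // evict the least recently used entry
        }

        /** Returns the cached value or null, updating the counters. */
        V lookup(K key) {
            lookups++;
            V value = get(key);
            if (value != null) {
                hits++;
            }
            return value;
        }

        double hitRatio() {
            return lookups == 0 ? 0.0 : (double) hits / lookups;
        }
    }

    /** Runs the hypothetical workload and returns the cache for inspection. */
    static LruCache<String, Boolean> simulate(int cacheSize, int requests, int refinedTermsPerRequest) {
        LruCache<String, Boolean> cache = new LruCache<>(cacheSize);
        int nextTerm = 0;
        for (int r = 0; r < requests; r++) {
            // The one real filter query, repeated on every request.
            if (cache.lookup("type:product") == null) {
                cache.put("type:product", Boolean.TRUE);
            }
            // Refinement: one cache entry per facet term, each seen only once here.
            for (int t = 0; t < refinedTermsPerRequest; t++) {
                String key = "category:value" + nextTerm++;
                if (cache.lookup(key) == null) {
                    cache.put(key, Boolean.TRUE);
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        LruCache<String, Boolean> clean = simulate(512, 100, 0);
        LruCache<String, Boolean> polluted = simulate(512, 100, 10);
        System.out.printf("no refinement:   size=%d hitRatio=%.2f%n", clean.size(), clean.hitRatio());
        System.out.printf("with refinement: size=%d hitRatio=%.2f%n", polluted.size(), polluted.hitRatio());
    }
}
```

In this model the single filter query is still served from the cache on every
request after the first, but the cache fills to its limit with term entries
and the overall hit ratio falls sharply, which matches the symptom of a full
cache with a low hit rate.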


Maybe these should be using a separate cache?
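
A rough sketch of what I mean by a separate cache follows. It is purely
hypothetical: SeparateCountCacheSketch, termCountCache, countForFilterQuery()
and countForFacetTerm() are names invented for the example and do not exist in
Solr, and expensiveCount() is a placeholder for the real document counting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Solr.
class SeparateCountCacheSketch {

    /** Builds a small access-ordered LRU map bounded at maxSize entries. */
    static <K, V> Map<K, V> lru(final int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    final Map<String, Integer> filterCache;    // user-supplied filter queries only
    final Map<String, Integer> termCountCache; // per-term counts from facet refinement
    int computations;                          // how often the expensive path ran

    SeparateCountCacheSketch(int filterCacheSize, int termCountCacheSize) {
        filterCache = lru(filterCacheSize);
        termCountCache = lru(termCountCacheSize);
    }

    /** Stand-in for the real work of counting matching documents. */
    private int expensiveCount(String query) {
        computations++;
        return query.length(); // placeholder value, not a real count
    }

    /** Looks the key up in the given cache, computing and storing it on a miss. */
    private int cached(Map<String, Integer> cache, String key) {
        Integer count = cache.get(key);
        if (count == null) {
            count = expensiveCount(key);
            cache.put(key, count);
        }
        return count;
    }

    int countForFilterQuery(String fq) {
        return cached(filterCache, fq);
    }

    int countForFacetTerm(String field, String term) {
        return cached(termCountCache, field + ":" + term);
    }

    public static void main(String[] args) {
        SeparateCountCacheSketch sketch = new SeparateCountCacheSketch(512, 64);
        for (int r = 0; r < 100; r++) {
            sketch.countForFilterQuery("type:product");
            for (int t = 0; t < 10; t++) {
                sketch.countForFacetTerm("category", "value" + (r * 10 + t));
            }
        }
        System.out.println("filterCache entries:    " + sketch.filterCache.size());
        System.out.println("termCountCache entries: " + sketch.termCountCache.size());
    }
}
```

The point of the split is that the term-count entries can only evict each
other, so the filter cache holds nothing but real filter queries and can be
sized for those alone.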
        
Alan Woodward
www.flax.co.uk


On 30 Sep 2014, at 11:38, Charlie Hull wrote:

> Hi,
> 
> We've just found a very similar issue at a client installation. They have
> around 27 million documents and are faceting on fields with high
> cardinality, and are unhappy with query performance and the server hardware
> necessary to make this performance acceptable. Last night we noticed the
> filter cache had a pretty low hit rate and seemed to be filling up with
> many unexpected items (we were testing with only a *single* actual filter
> query). Diagnosing this with the showItems flag set on the Solr admin
> statistics we could see entries relating to facets, even though we were
> sure we were using the default facet.method=fc setting that should prevent
> filters being constructed. We're thus seeing similar cache pollution to Ken
> and Anca.
> 
> We're trying a different type of cache (LFUCache) now and also may try
> tweaking cache sizes to try and help, as the filter creation seems to be
> something we can't easily get round.
> 
> cheers
> 
> Charlie
> Flax
> www.flax.co.uk
> 
> On 18 October 2013 14:32, Anca Kopetz <anca.kop...@kelkoo.com> wrote:
> 
>> Hi Ken,
>> 
>> Have you managed to find out why these entries were stored into
>> filterCache and if they have an impact on the hit ratio?
>> We see the same problem, with entries of this type:
>> item_+(+(title:western^10.0 | ... in our filterCache.
>> 
>> Thanks,
>> Anca
>> 
>> 
>> On 07/02/2013 09:01 PM, Ken Krugler wrote:
>> 
>> Hi all,
>> 
>> After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit
>> ratio had dropped significantly.
>> 
>> Previously it was at 95+%, but now it's < 50%.
>> 
>> I enabled recording 100 entries for debugging, and in looking at them it
>> seems that edismax (and faceting) is creating entries for me.
>> 
>> This is in a sharded setup, so it's a distributed search.
>> 
>> If I do a search for the string "bogus text" using edismax on two fields,
>> I get an entry in each of the shard's filter caches that looks like:
>> 
>> item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2):
>> 
>> Is this expected?
>> 
>> I have a similar situation happening during faceted search, even though my
>> fields are single-value/untokenized strings, and I'm not using the enum
>> facet method.
>> 
>> But I'll get many, many entries in the filterCache for facet values, and
>> they all look like "item_<facet field>:<facet value>:"
>> 
>> The net result of the above is that even with a very big filterCache size
>> of 2K, the hit ratio is still only 60%.
>> 
>> Thanks for any insights,
>> 
>> -- Ken
>> 
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>> 
>> 
>> ________________________________
>> Kelkoo SAS
>> Simplified joint-stock company (Société par Actions Simplifiée)
>> Share capital: € 4.168.964,30
>> Registered office: 8, rue du Sentier 75002 Paris
>> 425 093 069 RCS Paris
>> 
>> This message and its attachments are confidential and intended solely
>> for their addressees. If you are not the intended recipient of this
>> message, please delete it and notify the sender.
>>
