A bit of digging shows that the extra entries in the filter cache are added when getting facets from a distributed search. Once all the facets have been gathered, the co-ordinating node asks the subnodes for an exact count for each of the final top-N facets, and the path for executing this goes through:

SimpleFacets.getListedTermCounts()
  --> SolrIndexSearcher.numDocs()
  --> SolrIndexSearcher.getPositiveDocSet()

and this last method caches results in the filter cache.
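To make the effect concrete, here is a minimal toy model of it. This is not Solr code: `LRUCache`, `fq_hit_ratio`, the `in_stock:true` filter and the `category` facet field are all made-up names for illustration, and it assumes a plain LRU eviction policy and that every request refines a fresh set of facet terms (a rough stand-in for a high-cardinality field). It compares the hit ratio of one real filter query when the per-term count entries share its cache against the case where they go to a cache of their own.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that counts hits; a stand-in for a filter cache, not Solr's."""

    def __init__(self, size):
        self.size = size
        self.items = OrderedDict()
        self.hits = 0

    def get_or_compute(self, key, compute):
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        value = compute()
        self.items[key] = value
        if len(self.items) > self.size:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value


def fq_hit_ratio(num_requests, terms_per_request, cache_size, separate_cache):
    """Hit ratio of the single real filter query over num_requests requests."""
    filter_cache = LRUCache(cache_size)
    # Assumption: the "separate cache" idea is modelled as a second LRU cache.
    term_cache = LRUCache(cache_size) if separate_cache else filter_cache
    fq_hits = 0
    for request in range(num_requests):
        # The one real filter query, identical on every request.
        hits_before = filter_cache.hits
        filter_cache.get_or_compute("in_stock:true", lambda: "docset")
        fq_hits += filter_cache.hits - hits_before
        # Facet refinement: one exact-count lookup per top-N term,
        # each of which is cached as if it were a filter.
        for n in range(terms_per_request):
            term = f"category:{request * terms_per_request + n}"
            term_cache.get_or_compute(term, lambda: "docset")
    return fq_hits / num_requests


if __name__ == "__main__":
    print("shared cache:  ", fq_hit_ratio(100, 10, 10, separate_cache=False))
    print("separate cache:", fq_hit_ratio(100, 10, 10, separate_cache=True))
```

In this model, as soon as the number of refined terms per request reaches the cache size, the real filter entry is evicted before it can be reused. How closely that matches a real installation depends on the eviction policy, the cache size and how much the top-N terms overlap between queries.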
Maybe these should be using a separate cache?

Alan Woodward
www.flax.co.uk

On 30 Sep 2014, at 11:38, Charlie Hull wrote:

> Hi,
>
> We've just found a very similar issue at a client installation. They have
> around 27 million documents and are faceting on fields with high
> cardinality, and are unhappy with query performance and the server hardware
> necessary to make this performance acceptable. Last night we noticed the
> filter cache had a pretty low hit rate and seemed to be filling up with
> many unexpected items (we were testing with only a *single* actual filter
> query). Diagnosing this with the showItems flag set on the Solr admin
> statistics we could see entries relating to facets, even though we were
> sure we were using the default facet.method=fc setting that should prevent
> filters being constructed. We're thus seeing similar cache pollution to Ken
> and Anca.
>
> We're trying a different type of cache (LFUCache) now and also may try
> tweaking cache sizes to try and help, as the filter creation seems to be
> something we can't easily get round.
>
> cheers
>
> Charlie
> Flax
> www.flax.co.uk
>
> On 18 October 2013 14:32, Anca Kopetz <anca.kop...@kelkoo.com> wrote:
>
>> Hi Ken,
>>
>> Have you managed to find out why these entries were stored into
>> filterCache and if they have an impact on the hit ratio ?
>> We noticed the same problem, there are entries of this type :
>> item_+(+(title:western^10.0 | ... in our filterCache.
>>
>> Thanks,
>> Anca
>>
>>
>> On 07/02/2013 09:01 PM, Ken Krugler wrote:
>>
>> Hi all,
>>
>> After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit
>> ratio had dropped significantly.
>>
>> Previously it was at 95+%, but now it's < 50%.
>>
>> I enabled recording 100 entries for debugging, and in looking at them it
>> seems that edismax (and faceting) is creating entries for me.
>>
>> This is in a sharded setup, so it's a distributed search.
>>
>> If I do a search for the string "bogus text" using edismax on two fields,
>> I get an entry in each of the shard's filter caches that looks like:
>>
>> item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2):
>>
>> Is this expected?
>>
>> I have a similar situation happening during faceted search, even though my
>> fields are single-value/untokenized strings, and I'm not using the enum
>> facet method.
>>
>> But I'll get many, many entries in the filterCache for facet values, and
>> they all look like "item_<facet field>:<facet value>:"
>>
>> The net result of the above is that even with a very big filterCache size
>> of 2K, the hit ratio is still only 60%.
>>
>> Thanks for any insights,
>>
>> -- Ken
>>
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr