Martin Davidsson wrote:
I've tried to read up on how to decide, when writing a query, what
criteria goes in the q parameter and what goes in the fq parameter, to
achieve optimal performance. Is there [...] some kind of rule of thumb
to help me decide how to split things up when querying against one or
more fields.

This is a good question. I don't know if there is any such rule. I'm
going to sum up my understanding of filter queries, hoping that the pros
will point out any flaws in my assumptions.

http://wiki.apache.org/solr/SolrCaching - filterCache

A filter query is cached, so the more often it is repeated, the more
useful it becomes. We know how often certain queries arise, or at least
have the means to collect that data, so we know which ones might be
candidates for filtering.
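
As a rough illustration of collecting that data, here is a small Python
sketch that counts how often each filter expression shows up. It assumes
the filter strings have already been pulled out of the query log; the
function name, the threshold and the sample values are made up for this
example.

```python
from collections import Counter


def rank_filter_candidates(observed_filters, min_count=2):
    """Return (filter, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(observed_filters)
    return [(f, n) for f, n in counts.most_common() if n >= min_count]


# Hypothetical filter expressions extracted from a query log.
observed = [
    "category:books",
    "category:music",
    "category:books",
    "price:[10 TO 20]",
    "category:books",
    "category:music",
]

candidates = rank_filter_candidates(observed)
print(candidates)  # [('category:books', 3), ('category:music', 2)]
```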

The result of a filter query is cached and then used to filter a primary
query result using set intersection. If my filter query result comprises
more than 50 % of the entire document collection, its selectivity is
poor. I might need it despite this fact, but it might also be worthwhile
to think about how to reframe the requirement so that it allows for more
efficient filters.
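
To make the intersection step concrete, here is a minimal sketch in
Python using plain sets. It only models the idea as I understand it and
is not how Solr implements its filter cache internally; all names and
the toy document IDs are invented for illustration.

```python
# Toy model of filter caching: plain Python sets stand in for whatever
# document-set structure Solr really uses. All names are hypothetical.

filter_cache = {}  # maps a filter query string to the set of matching doc IDs


def cached_filter(fq, compute):
    """Return the doc ID set for fq, computing it only on a cache miss."""
    if fq not in filter_cache:
        filter_cache[fq] = compute()
    return filter_cache[fq]


def apply_filter(primary_hits, filter_hits):
    """Keep only the primary hits that also match the filter, preserving rank order."""
    return [doc for doc in primary_hits if doc in filter_hits]


def selectivity(filter_hits, total_docs):
    """Fraction of the whole collection that the filter lets through."""
    return len(filter_hits) / total_docs


total_docs = 10
books = cached_filter("category:books", lambda: {1, 2, 3, 7})
primary = [7, 4, 2, 9]  # ranked hits of the primary query
filtered = apply_filter(primary, books)

print(filtered)                        # [7, 2]
print(selectivity(books, total_docs))  # 0.4
```

In this toy case the filter passes 40 % of the collection, so it would
still fall below the 50 % mark mentioned above.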

Memory consumption is probably not a great concern here as the cache
stores only document IDs. (And if those are integers, it's just 4 bytes
each.) So with 100 cached filters containing 100,000 items each on
average, memory consumption should increase by around 40 MB.
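
The arithmetic behind that figure, as a short Python sketch. It rests on
the same assumption as the paragraph above (one 4-byte integer per
cached document ID) and ignores any per-entry overhead or a different
internal representation; the function name is made up.

```python
BYTES_PER_ID = 4  # assumption from the text: one 32-bit integer per document ID


def filter_cache_bytes(num_filters, avg_ids_per_filter, bytes_per_id=BYTES_PER_ID):
    """Rough lower bound on the memory needed for the cached ID lists."""
    return num_filters * avg_ids_per_filter * bytes_per_id


estimate = filter_cache_bytes(100, 100_000)
print(estimate)              # 40000000
print(estimate / 1_000_000)  # 40.0 (MB, counting 1 MB as 10**6 bytes)
```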

By the way, are these document IDs (used in filterCache, documentCache,
queryResultCache) the ones I configure in schema.xml or does Solr map my
IDs to integers in order to ensure efficiency?

A filter query should probably be orthogonal to the primary query, which
means in plain English: unrelated to the primary query. To give an
example, I have a field "category", which is a required field. In the
class of searches where I use a filter on that field, the primary search
is for something entirely different, so in most cases it will not
necessarily bias the primary result towards any particular distribution
of the category values. I then let the application filter by category,
incidentally using faceting, which I guess is a typical usage pattern.
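
A sketch of how such a request could be put together in Python, with the
free-text search in q and the category restriction in fq. The base URL
and the function name are assumptions for this example, and quoting the
category value this way is a simplification that would break on values
containing a double quote.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and path to the actual setup.
BASE_URL = "http://localhost:8983/solr/select"


def build_query_url(user_query, category=None):
    """Put the user's search in q and the optional category restriction in fq."""
    params = [("q", user_query)]
    if category is not None:
        # The filter is independent of the search terms, so the same
        # fq string recurs across many different q values.
        params.append(("fq", f'category:"{category}"'))
    return BASE_URL + "?" + urlencode(params)


url = build_query_url("lord of the rings", "books")
print(url)
# http://localhost:8983/solr/select?q=lord+of+the+rings&fq=category%3A%22books%22
```

Because the fq string is the same for every search within a category,
it is a natural candidate for caching, whatever the user typed into q.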

Michael Ludwig
