Hi, I'm looking for some advice on how to add "base query" caching to SOLR.

Our use-case for SOLR is:

- a large Lucene index (32M docs, doubling in 6 months; 110GB, growing 8x
in 6 months)
- a frontend which presents views of this data in 5 "categories" by firing
off 5 queries with the same search term but 5 different "fq" values

For example, an originating query for "sydney harbour" generates 5 SOLR
queries:

- ../search?q=<complicated expansion of sydney harbour>&fq=category:books
- ../search?q=<complicated expansion of sydney harbour>&fq=category:maps
- ../search?q=<complicated expansion of sydney harbour>&fq=category:music
etc

The complicated expansion requires sloppy phrase matches, and the database
is large with lots of very large documents, so some queries take quite some
time (tens to several hundreds of ms). We'd like to cache the results of the
base query for a short time (long enough for all related queries to be
issued).

It looks like this isn't the use-case for queryResultCache, because its key
is calculated in SolrIndexSearcher like this:

key = new QueryResultKey(cmd.getQuery(), cmd.getFilterList(), cmd.getSort(),
cmd.getFlags());

That is, the filters are part of the key, so the cached result reflects the
application of the filters. This works great for what it is probably
designed for: supporting paging through results.
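
To illustrate the point, here is a minimal sketch in plain Java (no Solr
classes). Key and distinctEntries are made-up names standing in for a key
that includes the filter list; this is not Solr's QueryResultKey, just a toy
showing that the same base query with different fq values ends up as
separate cache entries:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FilterKeyDemo {
    /** Hypothetical stand-in for a cache key built from query plus filters. */
    static final class Key {
        final String query;
        final List<String> filters;

        Key(String query, List<String> filters) {
            this.query = query;
            this.filters = filters;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return query.equals(k.query) && filters.equals(k.filters);
        }

        @Override
        public int hashCode() {
            return Objects.hash(query, filters);
        }
    }

    /** Counts the distinct cache entries created by one base query and several fq values. */
    public static int distinctEntries(String query, List<String> fqs) {
        Map<Key, String> cache = new HashMap<>();
        for (String fq : fqs) {
            // The filter is part of the key, so each fq gets its own entry.
            cache.putIfAbsent(new Key(query, Collections.singletonList(fq)),
                    "results for " + fq);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        int n = distinctEntries("sydney harbour",
                Arrays.asList("category:books", "category:maps", "category:music"));
        System.out.println(n + " cache entries for one base query");
    }
}
```

So with a key of this shape, none of the 5 category queries can reuse the
work done for the others.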

So, I think our options are:

- create a new queryComponent that invokes SolrIndexSearcher differently,
and which has its own cache of the base query results (short-lived, but with
large entries)

- subclass or change SolrIndexSearcher, perhaps making it "pluggable",
perhaps defining an optional new cache of base query results

- create a subclass of the Lucene IndexSearcher which manages a cache of
query results "hidden" from SolrIndexSearcher (and organise somehow for
SolrIndexSearcher to use that subclass)
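
Whichever option is used, the cache itself might look roughly like the
sketch below. This is plain Java with no Solr or Lucene classes, and every
name in it (BaseQueryCache, CachedDocs, search, the expensiveSearch callback)
is hypothetical. It is keyed on the base query only, expires entries after a
short TTL, and applies each filter by intersecting doc sets. As a
simplification it keeps only matching doc ids in a BitSet, so scores and
ranking are not handled:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class BaseQueryCache {
    /** Cached doc set for one base query, with its expiry time. */
    private static final class CachedDocs {
        final BitSet docs;
        final long expiresAtMillis;

        CachedDocs(BitSet docs, long expiresAtMillis) {
            this.docs = docs;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, CachedDocs> entries;
    private final long ttlMillis;
    private final LongSupplier clock;

    public BaseQueryCache(final int maxEntries, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // Access-ordered LinkedHashMap gives simple LRU eviction.
        this.entries = new LinkedHashMap<String, CachedDocs>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedDocs> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the docs matching both the base query and the filter. The
     * expensive base query runs only when there is no unexpired entry for it.
     * Holding the lock during the search is a simplification: it also blocks
     * unrelated queries while one base query is being computed.
     */
    public synchronized BitSet search(String baseQuery, BitSet filterDocs,
                                      Function<String, BitSet> expensiveSearch) {
        long now = clock.getAsLong();
        CachedDocs cached = entries.get(baseQuery);
        if (cached == null || cached.expiresAtMillis <= now) {
            cached = new CachedDocs(expensiveSearch.apply(baseQuery), now + ttlMillis);
            entries.put(baseQuery, cached);
        }
        BitSet result = (BitSet) cached.docs.clone(); // never mutate the cached set
        result.and(filterDocs);
        return result;
    }

    public static void main(String[] args) {
        BaseQueryCache cache = new BaseQueryCache(100, 5000L, System::currentTimeMillis);
        BitSet books = new BitSet();
        books.set(1);
        books.set(3);
        Function<String, BitSet> slowSearch = q -> {
            BitSet hits = new BitSet();
            hits.set(1, 4); // pretend docs 1..3 match the base query
            return hits;
        };
        System.out.println(cache.search("sydney harbour", books, slowSearch));
    }
}
```

The clock is passed in only so that expiry can be exercised without
sleeping; the TTL would be set just long enough for the 5 related queries to
arrive.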

Or perhaps I'm taking the wrong approach to this problem entirely!  Any
advice is greatly appreciated.

Kent Fitch
