Hi,

I’m involved in an open source project called Vufind, which uses Solr to
search across library catalogue records [1].

The project uses what seem to be very high default cache settings in
solrconfig.xml [2]:

   - filterCache (size="300000" initialSize="300000" autowarmCount="50000"),
   - queryResultCache (size="100000" initialSize="100000"
     autowarmCount="50000"),
   - documentCache (size="50000" initialSize="50000").


These settings haven’t been reviewed since early in the project history (c.
2007) but came up in a recent discussion around out-of-memory issues and
garbage collection.
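
As a rough illustration of why the filterCache figure in particular stands out, here is a back-of-envelope sketch in Python. It assumes (my understanding, not something I have measured) that a filterCache entry can, in the worst case, be a bitset with one bit per document in the index, i.e. about maxDoc/8 bytes. Solr stores small result sets more compactly, so this is an upper bound and not an estimate of real usage. The function name is made up for this example.

```python
def filter_cache_worst_case_bytes(max_doc: int, cache_size: int) -> int:
    """Upper bound on filterCache memory, assuming every cached entry
    is a full bitset with one bit per document (an assumption)."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * cache_size


if __name__ == "__main__":
    # Index sizes typical of the adopters described in this post.
    for max_doc in (500_000, 2_000_000):
        total = filter_cache_worst_case_bytes(max_doc, cache_size=300_000)
        print(f"{max_doc:>9,} docs: up to ~{total / 1024**3:.1f} GiB")
```

Even if real entries are far smaller than this bound, a cache that is ever allowed to fill to 300000 entries looks hard to reconcile with a typical heap size.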

Of course, decisions on cache configuration (along with JVM settings,
sharding, etc.) vary depending on the instance (index size, queries/sec,
etc.), but I wanted to run these values past this list as a sanity check for
what you’d consider good default settings, given that most adopters of the
software will not touch the defaults.

Some characteristics of library data & Vufind’s schema [3] which may have a
bearing on the issue:

   - quite a few facet fields & filtering (~12 facets configured by default)
   - high number of unique facet values (e.g. several hundred thousand in a
     facet field for authors or subjects)
   - most libraries would do only one or two incremental commits a day (which
     may justify high auto-warming settings since the next commit isn’t for 24
     hours)
   - sorting: relevance by default but other options configured by default
     (title, author, callnumber, year, etc.)
   - mostly small, sparse documents (MARC records containing title, author,
     description etc. but no full-text content)
   - quite a few stored fields, including a field which stores the full MARC
     record for additional parsing by the application
   - average number of documents for most adopters probably somewhere between
     500K and 2 million MARC records (Vufind has several adopters with up to 50m
     full-text docs but these make considerable customisations to their Solr
     setup)
   - query/sec will vary from library to library, but shouldn't be anything
     too taxing for most adopters


Do the current cache settings make sense in this context, or should we
consider dropping back to the much lower values given in the Solr example
and wiki?

Many thanks

Eoghan


[1] vufind.org

[2]
https://github.com/vufind-org/vufind/blob/master/solr/biblio/conf/solrconfig.xml
[3]
https://github.com/vufind-org/vufind/blob/master/solr/biblio/conf/schema.xml
