Hi,

Recently we have been experiencing OOMEs (GC overhead limit exceeded) in
our searches, so I'd like to get some clarification on heap and cache
configuration.

This is the situation:
- Solr 1.4.1 running on tomcat 6, Sun JVM 1.6.0_13 64bit
- JVM Heap Params: -Xmx8G -XX:MaxPermSize=256m -XX:NewSize=2G
-XX:MaxNewSize=2G -XX:SurvivorRatio=6 -XX:+UseParallelOldGC
-XX:+UseParallelGC
- The machine has 32 GB RAM
- The machine currently has 4 processors/cores; this will be reduced to
2 cores in the future.
- The index size in the filesystem is ~9.5 GB
- The index contains ~5,500,000 documents
- 1,500,000 of those docs are available for searches/queries; the rest are
inactive docs that are excluded from searches (via a flag/field) but are
still stored in the index, as they need to be retrievable by id (Solr is
the main document store in this app)
- Caches are configured with large sizes (the idea was to prevent
filesystem access / disk I/O as much as possible):
  - filterCache (solr.LRUCache): size=200000, initialSize=30000,
autowarmCount=1000, actual size =~ 60,000, hitratio =~ 0.99
  - documentCache (solr.LRUCache): size=200000, initialSize=100000,
autowarmCount=0, actual size =~ 160,000 - 190,000, hitratio =~ 0.74
  - queryResultCache (solr.LRUCache): size=200000, initialSize=30000,
autowarmCount=10000, actual size =~ 10,000 - 60,000, hitratio =~ 0.71
- Searches are performed against a catch-all text field using the
standard request handler; all fields are fetched (no fl specified)
- Normally ~ 5 concurrent requests, peaks up to 30 or 40 (mostly during GC)
- Recently we also added a feature that adds weighted search on special
fields, so that a query might become something like this:
  q=(some query) OR name_weighted:(some query)^2.0 OR brand_weighted:(some
query)^4.0 OR longDescription_weighted:(some query)^0.5
  (it seemed as if this feature was the cause of the OOMEs, but IMHO it
only increased RAM usage to the point where GC could no longer free enough
memory)

The OOMEs that we get are of type "GC overhead limit exceeded", one of the
OOMEs was thrown during auto-warming.
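(As I understand it, this error is thrown when the JVM spends more than
98% of its time in GC while recovering less than 2% of the heap. The check
itself could be switched off with

  -XX:-UseGCOverheadLimit

but that would presumably just turn the error into a plain OOME later, so
I'd rather fix the actual memory usage.)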

I checked two different heap dumps: the first one generated automatically
(via -XX:+HeapDumpOnOutOfMemoryError), the second one taken manually with
jmap.
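(For reference, the manual dump was taken with something along these
lines, where <pid> is a placeholder for the Tomcat process id:

  jmap -dump:live,format=b,file=solr-heap.hprof <pid>

The live option forces a full GC first, so the dump contains only
reachable objects.)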
Both dumps show the following distribution of used memory. The
automatically generated dump:
- documentCache: 56% (size ~ 195,000)
- filterCache: 15% (size ~ 60,000)
- queryResultCache: 8% (size ~ 61,000)
- fieldCache: 6% (fieldCache referenced by WebappClassLoader)
- SolrIndexSearcher: 2%

The manually generated dump:
- documentCache: 48% (size ~ 195,000)
- filterCache: 20% (size ~ 60,000)
- fieldCache: 11% (fieldCache referenced by WebappClassLoader)
- queryResultCache: 7% (size ~ 61,000)
- fieldValueCache: 3%

We are also running two search engines with a 17 GB heap; these don't run
into OOMEs. However, with these bigger heaps the longest requests take
even longer, due to longer stop-the-world GC cycles.
Therefore my goal is to run with a smaller heap - IMHO even smaller than
8 GB would be good, to reduce the time needed for a full GC.
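A rough sketch of the direction I have in mind (the sizes are untested
guesses, and the concurrent collector is just one option for shortening
the stop-the-world pauses):

  -Xmx4G -XX:MaxPermSize=256m -XX:NewSize=1G -XX:MaxNewSize=1G
  -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled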

So what's the right path to follow now? What would you recommend changing
in the configuration (Solr/JVM)?

Would you say it is OK to reduce the cache sizes? Would this increase
disk I/O, or would the index be held in the OS's disk cache?
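For illustration, cache settings sized closer to the observed actual
usage might look something like this in solrconfig.xml (the numbers are
rough guesses based on the stats above, not recommendations):

  <filterCache class="solr.LRUCache" size="65536"
    initialSize="16384" autowarmCount="1000"/>
  <documentCache class="solr.LRUCache" size="65536"
    initialSize="16384" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="65536"
    initialSize="16384" autowarmCount="4096"/>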

Do you have any other recommendations or questions?

Thanx && cheers,
Martin
