On 8/16/2018 7:14 PM, Michael Hu (CMBU) wrote:
Environment:

   *   Solr 7.4.1
   *   all cores are vanilla cores with "loadOnStartup" set to false, and "transient" set to true
   *   we have about 75 cores with "transientCacheSize" set to 32

Issue: we see core corruption from time to time (2-3 corrupted cores a day)

How to reproduce:

   *   Set the "transientCacheSize" to 1
   *   Send a high ingest load to core1 only (no issue at this point)
   *   Continue the high ingest load to core1 and simultaneously start ingesting to core2 (core2 is immediately corrupted; stack trace is attached below)

If a core gets unloaded while you're sending data to it, the behavior is probably unpredictable.  Core corruption isn't good, but I'm not surprised that it happens in this scenario.

Your transientCacheSize must be large enough for all cores that are getting updates to be in memory at the same time.  Unless that's all of your cores, the number should probably be larger than the number of cores getting updates, so that you can query other cores simultaneously without pushing a core that is mid-update out of the cache.
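
To illustrate the sizing argument, here is a rough sketch in Python.  It is a toy model, not Solr's actual code: it assumes the transient cache behaves like a simple least-recently-used cache, and the names TransientCoreCache, touch and simulate are made up for this example.  It only counts which cores would be unloaded for a given request pattern.

```python
from collections import OrderedDict


class TransientCoreCache:
    """Toy LRU model of a transient core cache (assumption, not Solr code)."""

    def __init__(self, size):
        self.size = size
        self.loaded = OrderedDict()  # core name -> True, least recently used first
        self.evictions = []          # cores unloaded to make room, in order

    def touch(self, core):
        """Simulate one request (update or query) hitting `core`."""
        if core in self.loaded:
            # Already loaded: just mark it as most recently used.
            self.loaded.move_to_end(core)
            return
        if len(self.loaded) >= self.size:
            # Cache is full: unload the least recently used core.
            victim, _ = self.loaded.popitem(last=False)
            self.evictions.append(victim)
        self.loaded[core] = True


def simulate(size, requests):
    """Return the list of cores unloaded while serving `requests` in order."""
    cache = TransientCoreCache(size)
    for core in requests:
        cache.touch(core)
    return cache.evictions


# Interleaved updates to two cores, as in the reproduction steps.
requests = ["core1", "core2"] * 3
print(simulate(1, requests))  # every switch unloads the other core
print(simulate(2, requests))  # both cores fit, nothing is unloaded
```

With a size of 1, each switch between core1 and core2 unloads the core that was just receiving updates.  With a size of 2 nothing is unloaded, but a single query to a third core would again push out one of the cores being updated, which is why the value should exceed the number of cores getting updates.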

Thanks,
Shawn
