Re: Avoiding Transient state during a long running background process

Erick Erickson Wed, 29 Mar 2017 19:10:15 -0700

It's access time: the map is an LRU cache. See the docs for
LinkedHashMap; this form of the c'tor is used in
SolrCores.allocateLazyCores:

transientCores = new LinkedHashMap<String, SolrCore>(Math.min(cacheSize, 1000), 0.75f, true) {

which is a special form of the c'tor that creates an access-ordered map.
I had a terrible moment seeing this line in the code where
transientCores is declared:

protected Map<String, SolrCore> transientCores = new LinkedHashMap<>(); // For "lazily loaded" cores

which would have created an insertion-ordered map, i.e. eviction by
load time rather than a true LRU cache. Turns out that this is just a
placeholder to keep from having to check if the transientCores map is
null before it's really allocated.

bq: My guess is that it is decided by the load time, because this is
the option that would have the best performance.

Not at all. The theory here is that this is to support the pattern
where some transient cores are used all the time and some cores are
only used for a while then go quiet. E.g. searching an organization's
documents. A large organization might have users searching all day. A
small organization may search the docs once a week.

If it were insertion-ordered, then those users who signed on and worked
all day would have their cores unloaded periodically even though other
cores were last accessed a long time ago. Of course there will be some
access patterns for which this is a bad assumption.
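
To make the difference concrete, here is a minimal sketch (not the Solr
code; the class, method and core names are made up for illustration)
that uses the same three-argument LinkedHashMap c'tor plus
removeEldestEntry to bound the map at two entries. The "bigOrg" core is
touched again before a third core is loaded:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class for illustration only; this is not SolrCores.
class LruOrderDemo {
    // Bounded map: accessOrder=true evicts the least recently accessed
    // entry, accessOrder=false evicts the oldest inserted entry.
    static Map<String, String> boundedMap(final int maxSize, boolean accessOrder) {
        return new LinkedHashMap<String, String>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Loads two "cores", accesses the first one again, then loads a third.
    // Returns the names of the cores that are still in the map.
    static List<String> survivors(boolean accessOrder) {
        Map<String, String> cores = boundedMap(2, accessOrder);
        cores.put("bigOrg", "core");
        cores.put("smallOrg", "core");
        cores.get("bigOrg");          // the busy organization searches again
        cores.put("newOrg", "core");  // forces one eviction
        return new ArrayList<>(cores.keySet());
    }

    public static void main(String[] args) {
        System.out.println("access order:    " + survivors(true));
        System.out.println("insertion order: " + survivors(false));
    }
}
```

With access ordering the busy "bigOrg" entry survives and "smallOrg" is
evicted; with insertion ordering "bigOrg" is evicted even though it was
just used.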

I'm in the middle of pulling all this out into a pluggable framework,
see SOLR-8906. So if this is truly important in 6.6+ you should be
able to define your own plugin.

Shawn's comments on how to avoid unloading the core are spot on, and
the only options that exist currently.

Your BackupHandler should be OK. The core is reloaded whenever it's
accessed, but since the underlying index hasn't changed (it couldn't
because the core was unloaded!) it should be in the same state it was
in last time you accessed it.

If your custom BackupHandler is not holding the core open or, more
specifically, a searcher, then even if the core wasn't unloaded you
have the possibility of the index changing out from underneath you due
to indexing activity between calls and having an inconsistent backup.
Could you use the fetchindex replication API command? See:
https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler.
Solr relies on this "doing the right thing" so that there are
consistent indexes every time; it might save you a lot of grief.
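
As a rough sketch of what such a request could look like, the snippet
below only builds the URL; it does not contact Solr. The host names and
core names are hypothetical, and it assumes the command=fetchindex and
masterUrl parameters as described on the page linked above, so check
that page for your version before relying on it:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper for illustration; hosts and core names are made up.
class FetchIndexUrl {
    // targetCoreUrl: the core that should pull the index,
    //                e.g. http://dummy:8983/solr/backupCore
    // sourceCoreUrl: the core whose index should be copied
    static URI fetchIndexUri(String targetCoreUrl, String sourceCoreUrl) {
        String source = sourceCoreUrl + "/replication";
        try {
            // masterUrl must be URL-encoded because it is itself a URL
            String encoded = URLEncoder.encode(source, "UTF-8");
            return URI.create(targetCoreUrl
                    + "/replication?command=fetchindex&masterUrl=" + encoded);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchIndexUri(
                "http://dummy:8983/solr/backupCore",
                "http://source:8983/solr/orgCore"));
    }
}
```

Issuing a GET against the resulting URL with any HTTP client (or curl)
would ask the target core to pull the index from the source core.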

This does work with SolrCloud (I'm not assuming you're using
SolrCloud, just sayin'), but note that the machine being replicated
_to_ (that, BTW, doesn't even have to be part of the collection) won't
be able to serve queries while the replication is going on. I'm
thinking of something like using a dummy Solr instance to issue the
fetchindex, _then_ moving the result to your Cloud storage.

Best,
Erick

On Wed, Mar 29, 2017 at 4:41 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 3/29/2017 4:50 PM, Shashank Pedamallu wrote:
>> Thank you very much for the response. Is there no definite way of
>> ensuring that Solr does not switch transient states by an api? Like
>> solrCore.open() and solrCore.close()?
>
> I am not aware of any way to tell Solr to NOT unload a core when all of
> the following conditions have been met:
>
> 1) Another transient core must be loaded because it has been accessed.
> 2) The core in question has been marked transient.
> 3) The transientCacheSize has already been reached.
> 4) The core in question is the one with the earliest timestamp.
>
> I checked the code, but could not determine whether the oldest core is
> decided by core load time or by core access time. My guess is that it is
> decided by the load time, because this is the option that would have the
> best performance.
>
> If it's important that this core never gets unloaded, then you'll want
> to remove the transient property.
>
> Thanks,
> Shawn
>
