As a potential solution, I was wondering about implementing Master/Slave 
replication using the collection name of the Master rather than the core name. 
My initial experiment with this in a test environment seemed to work. Does 
anyone have any input on the idea of using the Master's collection name in 
Master/Slave replication, rather than the core name?
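
To make the idea concrete, here is a minimal sketch of how the replication URLs would be built when the Master is addressed by collection name instead of core name. The host names, port, and slave core name are hypothetical placeholders. It assumes the replication handler is reachable at /solr/<name>/replication and that the fetchindex command accepts a masterUrl parameter. Whether a collection-level URL always routes to the intended replica is exactly the open question above, so treat this as an illustration rather than a confirmed approach.

```python
from urllib.parse import urlencode


def replication_url(base_url, name):
    """Return the replication handler URL for a core or collection name."""
    return f"{base_url.rstrip('/')}/{name}/replication"


def fetchindex_url(slave_base_url, slave_core, master_base_url, master_name):
    """Build the URL that asks a slave core to pull the index from a master.

    master_name may be a core name or, as proposed here, a collection name.
    """
    master_url = replication_url(master_base_url, master_name)
    query = urlencode({"command": "fetchindex", "masterUrl": master_url})
    return f"{replication_url(slave_base_url, slave_core)}?{query}"


if __name__ == "__main__":
    # Hypothetical hosts and port; the collection name is the one from this thread.
    print(fetchindex_url(
        "http://slavehost:8983/solr",
        "ipg_report_large",
        "http://masterhost:8983/solr",
        "ipg_report_large",  # collection name, which survives a core rename
    ))
```

The point of the design is that the collection name stays the same when a replica core is dropped and recreated under a new name, so the masterUrl would not need to change.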

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <[email protected]> 
Sent: Wednesday, June 02, 2021 5:46 PM
To: [email protected]
Subject: RE: Cores renamed

It happened again this morning.

Attached is an excerpt from solr.log (with port #s & IP addresses redacted) and 
below is the current CLUSTERSTATUS (with port #s redacted)

Is there yet any explanation?

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node8":{
                "core":"ipg_report_large_shard1_replica_n7",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"},
              "core_node10":{
                "core":"ipg_report_large_shard1_replica_n9",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":741,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <[email protected]> 
Sent: Monday, May 17, 2021 5:01 PM
To: [email protected]
Subject: RE: Cores renamed

The entire directory for the old core gets removed

Here is CLUSTERSTATUS (again with port numbers redacted). I ran CLUSTERSTATUS 
on both nodes, and the only difference was QTime (that is, there was no real 
difference):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"80000000-7fffffff",
            "state":"active",
            "replicas":{
              "core_node4":{
                "core":"ipg_report_large_shard1_replica_n3",
                "base_url":"http://solrdbprod26.be-md:####/solr",
                "node_name":"solrdbprod26.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"},
              "core_node6":{
                "core":"ipg_report_large_shard1_replica_n5",
                "base_url":"http://solrdbprod25.be-md:####/solr",
                "node_name":"solrdbprod25.be-md:####_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":710,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:####_solr",
      "solrdbprod25.be-md:####_solr"]}}

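For tracking when a core name changes, two saved CLUSTERSTATUS responses can be compared node by node. The following is a minimal sketch that assumes the responses have already been parsed from JSON into Python dicts with the same layout as the output above. The function names are hypothetical and are not part of Solr.

```python
def cores_by_node(status, collection):
    """Map each node_name to the set of core names it hosts for a collection."""
    shards = status["cluster"]["collections"][collection]["shards"]
    result = {}
    for shard in shards.values():
        for replica in shard["replicas"].values():
            result.setdefault(replica["node_name"], set()).add(replica["core"])
    return result


def renamed_cores(before, after, collection):
    """Return {node: (cores that disappeared, cores that appeared)}.

    Only nodes present in both snapshots whose core set changed are reported.
    """
    old = cores_by_node(before, collection)
    new = cores_by_node(after, collection)
    changes = {}
    for node in old.keys() & new.keys():
        if old[node] != new[node]:
            changes[node] = (
                sorted(old[node] - new[node]),
                sorted(new[node] - old[node]),
            )
    return changes
```

Calling renamed_cores(before, after, "ipg_report_large") on snapshots taken before and after an incident would list, per node, which core names were dropped and which were created.
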
-----Original Message-----
From: matthew sporleder <[email protected]> 
Sent: Monday, May 17, 2021 4:34 PM
To: [email protected]
Subject: Re: Cores renamed

Can you verify all of your zkHost connection params across the entire
cluster, and share the replicationFactor, autoAddReplicas, etc for the
collection?

My theory is that you have two ZooKeeper configs conflicting as master
elections happen, causing new replicas to get created on-the-fly.

Also -- do these cores get deleted from the filesystem or left around?

On Mon, May 17, 2021 at 4:11 PM Oakley, Craig (NIH/NLM/NCBI) [C]
<[email protected]> wrote:
>
> > What does the core rename itself to? That would probably be the biggest 
> > hint.
>
> At 4:01pm 1/14/21, Solr decided on its own to drop the core 
> ipg_report_large_shard1_replica_n1 and to create the core 
> ipg_report_large_shard1_replica_n7 in its place
>
> At 4:33am 1/16/21, Solr decided on its own to drop the core 
> ipg_report_large_shard1_replica_n5 (on another node of the same SolrCloud) 
> and to create the core ipg_report_large_shard1_replica_n9 in its place
>
> At about 4:10pm 1/26/21, Solr decided on its own to drop this core 
> ipg_report_large_shard1_replica_n9 and to create the core 
> ipg_report_large_shard1_replica_n13 in its place
>
> In March, we created a new SolrCloud for the same collection, and reloaded 
> the data
>
> At 7:59am 5/12/21, Solr decided on its own to drop the core 
> ipg_report_large_shard1_replica_n1 and to create the core 
> ipg_report_large_shard1_replica_n5 in its place
>
> I am attaching an excerpt from solr.log for the most recent problem (with IP 
> addresses and port numbers redacted)
>
> Please note that Master/Slave replication breaks when a core is renamed, so 
> this can be a major problem.
>
>
> Any ideas?
>
> -----Original Message-----
> From: Alexandre Rafalovitch <[email protected]>
> Sent: Wednesday, May 12, 2021 2:10 PM
> To: [email protected]
> Subject: Re: Cores renamed
>
> This is truly a shot in the dark, but is it possible you have
> something in core.properties file (which is where the core name is for
> non-Cloud setup)?
>
> What does the core rename itself to? That would probably be the biggest hint.
>
> Regards,
>    Alex.
>
> On Wed, 12 May 2021 at 14:00, Oakley, Craig (NIH/NLM/NCBI) [C]
> <[email protected]> wrote:
> >
> > This phenomenon has happened again (this time without any REQUESTRECOVERY)
> >
> > Does anyone yet have any explanation of this?
> >
> > -----Original Message-----
> > From: Oakley, Craig (NIH/NLM/NCBI) [C] <[email protected]>
> > Sent: Thursday, January 28, 2021 10:57 AM
> > To: [email protected]
> > Subject: Cores renamed
> >
> > We recently have had a few occasions when cores for one specific collection 
> > were renamed (or more likely dropped and recreated, and thus ended up with 
> > a different core name).
> >
> > Is this a known phenomenon? Is there any explanation?
> >
> > It may be relevant that we just recently started running this SolrCloud on 
> > version 8.5.2, although the collection was created under Solr 7.4. Also, 
> > this collection seems to experience some heavy updates such that the 
> > non-Leader replica has trouble keeping up. One of these renames occurred at 
> > 4:33am, so I highly suspect that the rename (or drop and recreate) was done 
> > by some internal Solr thread rather than by any of my coworkers. One other 
> > potential clue is that I can see that 
> > /solr/admin/cores?action=REQUESTRECOVERY was usually run on the new core a 
> > moment after it was created.
> >
> > Does anyone have any insights?
