This is a long shot, but look in the overseer queue to see if stuff is stuck. 
We ran into that with 6.x.
We restarted the instance that was the overseer and the newly-elected overseer 
cleared the queue.
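
If it helps, here is a rough sketch of one way to check the queue sizes without poking around in ZooKeeper by hand. It calls the Collections API OVERSEERSTATUS action and pulls out the queue counters. The base URL (localhost:8983) is an assumption, and the response keys are what I recall from the Collections API docs, so verify them against your Solr version. The helper names are mine, not anything from Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Queue counters expected in the OVERSEERSTATUS response (assumed key names;
# check them against the response from your own Solr version).
QUEUE_KEYS = (
    "overseer_queue_size",
    "overseer_work_queue_size",
    "overseer_collection_queue_size",
)


def overseer_status_url(base_url):
    """Build the Collections API OVERSEERSTATUS URL for a Solr base URL."""
    query = urlencode({"action": "OVERSEERSTATUS", "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def queue_sizes(status):
    """Return only the queue counters that are present in a parsed response."""
    return {key: status[key] for key in QUEUE_KEYS if key in status}


def fetch_overseer_status(base_url, timeout=10):
    """Fetch and parse the overseer status from a live Solr node."""
    with urlopen(overseer_status_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


# Example usage against a running node (hypothetical host and port):
#   status = fetch_overseer_status("http://localhost:8983/solr")
#   print(status.get("leader"), queue_sizes(status))
```

If the counters stay well above zero and do not drain over a few minutes, that would be consistent with what we saw on 6.x.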

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jul 23, 2020, at 10:43 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Yes, you should have seen a new tlog after:
> - a doc was indexed
> - 15 minutes had passed
> - another doc was indexed
> 
> Well, yes, a leader can be in recovery. It looks like this:
> 
> - You’re indexing and docs are written to the tlog.
> - Solr un-gracefully shuts down so the segments haven’t been closed. Note, 
> these are thrown away on restart.
> - Solr is restarted and starts replaying the tlog.
> 
> But, the node shouldn’t be active during this time.
> 
> Of course it’s possible that for some strange reason, the tlog gets set to 
> the buffering state and never gets back to active, which is what the message 
> you posted seems to be indicating.
> 
> So I’m puzzled, let us know what you find…
> 
> Erick
> 
>> On Jul 23, 2020, at 12:56 PM, Gael Jourdan-Weil 
>> <gael.jourdan-w...@kelkoogroup.com> wrote:
>> 
>>> Note that for my previous e-mail you’d have to wait 15 minutes after you 
>>> started indexing to see a new tlog and also wait until at least 1,000 new 
>>> documents after _that_ before the large tlog went away. I don't think that’s 
>>> your issue though.
>> Indeed I did wait 15 minutes, but I'm not sure 1,000 documents were indexed 
>> in the meantime. Though I should've seen a new tlog even if the large one 
>> was still there, right?
>> 
>>> So I think that’s the place to focus. Did the node recover completely and 
>>> go active? Just checking the admin UI and seeing it be green is sometimes 
>>> not enough. Check the state.json znode and see if the state is also 
>>> “active” there.
>> On ZooKeeper (through the Solr UI or directly connecting to ZK) I can see 
>> "state":"active" in the state.json. This seems fine.
>> Even weirder, this is the leader node. Can a leader be in recovery?
>> 
>>> Next, try sending a request directly to that replica. Frankly I’m not sure 
>>> what to expect, but if you get something weird that’d be a “smoking gun” 
>>> that no matter what the admin UI says, the replica isn’t really active. 
>>> Something like 
>>> “http://blah blah blah/solr/collection1_shard1_replica_n1?q=some_query&distrib=false”. 
>>> The “distrib=false” is important, otherwise the request will be forwarded 
>>> to a truly active node.
>> The request works fine, I don't see anything weird at that time in the logs.
>> 
>> I will investigate further and take a look at everything you mentioned.
>> 
>> Kind regards,
>> Gaël
> 
