Re: tlog keeps growing

Erick Erickson Thu, 23 Jul 2020 10:43:20 -0700

Yes, you should have seen a new tlog after:
- a doc was indexed
- 15 minutes had passed
- another doc was indexed
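
A toy model of that sequence in Python, for illustration only. This is not Solr's UpdateLog code. It assumes only what is described above: a time-based hard commit (15 minutes here) closes the current tlog, and the next update opens a new one. The class, field, and constant names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

AUTOCOMMIT_MAX_TIME_S = 15 * 60  # assumption: the 15 minutes mentioned in this thread


@dataclass
class ToyUpdateLog:
    """Toy model only; names and behavior are simplified assumptions."""
    tlogs: List[list] = field(default_factory=list)  # each tlog is a list of doc ids
    first_uncommitted_at: Optional[float] = None     # time of first doc since last commit
    current_open: bool = False                       # is there a tlog accepting updates?

    def index(self, doc_id, now):
        # Fire any pending time-based hard commit before handling the update.
        self._maybe_commit(now)
        if not self.current_open:
            # The first update after a commit opens a new tlog.
            self.tlogs.append([])
            self.current_open = True
        if self.first_uncommitted_at is None:
            self.first_uncommitted_at = now
        self.tlogs[-1].append(doc_id)

    def _maybe_commit(self, now):
        if (self.first_uncommitted_at is not None
                and now - self.first_uncommitted_at >= AUTOCOMMIT_MAX_TIME_S):
            # The hard commit closes the current tlog.
            self.current_open = False
            self.first_uncommitted_at = None


log = ToyUpdateLog()
log.index("doc1", now=0)         # a doc was indexed
log.index("doc2", now=16 * 60)   # 15 minutes had passed, another doc was indexed
print(len(log.tlogs))            # 2
```

In this model the old tlog is never deleted; it only shows when a second tlog should appear.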


Well, yes, a leader can be in recovery. It looks like this:

- You’re indexing and docs are written to the tlog.
- Solr shuts down ungracefully, so the open segments haven’t been closed. Note 
that these unclosed segments are thrown away on restart.
- Solr is restarted and starts replaying the tlog.

But the node shouldn’t be active during this time.

Of course it’s possible that for some strange reason, the tlog gets set to the 
buffering state and never gets back to active, which is what the message you 
posted seems to be indicating.

So I’m puzzled. Let us know what you find…

Erick

> On Jul 23, 2020, at 12:56 PM, Gael Jourdan-Weil 
> <gael.jourdan-w...@kelkoogroup.com> wrote:
> 
>> Note that for my previous e-mail you’d have to wait 15 minutes after you 
>> started indexing to see a new tlog and also wait until at least 1,000 new 
>> documents were indexed after _that_ before the large tlog went away. I don't think that’s 
>> your issue though.
> Indeed, I did wait 15 minutes, but I'm not sure 1,000 documents were indexed 
> in the meantime. Though I should've seen a new tlog even if the large one was 
> still there, right?
> 
>> So I think that’s the place to focus. Did the node recover completely and go 
>> active? Just checking the admin UI and seeing it be green is sometimes not 
>> enough. Check the state.json znode and see if the state is also “active” 
>> there.
> On ZooKeeper (through the Solr UI or directly connecting to ZK) I can see 
> "state":"active" in the state.json. This seems fine.
> Stranger still, this is the leader node. Can a leader be in recovery?
> 
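
A minimal Python sketch of the state.json check discussed above. It assumes the text of the collection's state.json znode has already been copied out of ZooKeeper (for example from the admin UI tree view) and that it has the usual collection > shards > replicas layout. The sample document and the function name are hypothetical.

```python
import json


def replica_states(state_json_text):
    """Return (collection, shard, replica, state, is_leader) for every replica."""
    rows = []
    for collection, cdata in json.loads(state_json_text).items():
        for shard, sdata in cdata.get("shards", {}).items():
            for replica, rdata in sdata.get("replicas", {}).items():
                # "leader" is stored as the string "true" and is absent on non-leaders.
                is_leader = str(rdata.get("leader", "false")).lower() == "true"
                rows.append((collection, shard, replica, rdata.get("state"), is_leader))
    return rows


# Hypothetical sample, not taken from the cluster in this thread.
SAMPLE_STATE_JSON = """
{"collection1": {"shards": {"shard1": {"replicas": {
  "core_node2": {"core": "collection1_shard1_replica_n1",
                 "state": "active", "leader": "true"},
  "core_node4": {"core": "collection1_shard1_replica_n3",
                 "state": "recovering"}
}}}}}
"""

for row in replica_states(SAMPLE_STATE_JSON):
    print(row)
```

Any replica whose state is not "active" here deserves a closer look, whatever color the admin UI shows.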
>> Next, try sending a request directly to that replica. Frankly I’m not sure 
>> what to expect, but if you get something weird that’d be a “smoking gun” 
>> that no matter what the admin UI says, the replica isn’t really active. 
>> Something like “http://blah blah 
>> blah/solr/collection1_shard1_replica_n1?q=some_query&distrib=false”. The 
>> “distrib=false” is important, otherwise the request will be forwarded to a 
>> truly active node.
> The request works fine, I don't see anything weird at that time in the logs.
> 
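
A small Python sketch for building that kind of direct, non-distributed request. The host, port, and helper name are assumptions, and the /select handler is added here because the URL in the thread elides the path. Only the core name and the distrib=false parameter come from the message above.

```python
from urllib.parse import urlencode


def direct_core_query_url(base_url, core, query):
    """Build a query URL aimed at one core, with distributed search disabled."""
    # distrib=false keeps the request on this core instead of fanning it out.
    params = urlencode({"q": query, "distrib": "false"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


# Hypothetical base URL; substitute the node that hosts the replica.
url = direct_core_query_url(
    "http://localhost:8983/solr",
    "collection1_shard1_replica_n1",
    "some_query",
)
print(url)
```

The resulting URL can be opened in a browser or fetched with urllib.request.urlopen. Comparing the response against the same query on another replica is one way to spot a core that is not really serving.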
> I will investigate further and take a look at everything you mentioned.
> 
> Kind regards,
> Gaël
