Re: How to Prevent Recovery?

2020-09-08 Thread Anshuman Singh
Hi, I noticed that when I created TLOG Replicas using the ADDREPLICA API, I called the API in parallel for all the shards, because of which all the replicas were created on a single node, i.e. the replicas were not distributed evenly across the nodes. After fixing that, I am getting better indexing performance
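
For reference, a minimal SolrJ sketch of the fix described above: issuing ADDREPLICA with an explicit node parameter so placement stays even regardless of call ordering. The collection, shard, and node names are hypothetical.

    import java.util.List;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.common.cloud.Replica;

    public class AddTlogReplicas {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    List.of("zk1:2181,zk2:2181,zk3:2181"), Optional.empty()).build()) {
                // Hypothetical shard -> node assignments, one replica per node.
                String[] shards = {"shard1", "shard2"};
                String[] nodes  = {"host1:8983_solr", "host2:8983_solr"};
                for (int i = 0; i < shards.length; i++) {
                    CollectionAdminRequest.AddReplica req = CollectionAdminRequest
                            .addReplicaToShard("mycollection", shards[i], Replica.Type.TLOG);
                    req.setNode(nodes[i]); // pin each replica to a specific node
                    req.process(client);   // sequential calls keep placement predictable
                }
            }
        }
    }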

Re: How to Prevent Recovery?

2020-08-31 Thread Dominique Bejean
Hi, Even if it is not the root cause, I suggest trying to respect some basic best practices, and so not have "2 Zk running on the same nodes where Solr is running". Maybe you can achieve this by just stopping these 2 Zk (and moving them later). Did you increase ZK_CLIENT_TIMEOUT to 3? Did you
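
For context, ZK_CLIENT_TIMEOUT is set in solr.in.sh; the value in the message above is cut off by the archive, so the 30000 ms shown here is only an illustrative assumption, not the thread's actual figure.

    # solr.in.sh -- ZooKeeper client session timeout, in milliseconds.
    # 30000 is an assumed example value; the number in the original message is truncated.
    ZK_CLIENT_TIMEOUT="30000"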

Re: How to Prevent Recovery?

2020-08-30 Thread Anshuman Singh
Hi, I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG replicas using the ADDREPLICA API and then deleting the NRT replicas. But now, these replicas are going into recovery even more frequently during indexing. Same errors are observed. Also, commit is taking a lot of time compared
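
A hedged SolrJ sketch of the swap procedure described above (add a TLOG replica, then delete the NRT one); the collection, shard, and replica names are hypothetical, and in practice the new replica should be ACTIVE before the old one is removed.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.common.cloud.Replica;

    public class SwapNrtToTlog {
        // Adds a TLOG replica to a shard, then removes the named NRT replica.
        // Caller should verify the new replica is ACTIVE before deleting the old one.
        static void swap(SolrClient client, String collection, String shard,
                         String oldNrtReplicaName) throws Exception {
            CollectionAdminRequest
                    .addReplicaToShard(collection, shard, Replica.Type.TLOG)
                    .process(client);
            // e.g. oldNrtReplicaName = "core_node3" (hypothetical)
            CollectionAdminRequest
                    .deleteReplica(collection, shard, oldNrtReplicaName)
                    .process(client);
        }
    }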

Re: How to Prevent Recovery?

2020-08-25 Thread Erick Erickson
Commits should absolutely not be taking that much time; that's where I'd focus first. Some sneaky places things go wonky:
1> you have a suggester configured that builds whenever there's a commit
2> you send commits from the client
3> you're optimizing on commit
4> you have too much data for your
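
On points 2> and 3>, the usual alternative is to let the server handle commits via solrconfig.xml rather than committing (or optimizing) from the client; for point 1>, the suggester's buildOnCommit parameter can be set to false. A sketch with illustrative intervals, not values from this thread:

    <!-- solrconfig.xml: server-side commit policy (intervals are assumptions) -->
    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>           <!-- hard commit every 60s -->
        <openSearcher>false</openSearcher> <!-- flush without opening a searcher -->
      </autoCommit>
      <autoSoftCommit>
        <maxTime>300000</maxTime>          <!-- soft commit every 5 min for visibility -->
      </autoSoftCommit>
    </updateHandler>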

Re: How to Prevent Recovery?

2020-08-25 Thread Houston Putman
Are you able to use TLOG replicas? That should reduce the time it takes to recover significantly. It doesn't seem like you have a hard need for near-real-time, since slow ingestions are fine. - Houston
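
If rebuilding the collection is an option, TLOG replicas can also be requested at creation time instead of being swapped in afterwards; a hedged SolrJ sketch, with counts echoing the cluster described below and hypothetical collection and configset names:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateTlogCollection {
        static void create(SolrClient client) throws Exception {
            // 50 shards; 0 NRT / 2 TLOG / 0 PULL replicas per shard.
            CollectionAdminRequest
                    .createCollection("mycollection", "_default", 50, 0, 2, 0)
                    .process(client);
        }
    }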

How to Prevent Recovery?

2020-08-25 Thread Anshuman Singh
Hi, We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster with 50 shards, rf 2 (NRT replicas), and 7B docs. We have 5 Zk, with 2 running on the same nodes where Solr is running. Our use case requires continuous ingestion (mostly updates). If we ingest at 40k records per sec, after