Thanks Amrith for creating a patch. But the code in LBHttpSolrClient.java needs to be fixed too for the for loop to work as intended. Regards, Rajeswari
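For illustration, the bounds check in the loop below (`numServersTried > numServersToTry`) can be exercised with a small standalone simulation. This is a sketch only, not the Solr code; the class, method, and the candidate-server count are made up for the demo. It shows that the strict `>` comparison lets the loop issue one more request than `numServersToTry` asks for, whereas a `>=` comparison stops at the requested count:

```java
// Standalone simulation (NOT Solr code) of the bounds check in request().
// Class/method names and the pool of 10 candidate servers are hypothetical.
public class LoopBoundDemo {
    static int serversTried(int numServersToTry, boolean useStrictGreater) {
        int numServersTried = 0;
        for (int i = 0; i < 10; i++) { // pretend there are 10 candidate servers
            // current code uses >, the alternative uses >=
            if (useStrictGreater ? numServersTried > numServersToTry
                                 : numServersTried >= numServersToTry) {
                break;
            }
            ++numServersTried; // stands in for "send the request to this server"
        }
        return numServersTried;
    }

    public static void main(String[] args) {
        System.out.println(serversTried(2, true));  // with > : tries 3 servers
        System.out.println(serversTried(2, false)); // with >=: tries 2 servers
    }
}
```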
public Rsp request(Req req) throws SolrServerException, IOException {
  Rsp rsp = new Rsp();
  Exception ex = null;
  boolean isNonRetryable = req.request instanceof IsUpdateRequest
      || ADMIN_PATHS.contains(req.request.getPath());
  List<ServerWrapper> skipped = null;

  final Integer numServersToTry = req.getNumServersToTry();
  int numServersTried = 0;

  boolean timeAllowedExceeded = false;
  long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
  long timeOutTime = System.nanoTime() + timeAllowedNano;

  for (String serverStr : req.getServers()) {
    if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
      break;
    }
    serverStr = normalize(serverStr);
    // if the server is currently a zombie, just skip to the next one
    ServerWrapper wrapper = zombieServers.get(serverStr);
    if (wrapper != null) {
      // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
      final int numDeadServersToTry = req.getNumDeadServersToTry();
      if (numDeadServersToTry > 0) {
        if (skipped == null) {
          skipped = new ArrayList<>(numDeadServersToTry);
          skipped.add(wrapper);
        } else if (skipped.size() < numDeadServersToTry) {
          skipped.add(wrapper);
        }
      }
      continue;
    }
    try {
      MDC.put("LBHttpSolrClient.url", serverStr);

      if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
        break;
      }

      HttpSolrClient client = makeSolrClient(serverStr);

      ++numServersTried;
      ex = doRequest(client, req, rsp, isNonRetryable, false, null);
      if (ex == null) {
        return rsp; // SUCCESS
      }
    } finally {
      MDC.remove("LBHttpSolrClient.url");
    }
  }

  // try the servers we previously skipped
  if (skipped != null) {
    for (ServerWrapper wrapper : skipped) {
      if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
        break;
      }

      if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
        break;
      }

      try {
        MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
        ++numServersTried;
        ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, wrapper.getKey());
        if (ex == null) {
          return rsp; // SUCCESS
        }
      } finally {
        MDC.remove("LBHttpSolrClient.url");
      }
    }
  }

  final String solrServerExceptionMessage;
  if (timeAllowedExceeded) {
    solrServerExceptionMessage = "Time allowed to handle this request exceeded";
  } else {
    if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
      solrServerExceptionMessage = "No live SolrServers available to handle this request:"
          + " numServersTried=" + numServersTried
          + " numServersToTry=" + numServersToTry.intValue();
    } else {
      solrServerExceptionMessage = "No live SolrServers available to handle this request";
    }
  }
  if (ex == null) {
    throw new SolrServerException(solrServerExceptionMessage);
  } else {
    throw new SolrServerException(solrServerExceptionMessage + ":" + zombieServers.keySet(), ex);
  }
}

On 5/19/19, 3:12 PM, "Amrit Sarkar" <sarkaramr...@gmail.com> wrote:

> Thanks Natrajan,
>
> Solid analysis. I have seen this issue reported by multiple users over the
> past few months, and unfortunately I baked incomplete code.
>
> I think the correct way of solving this issue is to identify the correct
> base URL for the core we need to trigger REQUESTRECOVERY on, and to
> create a local HttpSolrClient instead of using the CloudSolrClient from
> CdcrReplicatorState. This will avoid unnecessary retries, which would be
> redundant in our case.
>
> I baked a small patch a few weeks back and will upload it on SOLR-11724
> <https://issues.apache.org/jira/browse/SOLR-11724>.
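As a rough illustration of the direction Amrit describes (addressing a specific node's CoreAdmin API directly instead of going through CloudSolrClient), the REQUESTRECOVERY call can be targeted by building the per-node URL for the core in question. This is a minimal stdlib-only sketch; the class and helper names are hypothetical, and an actual patch would use SolrJ's HttpSolrClient with a CoreAdminRequest rather than raw URL strings:

```java
import java.net.URI;

// Hypothetical helper: build the CoreAdmin REQUESTRECOVERY URL for a given
// core on a given node, so it can be sent with a plain per-node HTTP client.
public class RequestRecoveryUrl {
    static URI recoveryUri(String baseUrl, String coreName) {
        // baseUrl is the node's Solr base URL, e.g. http://host:8983/solr;
        // REQUESTRECOVERY is a standard CoreAdmin action on /admin/cores
        return URI.create(baseUrl + "/admin/cores?action=REQUESTRECOVERY&core=" + coreName);
    }

    public static void main(String[] args) {
        // Example values only; any real core name comes from the cluster state
        System.out.println(recoveryUri("http://127.0.0.1:8983/solr",
                                       "collection1_shard1_replica_n1"));
    }
}
```

The point of the design is that the recovery command goes straight to the node that hosts the replica, so the load-balancing retry loop quoted above is never involved.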