Thanks, Amrit, for creating a patch. But the code in LBHttpSolrClient.java
needs to be fixed too if the for loop is to work as intended.
Regards
Rajeswari

public Rsp request(Req req) throws SolrServerException, IOException {
    Rsp rsp = new Rsp();
    Exception ex = null;
    boolean isNonRetryable = req.request instanceof IsUpdateRequest
        || ADMIN_PATHS.contains(req.request.getPath());
    List<ServerWrapper> skipped = null;

    final Integer numServersToTry = req.getNumServersToTry();
    int numServersTried = 0;

    boolean timeAllowedExceeded = false;
    long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
    long timeOutTime = System.nanoTime() + timeAllowedNano;
    for (String serverStr : req.getServers()) {
      if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
        break;
      }
      
      serverStr = normalize(serverStr);
      // if the server is currently a zombie, just skip to the next one
      ServerWrapper wrapper = zombieServers.get(serverStr);
      if (wrapper != null) {
        // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
        final int numDeadServersToTry = req.getNumDeadServersToTry();
        if (numDeadServersToTry > 0) {
          if (skipped == null) {
            skipped = new ArrayList<>(numDeadServersToTry);
            skipped.add(wrapper);
          }
          else if (skipped.size() < numDeadServersToTry) {
            skipped.add(wrapper);
          }
        }
        continue;
      }
      try {
        MDC.put("LBHttpSolrClient.url", serverStr);

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        HttpSolrClient client = makeSolrClient(serverStr);

        ++numServersTried;
        ex = doRequest(client, req, rsp, isNonRetryable, false, null);
        if (ex == null) {
          return rsp; // SUCCESS
        }
      } finally {
        MDC.remove("LBHttpSolrClient.url");
      }
    }

    // try the servers we previously skipped
    if (skipped != null) {
      for (ServerWrapper wrapper : skipped) {
        if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
          break;
        }

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        try {
          MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
          ++numServersTried;
          ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, wrapper.getKey());
          if (ex == null) {
            return rsp; // SUCCESS
          }
        } finally {
          MDC.remove("LBHttpSolrClient.url");
        }
      }
    }


    final String solrServerExceptionMessage;
    if (timeAllowedExceeded) {
      solrServerExceptionMessage = "Time allowed to handle this request exceeded";
    } else {
      if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
        solrServerExceptionMessage = "No live SolrServers available to handle this request:"
            + " numServersTried="+numServersTried
            + " numServersToTry="+numServersToTry.intValue();
      } else {
        solrServerExceptionMessage = "No live SolrServers available to handle this request";
      }
    }
    if (ex == null) {
      throw new SolrServerException(solrServerExceptionMessage);
    } else {
      throw new SolrServerException(solrServerExceptionMessage + ":" + zombieServers.keySet(), ex);
    }

  }
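
To illustrate how the limit check in the loop above counts attempts, here is a
minimal standalone sketch. It is not code from LBHttpSolrClient: the class
RetryLimitSketch and both of its methods are hypothetical, and every attempt is
assumed to fail so that the loop keeps going. It compares testing the limit
with ">" before the counter is incremented, as the method above does, against
testing it with ">=". Whether ">=" is the appropriate change for
LBHttpSolrClient is an assumption made only for this comparison.

```java
public class RetryLimitSketch {

    // Mirrors the check in the loop above: compare with '>' first,
    // then increment the counter for the attempt.
    static int attemptsWithGreaterThan(int numServers, Integer numServersToTry) {
        int numServersTried = 0;
        for (int i = 0; i < numServers; i++) {
            if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
                break;
            }
            ++numServersTried; // hypothetical: the attempt fails, so we move on
        }
        return numServersTried;
    }

    // Same loop, but the limit is compared with '>=' (an assumed alternative).
    static int attemptsWithGreaterOrEqual(int numServers, Integer numServersToTry) {
        int numServersTried = 0;
        for (int i = 0; i < numServers; i++) {
            if (numServersToTry != null && numServersTried >= numServersToTry.intValue()) {
                break;
            }
            ++numServersTried; // hypothetical: the attempt fails, so we move on
        }
        return numServersTried;
    }

    public static void main(String[] args) {
        // Five servers available, limit of one server to try.
        System.out.println("'>'  attempts: " + attemptsWithGreaterThan(5, 1));
        System.out.println("'>=' attempts: " + attemptsWithGreaterOrEqual(5, 1));
    }
}
```

With five servers and a limit of 1, the first method makes two attempts before
the check stops the loop, while the second makes one. With a null limit both
methods try all five servers.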

On 5/19/19, 3:12 PM, "Amrit Sarkar" <sarkaramr...@gmail.com> wrote:

    >
    > Thanks Natrajan,
    >
    > Solid analysis. I have seen this issue reported by multiple users in the
    > past few months, and unfortunately the code I wrote was incomplete.
    >
    > I think the correct way to solve this issue is to identify the correct
    > base URL for the core on which we need to trigger REQUESTRECOVERY, and
    > to create a local HttpSolrClient instead of using the CloudSolrClient
    > from CdcrReplicatorState. This will avoid unnecessary retries, which
    > are redundant in our case.
    >
    > I prepared a small patch a few weeks back and will upload it to SOLR-11724
    > <https://issues.apache.org/jira/browse/SOLR-11724>.
    >
    
