Hi Erlend,

What is happening is the following.

(1) Your indexing request is failing.
(2) HttpClient by default retries 3 times on failure.
(3) Between each retry, it resets the input stream, but this is not a
resettable input stream, so the retry can't work.
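
To illustrate (3), here is a minimal sketch in plain java.io. It is not
HttpClient's actual retry code, and the class and method names are made up
for the illustration. It only shows why a body backed by a stream without
mark/reset support cannot be sent a second time:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: plain java.io, not HttpClient's actual retry code.
public class NonRepeatableStreamDemo {

  // Hypothetical stand-in for "send the request body":
  // drains the stream and returns the number of bytes read.
  static int send(InputStream body) throws IOException {
    int count = 0;
    while (body.read() != -1) {
      count++;
    }
    return count;
  }

  static InputStream bytes(String s) {
    return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
  }

  // Sends a resettable body, rewinds it, and sends it again.
  // Returns the byte count of the second send.
  static int resendRepeatable() {
    try {
      InputStream body = bytes("abc");
      send(body);
      body.reset(); // ByteArrayInputStream rewinds to its start
      return send(body);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Same steps with a stream that does not support mark/reset.
  // Returns true if the rewind is rejected with an IOException.
  static boolean resendNonRepeatableFails() {
    InputStream body = new SequenceInputStream(bytes("ab"), bytes("c"));
    try {
      send(body);
      body.reset(); // the base InputStream.reset() throws IOException
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("second send of resettable body: "
      + resendRepeatable() + " bytes");
    System.out.println("rewind of non-resettable body rejected: "
      + resendNonRepeatableFails());
  }
}
```

Once the first attempt has consumed the stream, there is nothing left to
resend unless the stream can be rewound.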

Because of (3), the Solr Connector explicitly disables retries, using this code:

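    // No retries: the request body is not a resettable stream, so a
    // retry could not resend it.
    localClient.setHttpRequestRetryHandler(new HttpRequestRetryHandler()
      {
        @Override
        public boolean retryRequest(
          IOException exception,
          int executionCount,
          HttpContext context)
        {
          // Never retry, whatever the exception or attempt count.
          return false;
        }
      });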


I don't know why that isn't working - it certainly used to.  Perhaps
you could research it.

Fundamentally, though, you have a problem upstream of that - you need
to figure out why the indexing request is failing in the first place.
It's likely to be a socket timeout or connection timeout underneath it
all.
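
One way to dig for that, as a rough sketch: walk the exception's cause
chain and look for the standard java.net exception types. The class and
method names below are hypothetical, and the sketch only knows about
java.net types, so any HttpClient-specific exception classes would come
back as "unknown":

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of ManifoldCF or HttpClient.
public class FailureCause {

  // Walks the cause chain and reports the first network-level cause found.
  static String classify(Throwable failure) {
    Throwable cause = failure;
    // The depth limit guards against unusually long or cyclic chains.
    for (int depth = 0; cause != null && depth < 32; depth++) {
      if (cause instanceof SocketTimeoutException) {
        return "timeout";
      }
      if (cause instanceof ConnectException) {
        return "connect failure";
      }
      cause = cause.getCause();
    }
    return "unknown";
  }

  // Example: a timeout buried two levels down, as it might be after
  // several layers have wrapped the original exception.
  static String classifyWrappedTimeout() {
    Throwable wrapped = new RuntimeException("IO exception during indexing",
      new IOException("request failed",
        new SocketTimeoutException("Read timed out")));
    return classify(wrapped);
  }

  // Example: an exception with no network-level cause in its chain.
  static String classifyPlain() {
    return classify(new IOException("no cause attached"));
  }

  public static void main(String[] args) {
    System.out.println(classifyWrappedTimeout());
    System.out.println(classifyPlain());
  }
}
```

Logging the result next to the "IO exception during indexing" message
would show what sits underneath the null.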

Karl

On Thu, Mar 7, 2013 at 7:34 AM, Erlend Garåsen <[email protected]> wrote:
>
> Hello list,
>
> I'm getting the following error when the web crawler is trying to post
> documents to Solr 4: IO exception during indexing: null. This happens for
> all indexing attempts and just ends in the following:
>
> --8<--
>  WARN 2013-03-01 19:59:51,360 (Worker thread '0') - Service interruption
> reported for job 1362070726596 connection 'Web crawler': IO exception during
> indexing: null
> ERROR 2013-03-01 19:59:51,378 (Worker thread '0') - Exception tossed:
> Repeated service interruptions - failure processing document: null
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service
> interruptions - failure processing document: null
>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:604)
> Caused by: org.apache.http.client.ClientProtocolException
>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>         at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>         at
> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:833)
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity.
>         at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:695)
>         at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
> --8<--
>
> I'm running version 1.1.1 of MCF deployed on Resin. This does not happen on
> our test server, which is configured identically to our prod server except
> for some security restrictions. Basic auth is configured for both reading
> and writing on the Solr server.
>
> I *did* get the same error the first time I deployed version 1.1.1 of MCF on
> our test server, but it went away after I added the Solr core name in the
> core/collection name field. On our production server I *do* have the core
> name configured, so now I need help in order to figure out what's going on.
>
> The NonRepeatableRequestException is perhaps caused by a misconfiguration of
> HttpClient 4, but I'm not sure this is the root of the problem I'm facing
> here. It might be due to the basic auth restriction that is configured.
> Anyway, this was not a problem for previous versions of MCF.
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
