Thanks, Karl!

I will first try to apply the exact authentication restrictions from our prod server to our test server. If I get the same errors on the test server after changing the security settings, we can rule out some other possibilities.

Then it might be a good idea to turn off the retries. I have played around with HttpClient before and enabled retries, so I think I know how to proceed. I will let you know how it goes.

Erlend

On 07.03.13 14.00, Karl Wright wrote:
FWIW, to clarify: I think you will be best served by first turning off
the retries (however that can be done, since the current code is
apparently insufficient), and then posting what the real underlying
problem seems to be.  Alternatively, another exception may already have
been dumped into the log that you didn't include, and that would be
helpful.  If you need to figure out why the retries are still
happening, you may wind up needing to build the httpclient jar
yourself, after adding appropriate diagnostics around the retry logic.
I'd be happy to work with you on this, but probably not until this
evening Boston time.
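A minimal, stdlib-only sketch of the kind of diagnostics meant here, assuming that logging each retry decision (and the exception behind it) is enough to see where the retries come from. RetryDecider is a hypothetical stand-in shaped like HttpClient 4's HttpRequestRetryHandler.retryRequest(IOException, int, HttpContext), with HttpContext dropped so the sketch compiles without the jar; RetryDiagnostics and logging() are likewise illustrative names.

```java
import java.io.IOException;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: RetryDecider mimics the shape of HttpClient 4's
// HttpRequestRetryHandler, minus the HttpContext parameter.
final class RetryDiagnostics {

    interface RetryDecider {
        boolean retryRequest(IOException exception, int executionCount);
    }

    private static final Logger LOG = Logger.getLogger(RetryDiagnostics.class.getName());

    private RetryDiagnostics() {}

    /**
     * Wraps a decider so that every retry decision, together with the
     * exception that triggered it, is logged (with stack trace) and
     * appended to sink. The wrapped decision is returned unchanged.
     */
    static RetryDecider logging(RetryDecider delegate, List<String> sink) {
        return (exception, executionCount) -> {
            boolean retry = delegate.retryRequest(exception, executionCount);
            String line = "retry decision: execution=" + executionCount
                    + ", exception=" + exception + ", retry=" + retry;
            sink.add(line);
            LOG.log(Level.WARNING, line, exception);
            return retry;
        };
    }
}
```

In the real client the same wrapper idea would apply to the handler passed to setHttpRequestRetryHandler. If no decision is ever logged while the NonRepeatableRequestException still appears, the resend is presumably coming from somewhere other than the retry handler.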

Karl

On Thu, Mar 7, 2013 at 7:43 AM, Karl Wright <[email protected]> wrote:
Hi Erlend,

What is happening is the following.

(1) Your indexing is failing
(2) HttpClient retries 3 times on failure by default
(3) Between retries, it resets the input stream, but this is not a
resettable input stream, so that can't work.
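Points (2) and (3) can be modeled in a few lines of plain Java. This is a simplified sketch, not HttpClient's code: Body, Transport, fromBytes, fromStream and sendWithRetries are hypothetical names, and the model reopens the body rather than literally resetting a stream. HttpClient's real counterpart of isRepeatable() is HttpEntity.isRepeatable().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical model of the retry rule; none of these names are HttpClient's.
final class RetryModel {

    interface Body {
        boolean isRepeatable();
        InputStream open() throws IOException;
    }

    interface Transport {
        void send(InputStream body, int attempt) throws IOException;
    }

    private RetryModel() {}

    /** Repeatable: every open() returns a fresh stream over the same bytes. */
    static Body fromBytes(byte[] data) {
        return new Body() {
            @Override public boolean isRepeatable() { return true; }
            @Override public InputStream open() { return new ByteArrayInputStream(data); }
        };
    }

    /** Not repeatable: the underlying stream can be handed out only once. */
    static Body fromStream(InputStream in) {
        return new Body() {
            private boolean used;
            @Override public boolean isRepeatable() { return false; }
            @Override public InputStream open() throws IOException {
                if (used) throw new IOException("stream already consumed");
                used = true;
                return in;
            }
        };
    }

    /**
     * Sends the body, retrying on IOException up to maxAttempts in total.
     * A retry is refused when the body cannot be replayed, mirroring the
     * NonRepeatableRequestException in the log. Returns the successful attempt.
     */
    static int sendWithRetries(Body body, Transport transport, int maxAttempts)
            throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                transport.send(body.open(), attempt);
                return attempt;
            } catch (IOException e) {
                if (attempt >= maxAttempts) throw e;
                if (!body.isRepeatable()) {
                    throw new IOException(
                        "Cannot retry request with a non-repeatable request entity", e);
                }
            }
        }
    }
}
```

A body backed by a byte array survives a failed first attempt, while a one-shot stream is refused on the first retry with wording similar to the exception in the log.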

Because of (3), the Solr Connector explicitly disables retries, using this code:

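     // No retries: the request body is a non-resettable stream, so a retry
     // could never resend it anyway.  localClient is an AbstractHttpClient
     // (HttpClient 4.x); this needs java.io.IOException,
     // org.apache.http.client.HttpRequestRetryHandler and
     // org.apache.http.protocol.HttpContext to be imported.
     localClient.setHttpRequestRetryHandler(new HttpRequestRetryHandler()
       {
         public boolean retryRequest(
           IOException exception,
           int executionCount,
           HttpContext context)
         {
           // Never retry, whatever the exception or attempt count
           return false;
         }
       });
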
I don't know why that isn't working - it certainly used to.  Perhaps
you could research it.

Fundamentally, though, you have a problem upstream of that - you need
to figure out why the indexing request is failing in the first place.
It's likely to be a socket timeout or connection timeout underneath it
all.
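If the underlying failure is a timeout, it will usually sit further down the cause chain than the exception that gets reported. A stdlib-only sketch for finding it, with hypothetical names (RootCause, causalChain, rootCause, classify): it walks getCause() outermost first, stops on cycles, and labels java.net.SocketTimeoutException and java.net.ConnectException. A plain socket reports both read and connect timeouts as SocketTimeoutException, while HttpClient 4 usually reports connect timeouts as its own org.apache.http.conn.ConnectTimeoutException, which this sketch would file under "other".

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper for digging the underlying failure out of a wrapped exception.
final class RootCause {

    private RootCause() {}

    /** The (non-null) exception followed by its causes, outermost first; stops on a cycle. */
    static List<Throwable> causalChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            chain.add(c);
        }
        return chain;
    }

    /** The innermost cause (t itself if it has none). */
    static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causalChain(t);
        return chain.get(chain.size() - 1);
    }

    /** Rough label for the network failure, if any, found in the chain. */
    static String classify(Throwable t) {
        for (Throwable c : causalChain(t)) {
            if (c instanceof SocketTimeoutException) return "timeout (read or connect)";
            if (c instanceof ConnectException) return "connection refused";
        }
        return "other: " + rootCause(t);
    }
}
```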

Karl

On Thu, Mar 7, 2013 at 7:34 AM, Erlend Garåsen <[email protected]> wrote:

Hello list,

I'm getting the following error when the web crawler tries to post
documents to Solr 4: "IO exception during indexing: null". This happens for
all indexing attempts and ends with the following:

--8<--
 WARN 2013-03-01 19:59:51,360 (Worker thread '0') - Service interruption reported for job 1362070726596 connection 'Web crawler': IO exception during indexing: null
ERROR 2013-03-01 19:59:51,378 (Worker thread '0') - Exception tossed: Repeated service interruptions - failure processing document: null
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:604)
Caused by: org.apache.http.client.ClientProtocolException
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:833)
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:695)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
--8<--
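The trailing "null" in "IO exception during indexing: null" is consistent with a message built by concatenating getMessage() of an exception that has no message; the ClientProtocolException above is printed without one. The connector's actual code is not shown in this thread, so naive() below is only an assumption about how that text was produced, and descriptive() is a hypothetical alternative that reports the class name plus the first message found along the cause chain.

```java
// Hypothetical sketch: neither method is taken from ManifoldCF.
final class ErrorText {

    private ErrorText() {}

    /** String concatenation turns a null message into the literal text "null". */
    static String naive(Throwable t) {
        return "IO exception during indexing: " + t.getMessage();
    }

    /** Class name plus the first non-null message along the cause chain (bounded walk). */
    static String descriptive(Throwable t) {
        StringBuilder sb = new StringBuilder("IO exception during indexing: ")
                .append(t.getClass().getSimpleName());
        Throwable c = t;
        for (int depth = 0; c != null && depth < 20; depth++, c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null) {
                sb.append(": ").append(msg);
                if (c != t) sb.append(" (from ").append(c.getClass().getSimpleName()).append(')');
                break;
            }
        }
        return sb.toString();
    }
}
```

Applied to the chain in the trace above, descriptive() would presumably surface the NonRepeatableRequestException message rather than "null".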

I'm running MCF version 1.1.1 deployed on Resin. This does not happen on
our test server, which is configured identically to our prod server except
for some security restrictions. Basic auth is configured for both reading
and writing on the Solr server.

I *did* get the same error the first time I deployed MCF 1.1.1 on our test
server, but it went away after I added the Solr core name in the
core/collection name field. On our production server I *do* have the core
name configured, so now I need help figuring out what's going on.

The NonRepeatableRequestException is perhaps caused by a misconfiguration of
HttpClient 4, but I'm not sure this is the root of the problem I'm facing
here. It might be due to the basic auth restriction that is configured.
Anyway, this was not a problem for previous versions of MCF.
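One possibility worth testing, an assumption not confirmed anywhere in this thread: with Basic auth, if the credentials are not sent on the first request, the server answers 401 and HttpClient resends the request with credentials. That resend needs to replay the body, so it could fail with the same NonRepeatableRequestException regardless of the retry handler. Sending the Authorization header preemptively avoids the challenge round trip. The sketch below only builds the header value as defined in RFC 7617 (java.util.Base64 needs Java 8 or later); how to attach it to the SolrJ/HttpClient request is not covered, and BasicAuth and headerValue are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an "Authorization" header for
// HTTP Basic authentication (RFC 7617).
final class BasicAuth {

    private BasicAuth() {}

    static String headerValue(String userId, String password) {
        // RFC 7617: a user-id containing a colon cannot be encoded unambiguously.
        if (userId.indexOf(':') >= 0) {
            throw new IllegalArgumentException("user-id must not contain ':'");
        }
        // Encoding as UTF-8 is an assumption; for ASCII credentials it makes no difference.
        byte[] pair = (userId + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(pair);
    }
}
```

For example, headerValue("Aladdin", "open sesame") returns "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the example used in RFC 7617.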

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050

