Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Karl Wright
Hi,

You say this is a "Tika error".  Is this Tika as a stand-alone service?  I
do not recognize any ManifoldCF classes whatsoever in this thread dump.

If this is Tika, I suggest contacting the Tika team.

Karl
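
One way to check whether the standalone Tika server itself is slow on a given document is to send the file to it directly and time the response. The sketch below is an illustration under stated assumptions, not something taken from this thread: it uses only the Python standard library, assumes the server accepts HTTP PUT on the /meta and /tika endpoints (the server log posted in this thread shows "meta" and "tika" requests), and the helper names, the file path and the 300-second timeout are hypothetical.

```python
import time
import urllib.request


def build_tika_request(base_url, endpoint, payload, content_type="application/pdf"):
    """Build a PUT request for a Tika server endpoint such as /meta or /tika."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def timed_put(request, timeout_seconds):
    """Send the request and return (status, elapsed_seconds, error).

    status is None when the call failed; URLError and socket timeouts are
    both OSError subclasses, so a read timeout ends up in `error`.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            response.read()
            status = response.status
    except OSError as exc:
        return None, time.monotonic() - start, exc
    return status, time.monotonic() - start, None


if __name__ == "__main__":
    # Hypothetical file name; the host and port are the ones shown in the log.
    with open("suspect.pdf", "rb") as handle:
        body = handle.read()
    req = build_tika_request("http://sengvivv02.vimar.net:9998/", "/meta", body)
    print(timed_put(req, timeout_seconds=300))
```

If a document takes longer here than the read timeout configured on the calling side, the caller would give up first, which would be consistent with the EofException the server logs when it tries to write the response.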


On Thu, Sep 30, 2021 at 3:02 AM Bisonti Mario 
wrote:

> Additional info.
>
> I am using 2.17-dev version

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-30 Thread Bisonti Mario
Additional info.

I am using 2.17-dev version



From: Bisonti Mario
Sent: Tuesday, September 28, 2021 17:01
To: user@manifoldcf.apache.org
Subject: Error: Repeated service interruptions - failure processing document: Read timed out

Hello

I have an error on a job that parses a network folder.

This is the Tika error:
2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
2021-09-28 16:14:50 INFO  ContextHandler:916 - Started o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server at http://sengvivv02.vimar.net:9998/
2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for {http://resource.server.tika.apache.org/}MetadataResource has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:550)
at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:915)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:987)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput

Error: Repeated service interruptions - failure processing document: Read timed out

2021-09-28 Thread Bisonti Mario
Hello

I have an error on a job that parses a network folder.

This is the Tika error:
2021-09-28 16:14:50 INFO  Server:415 - Started @1367ms
2021-09-28 16:14:50 WARN  ContextHandler:1671 - Empty contextPath
2021-09-28 16:14:50 INFO  ContextHandler:916 - Started o.e.j.s.h.ContextHandler@3dd69f5a{/,null,AVAILABLE}
2021-09-28 16:14:50 INFO  TikaServerCli:413 - Started Apache Tika server at http://sengvivv02.vimar.net:9998/
2021-09-28 16:15:04 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:26:46 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:23 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:24 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:27:26 INFO  MetadataResource:484 - meta (application/pdf)
2021-09-28 16:27:26 INFO  TikaResource:484 - tika (application/pdf)
2021-09-28 16:30:28 WARN  PhaseInterceptorChain:468 - Interceptor for {http://resource.server.tika.apache.org/}MetadataResource has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault: Could not send Message.
at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:67)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277)
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
at org.eclipse.jetty.server.HttpConnection$SendCallback.process(HttpConnection.java:826)
at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:550)
at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:915)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:987)
at org.eclipse.jetty.server.HttpOutput.channelWrite(HttpOutput.java:285)
at org.eclipse.jetty.server.HttpOutput.close(HttpOutput.java:638)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination$JettyOutputStream.close(JettyHTTPDestination.java:329)
at 

Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-07 Thread Ronny Heylen
 there's a service
 interruption.  You would probably see Read timed out warnings if you
 looked there, since that is what aborted the job run, along with a stack
 trace.  However, that's not going to add much information to the analysis
 at this point.

 What might be valuable is to determine whether the problem is happening on
 the Windows side or on the Solr side.  At this point I can't tell.  You
 could, however, create a null output connection, and create a similar job
 that sends its output there, and see if it completes.  Can you do this and
 get back to me?

 Thanks,
 Karl






Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-07 Thread Karl Wright
)
 at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
 WARN 2013-11-07 15:06:04,235 (Worker thread '9') - Service interruption reported for job 1383765534700 connection 'Filesharesrv1': IO exception during indexing: Read timed out




Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-07 Thread Ronny Heylen
)
 at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
 at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
 at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
 at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
 at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
 WARN 2013-11-07 15:06:04,235 (Worker thread '9') - Service interruption reported for job 1383765534700 connection 'Filesharesrv1': IO exception during indexing: Read timed out
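
The frames in the trace above pass through org.apache.manifoldcf.agents.output.solr.HttpPoster, which suggests (but does not prove) that this particular read timeout happened while posting to Solr rather than while reading from the file share. A rough sketch of automating that check on a pasted trace follows; the function names are hypothetical, and the repository-connector package prefix is an assumption rather than something shown in this thread.

```python
import re

# Package prefix seen in the trace posted in this thread.
OUTPUT_PREFIX = "org.apache.manifoldcf.agents.output."
# Assumed prefix for repository connectors; not shown anywhere in this thread.
REPOSITORY_PREFIX = "org.apache.manifoldcf.crawler.connectors."

# A frame looks like "at pkg.Class.method(File.java:123)"; the leading "at"
# may sit on its own line when the mail client wrapped the trace.
FRAME_RE = re.compile(r"(?:at\s+)?([\w.$/]+)\(")


def frame_names(trace_text):
    """Return the fully qualified method names found in a stack trace."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def likely_side(trace_text):
    """Guess which side of the pipeline the exception passed through."""
    names = frame_names(trace_text)
    if any(name.startswith(OUTPUT_PREFIX) for name in names):
        return "output"
    if any(name.startswith(REPOSITORY_PREFIX) for name in names):
        return "repository"
    return "unknown"
```

This is only a heuristic; the null output connection test suggested in the quoted reply remains the more direct way to separate the two sides.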



 On Wed, Nov 6, 2013 at 9:28 PM, Karl Wright daddy...@gmail.com wrote:

 Hi Ronny,

 One minor thing: you should need to set throttling to 2 ONLY for the
 Windows repository connection, not for AD or Solr.


 As for how to debug this issue, first off you should be looking in the
 manifoldcf.log file (or the equivalent).  You should see WARN messages from
 the shared file connector under most conditions when there's a service
 interruption.  You would probably see Read timed out warnings if you
 looked there, since that is what aborted the job run, along with a stack
 trace.  However, that's not going to add much information to the analysis
 at this point.

 What might be valuable is to determine whether the problem is happening
 on the Windows side or on the Solr side.  At this point I can't tell.  You
 could, however, create a null output connection, and create a similar job
 that sends its output there, and see if it completes.  Can you do this and
 get back to me?

 Thanks,
 Karl






Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-06 Thread Ronny Heylen
Hi,
We use ManifoldCF 1.3 and Solr 4.4 to index a shared network drive with
several hundred thousand documents.
Running a single ManifoldCF job to index the whole drive always gave some
kind of error, so to better understand where the problem might be, we
made one job to index all *.doc*, another one for *.xls*, another one for
*.pdf ...
Using the help from the list (thanks!) we set the size limit to 100MB and
all jobs succeed (great) except the one for *.pptx.
The message is:
Error: Repeated service interruptions - failure processing document: Read
timed out
We did not find any error in the logs we searched: solr.log, ...
Based on some hints found on the Internet, we set the Throttling max
connections setting to 2 (instead of 10) in 3 places:
output connection to Solr
authority connection to the Active Directory
repository connection to the Windows file share
But the problem stays the same.
We have tried on another machine with Solr 4.5 and ManifoldCF 1.4, same
problem.
We can run the job for all *.PDF, or all *.DOC*, or all *.XLS* without
problem, but the same message always appears for *.PPTX.
The last time the job stopped with the message, it displayed (the numbers
differ on each run because the Windows drive is changing) 56311 documents,
with 17466 busy and 38847 processed.
As we don't find anything in the logs (though we are probably not looking
in the right place), we don't know what to do.
Thanks for your help,
Ronny and Frédéric


Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-06 Thread Karl Wright
Hi Ronny,

One minor thing: you should need to set throttling to 2 ONLY for the
Windows repository connection, not for AD or Solr.


As for how to debug this issue, first off you should be looking in the
manifoldcf.log file (or the equivalent).  You should see WARN messages from
the shared file connector under most conditions when there's a service
interruption.  You would probably see Read timed out warnings if you
looked there, since that is what aborted the job run, along with a stack
trace.  However, that's not going to add much information to the analysis
at this point.

What might be valuable is to determine whether the problem is happening on
the Windows side or on the Solr side.  At this point I can't tell.  You
could, however, create a null output connection, and create a similar job
that sends its output there, and see if it completes.  Can you do this and
get back to me?

Thanks,
Karl
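
A minimal sketch of the log check suggested above, assuming the log is a plain-text file whose WARN lines look like the single example quoted elsewhere in this thread ("WARN <timestamp> (<thread>) - Service interruption reported for job ... connection '...': ..."). The regular expression and the function names find_interruptions and count_by_message are illustrative assumptions, not part of ManifoldCF, and other versions or logging layouts may format these lines differently.

```python
import re
from collections import Counter

# Pattern modelled on the one WARN line quoted in this thread.
WARN_RE = re.compile(
    r"WARN\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+Service interruption reported for job "
    r"(?P<job>\d+) connection '(?P<conn>[^']*)':\s*(?P<msg>.*)"
)


def find_interruptions(lines):
    """Return one dict per 'Service interruption' WARN line found."""
    hits = []
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


def count_by_message(hits):
    """Count how often each interruption message occurs."""
    return Counter(hit["msg"].strip() for hit in hits)


if __name__ == "__main__":
    # The file name comes from the message above; its location depends on
    # the installation and is an assumption here.
    with open("manifoldcf.log", encoding="utf-8", errors="replace") as log:
        found = find_interruptions(log)
    for message, count in count_by_message(found).most_common():
        print(count, message)
```

Grouping by message and connection name gives a quick view of which connection keeps reporting interruptions before the job aborts.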





On Wed, Nov 6, 2013 at 3:17 PM, Ronny Heylen securaqbere...@gmail.com wrote:

 Hi,
 We use ManifoldCF 1.3 and Solr 4.4 to index a shared network drive with
 several hundred thousand documents.
 Running a single ManifoldCF job to index the whole drive always gave some
 kind of error, so to better understand where the problem might be, we
 made one job to index all *.doc*, another one for *.xls*, another one for
 *.pdf ...
 Using the help from the list (thanks!) we set the size limit to 100MB and
 all jobs succeed (great) except the one for *.pptx.
 The message is:
 Error: Repeated service interruptions - failure processing document: Read
 timed out
 We did not find any error in the logs we searched: solr.log, ...
 Based on some hints found on the Internet, we set the Throttling max
 connections setting to 2 (instead of 10) in 3 places:
 output connection to Solr
 authority connection to the Active Directory
 repository connection to the Windows file share
 But the problem stays the same.
 We have tried on another machine with Solr 4.5 and ManifoldCF 1.4, same
 problem.
 We can run the job for all *.PDF, or all *.DOC*, or all *.XLS* without
 problem, but the same message always appears for *.PPTX.
 The last time the job stopped with the message, it displayed (the numbers
 differ on each run because the Windows drive is changing) 56311 documents,
 with 17466 busy and 38847 processed.
 As we don't find anything in the logs (though we are probably not looking
 in the right place), we don't know what to do.
 Thanks for your help,
 Ronny and Frédéric



Re: Error: Repeated service interruptions - failure processing document: Read timed out

2013-11-06 Thread Ronny Heylen
Ok Karl, thanks for the tip and the quick response, we will do this and
come back with the result.


On Wed, Nov 6, 2013 at 9:28 PM, Karl Wright daddy...@gmail.com wrote:

 Hi Ronny,

 One minor thing: you should need to set throttling to 2 ONLY for the
 Windows repository connection, not for AD or Solr.


 As for how to debug this issue, first off you should be looking in the
 manifoldcf.log file (or the equivalent).  You should see WARN messages from
 the shared file connector under most conditions when there's a service
 interruption.  You would probably see Read timed out warnings if you
 looked there, since that is what aborted the job run, along with a stack
 trace.  However, that's not going to add much information to the analysis
 at this point.

 What might be valuable is to determine whether the problem is happening on
 the Windows side or on the Solr side.  At this point I can't tell.  You
 could, however, create a null output connection, and create a similar job
 that sends its output there, and see if it completes.  Can you do this and
 get back to me?

 Thanks,
 Karl




