I tried:

  1. Updated jcifs-ng-2.1.9.jar to jcifs-ng-2.1.10.jar. To do this, I also had to update bcprov-jdk15on-1.70.jar to bcprov-jdk18on-1.79.jar; otherwise jcifs-ng-2.1.10.jar wouldn't work.
  2. I modified tika-config.xml:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <!-- Global timeout setting to prevent premature task termination -->
    <taskTimeoutMillis>300000</taskTimeoutMillis> <!-- Set to 5 minutes -->

    <!-- Enable dynamic service loading -->
    <service-loader dynamic="true"/>

    <!-- Load error handling settings -->
    <service-loader loadErrorHandler="WARN"/>

    <parsers>
        <!-- Default Parser Configuration -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <!-- Exclude parsers that are not needed to reduce processing time -->
            <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
            <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/><!-- Example exclusion -->
            <params>
                <param name="suppressExceptions" type="bool">true</param>
                <param name="ignoreTikaErrors" type="bool">true</param>
            </params>
        </parser>

        <!-- Specific Parser Configurations -->
        <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
            <params>
                <param name="includeShapeBasedContent" type="bool">false</param>
                <param name="suppressExceptions" type="bool">true</param>
            </params>
        </parser>

        <parser class="org.apache.tika.parser.pdf.PDFParser">
            <params>
                <param name="pdfbox.enableAutoSpace" type="bool">true</param>
                <param name="suppressExceptions" type="bool">true</param>
            </params>
        </parser>

        <!-- Additional parsers can be added here as needed -->
    </parsers>

    <!-- Detectors configuration (if needed) -->
    <detectors>
        <detector class="org.apache.tika.detect.DefaultDetector"/>
        <!-- Customize detectors if necessary -->
    </detectors>
</properties>
  3. The error still persists. The ManifoldCF job ends with:
Error: Repeated service interruptions - failure processing document: The target server failed to respond

  4. In the Docker logs of the tika-service container I see:
[main] 13:42:36,867 org.apache.tika.server.core.TikaServerProcess Started Apache Tika server 42c0849d-7850-43e4-b053-4e93ffd8656b at http://0.0.0.0:9998/
which looks like the Tika process restarting (the Docker container itself did not restart).
  5. I noticed that swap was full, so I have just resized it to 20 GB (from 4 GB).

Tomorrow I will check whether resizing the swap helps.
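A hedged sketch, assuming the container starts tika-server 3.x with `-c tika-config.xml` and that, as documented for tika-server 2.x, server-level options are read from a `<server><params>` section. Under that assumption, `taskTimeoutMillis` placed directly under `<properties>` (as in item 2) is probably ignored. The "Started Apache Tika server" line in item 4 is consistent with tika-server restarting its forked parsing process, which it does after a timeout or an out-of-memory error. The fragment below moves the timeout into `<server><params>`, adds `forkedJvmArgs` to cap the heap of that forked process, and merges the two `service-loader` elements into one. The `-Xmx2g` value is an arbitrary example, not a tuned recommendation; it would need to fit in the container's RAM so the JVM does not depend on swap.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <!-- Assumption: tika-server 2.x/3.x reads these options from <server><params> -->
  <server>
    <params>
      <!-- Per-document timeout (5 minutes) before the forked process is restarted -->
      <taskTimeoutMillis>300000</taskTimeoutMillis>
      <!-- JVM options for the forked parsing process; -Xmx2g is only an example value -->
      <forkedJvmArgs>
        <arg>-Xmx2g</arg>
      </forkedJvmArgs>
    </params>
  </server>

  <!-- A single service-loader element carrying both attributes -->
  <service-loader dynamic="true" loadErrorHandler="WARN"/>

  <!-- Keep the <parsers> and <detectors> sections from item 2 here, unchanged -->
</properties>
```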



________________________________
From: Bisonti Mario
Sent: Friday, January 10, 2025 11:15
To: [email protected] <[email protected]>
Subject: Job error: Error: Repeated service interruptions - failure processing document: The target server failed to respond

Hi, after migrating to ManifoldCF 2.27-dev, Solr 9.7, and Tika Server 3.0, I get this error on a job that indexes documents on a Windows share:

Error: Repeated service interruptions - failure processing document: The target server failed to respond

The error happened after some hours of indexing.

In the "Simple History Report" I see many entries with "Result Code" = TIKASERVERREJECTS.

In the solrconfig.xml of my core_share core I set:
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >

    <lst name="defaults">
      <!-- Added: path to the Tika configuration -->
      <str name="tika-config">/var/solr/data/core_share/conf/tika.config</str>
      <!-- End of addition -->

      <str name="lowernames">true</str>
      <str name="fmap.meta">ignored_</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
      <str name="captureAttr">true</str>
      <!-- Mario Bisonti 08052018: enable Ignore Tika Exception, otherwise PPT files are not picked up at all -->
      <bool name="ignoreTikaException">true</bool>
      <!-- End of Tika exception setting, Mario Bisonti -->
    </lst>

    <!-- END OF ORIGINAL -->

  </requestHandler>

This is my tika.config:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
            <params>
                <param name="includeShapeBasedContent" type="bool">false</param>
            </params>
        </parser>
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <param name="includeShapeBasedContent" type="bool">false</param>
            </params>
        </parser>
    </parsers>
</properties>
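The Tika configuration documentation typically pairs an explicitly configured parser with a `parser-exclude` of the same class inside `DefaultParser`, so that the default instance does not compete with the configured one. Below is a sketch of that pattern applied to the tika.config above, assuming the intent is to keep both parsers with `includeShapeBasedContent` disabled. Two caveats. First, this file is referenced from Solr's `/update/extract` handler, so it most likely affects only the Tika embedded in Solr Cell, not the separate Tika server container behind TIKASERVERREJECTS. Second, the same reasoning suggests that excluding `OfficeParser` in item 2's tika-config.xml without declaring it elsewhere probably leaves .doc/.xls/.ppt files without a parser.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default parsers, minus the two that are configured explicitly below -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
    </parser>
    <!-- OOXML formats (.docx, .xlsx, .pptx) with custom params -->
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="includeShapeBasedContent" type="bool">false</param>
      </params>
    </parser>
    <!-- OLE2 Office formats (.doc, .xls, .ppt) with custom params -->
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="includeShapeBasedContent" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
```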


Is this a Tika problem?

Thanks a lot



Mario Bisonti


