i run tika 2.8.0

it's used for attachment scanning by a dovecot imap server

it runs on an external (to dovecot) server, on the same lan

it's up & running

    ps ax | grep tika
    63506 ? Ssl 0:00 /usr/bin/java -Dpdfbox.fontcache=/var/tika -XX:ParallelGCThreads=1 -XX:CICompilerCount=2 -XX:-CICompilerCountPerCPU -jar /srv/apps/tika/tika-server.jar -c /usr/local/etc/tika/tika-server-config-custom.xml --host 10.1.7.100 --port 9998
    63540 ? Sl 0:02 /usr/bin/java -Xms1g -Xmx1g -Dpdfbox.fontcache=/var/tika -Dlog4j2.warn -Djava.awt.headless=true -cp /srv/apps/tika/tika-server.jar -Dtika.server.id= org.apache.tika.server.core.TikaServerProcess -h 10.1.7.100 -p 9998 -i -c /usr/local/etc/tika/tika-server-config-custom.xml -forkedStatusFile /tmp/apache-tika-server-forked-tmp-15836749653669077604 -numRestarts 0

the dovecot config for using the tika instance is

    fts_tika = http://10.1.7.100:9998/tika/

testing a local PDF on the tika server

    F="/tmp/TEST.pdf"
    /bin/cp -af "$F" /tmp/test.pdf
    chown vmail:vmail /tmp/test.pdf
    curl \
      -T /tmp/test.pdf \
      http://10.1.7.100:9998/meta

    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core Test.SNAPSHOT">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:xmpTPg="http://ns.adobe.com/xap/1.0/t/pg/"
            pdf:PDFVersion="1.4"
            pdf:hasXFA="false"
            pdf:num3DAnnotations="0"
            pdf:overallPercentageUnmappedUnicodeChars="0.0"
            pdf:hasCollection="false"
            pdf:encrypted="false"
            pdf:containsNonEmbeddedFont="false"
            pdf:hasMarkedContent="true"
            pdf:producer="Adobe PDF Library 15.0"
            pdf:totalUnmappedUnicodeChars="0"
            pdf:hasXMP="true"
            pdf:containsDamagedFont="false"
            xmp:CreatorTool="Adobe InDesign 15.1 (Macintosh)"
            dc:format="application/pdf; version=1.4"
            dc:language="en-US"
            xmpMM:DocumentID="xmp.id:8a612346-9d03-4caf-8ebf-da6f3716ed0a"
            xmpTPg:NPages="14">
          <pdf:unmappedUnicodeCharsPerPage>
            <rdf:Seq>
              <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li>
              <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li> <rdf:li>0</rdf:li>
            </rdf:Seq>
          </pdf:unmappedUnicodeCharsPerPage>
          <pdf:charsPerPage>
            <rdf:Seq>
              <rdf:li>84</rdf:li> <rdf:li>676</rdf:li> <rdf:li>1653</rdf:li> <rdf:li>1914</rdf:li> <rdf:li>814</rdf:li> <rdf:li>1022</rdf:li> <rdf:li>645</rdf:li>
              <rdf:li>1221</rdf:li> <rdf:li>1087</rdf:li> <rdf:li>732</rdf:li> <rdf:li>887</rdf:li> <rdf:li>1295</rdf:li> <rdf:li>1263</rdf:li> <rdf:li>149</rdf:li>
            </rdf:Seq>
          </pdf:charsPerPage>
          <pdf:annotationTypes>
            <rdf:Bag>
              <rdf:li>null</rdf:li>
            </rdf:Bag>
          </pdf:annotationTypes>
          <pdf:annotationSubtypes>
            <rdf:Bag>
              <rdf:li>Link</rdf:li>
            </rdf:Bag>
          </pdf:annotationSubtypes>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>

passing/processing an email with an *.pdf attachment from dovecot, logs ok,

    Jul 11 08:12:50 svr003 tika[63540]: INFO [qtp1164394344-41] 09:12:50,042 org.apache.tika.server.core.TikaLoggingFilter Request URI: http://10.1.7.100:9998/tika/
    Jul 11 08:12:50 svr003 tika[63540]: INFO [qtp1164394344-41] 09:12:50,043 org.apache.tika.server.core.resource.TikaResource /tika (application/pdf)

and results are passed back to dovecot, and scan/index db is updated accordingly

but passing/processing an email with an embedded (forwarded as attachment) *.eml, logs the following 'SEVERE' error,

    Jul 11 08:36:49 svr003 tika[62540]: INFO [qtp1164241227-41] 08:36:49,417 org.apache.tika.server.core.TikaLoggingFilter Request URI: http://10.1.7.100:9998/tika/
    Jul 11 08:36:49 svr003 tika[62540]: INFO [qtp1164241227-41] 08:36:49,418 org.apache.tika.server.core.resource.TikaResource /tika (message/rfc822)
    Jul 11 08:36:49 svr003 tika[62540]: WARN [qtp1164241227-41] 08:36:49,419 org.apache.tika.server.core.resource.TikaResource tika/: Text extraction failed ([0-9961000034519].eml)
    Jul 11 08:36:49 svr003 tika[62540]: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:185) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:152) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:57) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:357) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.tika.server.core.resource.TikaResource.lambda$produceText$1(TikaResource.java:507) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:177) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1651) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:249) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:122) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:84) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:90) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [tika-server-standard-2.8.0.jar:2.8.0]
    Jul 11 08:36:49 svr003 tika[62540]: at java.lang.Thread.run(Thread.java:833) [?:?]
    Jul 11 08:36:49 svr003 tika[62540]: Jul 11, 2023 8:36:49 AM org.apache.cxf.jaxrs.utils.JAXRSUtils logMessageHandlerProblem
    Jul 11 08:36:49 svr003 tika[62540]: SEVERE: Problem with writing the data, class org.apache.tika.server.core.resource.TikaResource$$Lambda$371/0x00000008012ab9e0, ContentType: text/plain

iiuc, .eml should be parseable

    https://tika.apache.org/2.8.0/formats.html#Mail_formats
    https://tika.apache.org/2.8.0/api/org/apache/tika/parser/mail/RFC822Parser.html

is there additional/different config needed for .eml processing ?
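since the exception is ZeroByteFileException ("InputStream must have > 0 bytes"), one way to narrow this down might be to send an .eml to tika directly, bypassing dovecot, the same way the PDF was sent above. the following is only a sketch: the helper names payload_state and put_to_tika, the TIKA_URL variable and the /tmp/*.eml paths are hypothetical, and the headers are my assumption about what a direct test should send, not something taken from dovecot.

```shell
#!/bin/sh
# Hypothetical helper: report whether a candidate test file is usable.
# Prints "missing" if it does not exist, "empty" if it is zero bytes,
# and "ok" if it has content.
payload_state() {
    if [ ! -e "$1" ]; then
        echo missing
    elif [ -s "$1" ]; then
        echo ok
    else
        echo empty
    fi
}

# Hypothetical helper: PUT a file to the /tika endpoint (curl -T uploads
# with PUT). TIKA_URL is an assumed override; the default is the host and
# port from the ps output above.
put_to_tika() {
    curl -sS -T "$1" \
        -H 'Content-Type: message/rfc822' \
        -H 'Accept: text/plain' \
        "${TIKA_URL:-http://10.1.7.100:9998}/tika"
}
```

usage would be something like `payload_state /tmp/test.eml && put_to_tika /tmp/test.eml` with a real saved message, then `: > /tmp/empty.eml; put_to_tika /tmp/empty.eml` with an empty file. if the real .eml parses and only the empty one fails, that would suggest tika's .eml handling is fine and the body arriving from dovecot is empty, but i have not confirmed that.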