A media type of *-ooxml (here, application/x-tika-ooxml) normally shows up
when the standard parsers are not on the classpath or when the file is
damaged. Unless you attached the debugger to the forked process with an
agent, I don't think you'd see what is actually loaded in practice. Don't
forget that the parser is loaded in the forked process, not in the process
you are inspecting from the IDE. So, could you make sure that the standard
parsers are on the classpath of the forked process?
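
If it helps, here is a rough sketch of one way to check a classpath using
only the JDK. It tries to load the OOXML parser class
(org.apache.tika.parser.microsoft.ooxml.OOXMLParser) and lists the
META-INF/services/org.apache.tika.parser.Parser files that are visible,
since that is how the default parsers get discovered. The class name
ParserClasspathCheck and its helper methods are made up for illustration;
they are not part of Tika. To say anything about the forked process, it
would have to be run with the same classpath that the forked process
actually gets, which I am assuming you can reproduce.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika.
public class ParserClasspathCheck {

    // True if the named class is visible to this class's class loader.
    // The class is located but not initialized.
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false,
                    ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Lists every META-INF/services file visible for the given interface.
    // An empty list means no jar on the classpath registers implementations.
    public static List<URL> serviceFiles(String serviceInterface) {
        ClassLoader cl = ParserClasspathCheck.class.getClassLoader();
        try {
            return Collections.list(
                    cl.getResources("META-INF/services/" + serviceInterface));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
        System.out.println("OOXMLParser loadable: "
                + isLoadable("org.apache.tika.parser.microsoft.ooxml.OOXMLParser"));
        for (URL url : serviceFiles("org.apache.tika.parser.Parser")) {
            System.out.println("parser service file: " + url);
        }
    }
}
```

If the OOXML parser is not loadable, or no service files are listed, that
would be consistent with the EmptyParser result you are seeing.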

Apologies if I've misunderstood something.


On Tue, Oct 22, 2024 at 11:21 PM Nicholas DiPiazza <
[email protected]> wrote:

> I have a parse service
>
>
> https://github.com/nddipiazza/tika-pipes/blob/main/tika-pipes-core/src/main/java/org/apache/tika/pipes/parser/ParseService.java
>
> When I feed it some files, it always returns with
>
> 22:05:19.005 [main] INFO pipes.FileSystemFetcher -- Fetched:
> success=[fetch_key: "/home/ndipiazza/Downloads/docx/596206.docx"
> fields {
>   key: "Content-Type"
>   value: "application/x-tika-ooxml"
> }
> fields {
>   key: "X-TIKA:Parsed-By"
>   value: "org.apache.tika.parser.EmptyParser"
> }
>
> I see the parsers in IntelliJ:
>
> AutoDetectParserConfig{spoolToDisk=null, outputThreshold=null,
> maximumCompressionRatio=null, maximumDepth=null,
> maximumPackageEntryDepth=null, metadataWriteFilterFactory=null,
> embeddedDocumentExtractorFactory=org.apache.tika.extractor.ParsingEmbeddedDocumentExtractorFactory@1ddeed05,
> contentHandlerDecoratorFactory=org.apache.tika.parser.AutoDetectParserConfig$1@52f4a530,
> digesterFactory=null, throwOnZeroBytes=true}
>
> [image: image.png]
>
> Do I need to send metadata along with the request?
>
