A media type of *-ooxml (here application/x-tika-ooxml) normally shows up when the standard parsers are not on the classpath or when the file is damaged. Unless you hooked up the debugger to the forked process with an agent, I don't think what you see in IntelliJ is what is actually loaded in practice. Don't forget that the parser is loaded in the forked process. So, can you make sure that the standard parsers are on the classpath of the forked process?
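If it helps, here is a rough sketch of the kind of check I have in mind. It uses only the JDK, no Tika API. The class name ClasspathCheck and its two helpers are made up for illustration, and I'm assuming the OOXML parser class is org.apache.tika.parser.microsoft.ooxml.OOXMLParser and that the parsers are registered through META-INF/services/org.apache.tika.parser.Parser, so please double-check both against your Tika version.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Tika: reports what the current JVM can see.
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Lists every service registration file visible for the given interface.
    static List<URL> serviceFiles(String serviceInterface) throws IOException {
        ClassLoader cl = ClasspathCheck.class.getClassLoader();
        return Collections.list(cl.getResources("META-INF/services/" + serviceInterface));
    }

    public static void main(String[] args) throws IOException {
        // Assumed class name for the OOXML parser; verify for your Tika version.
        String ooxmlParser = "org.apache.tika.parser.microsoft.ooxml.OOXMLParser";

        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println("OOXML parser visible: " + isOnClasspath(ooxmlParser));

        // Assumed service file name used to register parsers.
        for (URL url : serviceFiles("org.apache.tika.parser.Parser")) {
            System.out.println("parser service file: " + url);
        }
    }
}
```

The output only means something if this runs inside the forked process (or with exactly the classpath the forked process is started with). Running it from the parent JVM in IntelliJ would just tell you what you already see there.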
Apologies if I've misunderstood something.

On Tue, Oct 22, 2024 at 11:21 PM Nicholas DiPiazza < [email protected]> wrote:

> I have a parse service
>
> https://github.com/nddipiazza/tika-pipes/blob/main/tika-pipes-core/src/main/java/org/apache/tika/pipes/parser/ParseService.java
>
> When i feed it some files, it always returns with
>
> 22:05:19.005 [main] INFO pipes.FileSystemFetcher -- Fetched:
> success=[fetch_key: "/home/ndipiazza/Downloads/docx/596206.docx"
> fields {
> key: "Content-Type"
> value: "application/x-tika-ooxml"
> }
> fields {
> key: "X-TIKA:Parsed-By"
> value: "org.apache.tika.parser.EmptyParser"
> }
>
> I see the parsers in the intellij:
>
> AutoDetectParserConfig{spoolToDisk=null, outputThreshold=null,
> maximumCompressionRatio=null, maximumDepth=null,
> maximumPackageEntryDepth=null, metadataWriteFilterFactory=null,
> embeddedDocumentExtractorFactory=org.apache.tika.extractor.ParsingEmbeddedDocumentExtractorFactory@1ddeed05,
> contentHandlerDecoratorFactory=org.apache.tika.parser.AutoDetectParserConfig$1@52f4a530,
> digesterFactory=null, throwOnZeroBytes=true}
>
> [image: image.png]
>
> Do I need to send metadata along with the request?
>
