>> Question1) Shouldn't this be more specific? Like PdfParser,
>> OpenDocumentParser and so on.

Yes. Make sure to call metadata.getValues("X-Parsed-By"), which returns an array of values, and then iterate through that array to see the parsers that actually processed your doc. If you call metadata.get(Property p), you only get the first value in the array.

>> Question2) I understand that there is the DigestingParser to add Md5 and
>> Sha1 hashes to the metadata. But how can I "combine" the AutoDetectParser
>> and the DigestingParser?

See DigestingParserTest [0] for exact code, but basically something like this:

    Metadata m = new Metadata();
    CommonsDigester.DigestAlgorithm[] algos = CommonsDigester.parse("md5,sha512");
    // Wrap the AutoDetectParser so digests are computed while it parses
    Parser d = new DigestingParser(new AutoDetectParser(), new CommonsDigester(1000000, algos));
    d.parse(InputStream....)

[0] http://svn.apache.org/viewvc/tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/DigestingParserTest.java?view=markup

-----Original Message-----
From: zahlenm...@gmx.de [mailto:zahlenm...@gmx.de]
Sent: Tuesday, January 05, 2016 3:33 AM
To: user@tika.apache.org
Subject: Questions about using AutoDetect and DigestParser

Happy New Year everyone,

I have a small program for simple text and metadata extraction. It is really not more than this (in Scala):

    val fileParser : AutoDetectParser = new AutoDetectParser()
    val handler : WriteOutContentHandler = new WriteOutContentHandler(-1)
    val metadata : Metadata = new Metadata()
    val context : ParseContext = new ParseContext()
    try {
      fileParser.parse(stream, handler, metadata, context)
    } catch ...

When I look at the metadata I always have this line:

    X-Parsed-By: org.apache.tika.parser.DefaultParser

Question1) Shouldn't this be more specific? Like PdfParser, OpenDocumentParser and so on.

Question2) I understand that there is the DigestingParser to add Md5 and Sha1 hashes to the metadata. But how can I "combine" the AutoDetectParser and the DigestingParser?

Thanks so far
Kind regards
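
For anyone who wants to sanity-check the md5/sha512 values that a digesting wrapper like the one discussed in this thread puts into the metadata, here is a rough JDK-only sketch that hashes a stream with several algorithms in one pass. It does not use Tika and is not how Tika implements its digester; the class name StreamDigests and the helper hexDigests are made up for illustration, and the algorithm names are the standard java.security ones ("MD5", "SHA-512").

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Tika): hashes one stream with
// several algorithms while reading it only once.
final class StreamDigests {

    static Map<String, String> hexDigests(InputStream in, String... algorithms)
            throws IOException, NoSuchAlgorithmException {
        // One MessageDigest instance per requested algorithm.
        MessageDigest[] digests = new MessageDigest[algorithms.length];
        for (int i = 0; i < algorithms.length; i++) {
            digests[i] = MessageDigest.getInstance(algorithms[i]);
        }

        // Feed every chunk of the stream to all digests.
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (MessageDigest d : digests) {
                d.update(buffer, 0, read);
            }
        }

        // Render each result as lowercase hex, keyed by algorithm name.
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < algorithms.length; i++) {
            StringBuilder hex = new StringBuilder();
            for (byte b : digests[i].digest()) {
                hex.append(String.format("%02x", b));
            }
            result.put(algorithms[i], hex.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        Map<String, String> digests =
                hexDigests(new ByteArrayInputStream(data), "MD5", "SHA-512");
        digests.forEach((name, hex) -> System.out.println(name + ": " + hex));
    }
}
```

Running the same file through this helper and comparing the hex strings with the digest entries in the Tika metadata is one way to confirm the wrapper is doing what you expect.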