Hi,

There might be a bug in AutoDetectParser: it fails to recognise some plain-text files as plain text.
Attached are three test files; as you can see, they are all plain text. This is the code I use for testing:

————————
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

AutoDetectParser parser = new AutoDetectParser();
for (File f : new File("/Users/-/work/jate/experiment/bugged_corpus").listFiles()) {
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
    Metadata metadata = new Metadata();
    // try-with-resources closes each file's stream after parsing
    try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
        parser.parse(in, handler, metadata);
        String content = handler.toString();
        System.out.println(metadata); // line A
    } catch (Exception e) {
        e.printStackTrace();
    }
}
————————

For the three test files, I would expect line A to report a plain-text content type (text/plain). Instead, it prints the following, one line per file:

X-Parsed-By=org.apache.tika.parser.EmptyParser Content-Type=image/x-portable-bitmap
X-Parsed-By=org.apache.tika.parser.DefaultParser X-Parsed-By=org.apache.tika.parser.mp3.Mp3Parser xmpDM:audioCompressor=MP3 Content-Type=audio/mpeg
X-Parsed-By=org.apache.tika.parser.EmptyParser Content-Type=image/x-portable-bitmap

As a result, the variable content is always empty. Does anyone have any suggestions?

Thanks
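To check whether the files' leading bytes might be what trips detection, a small standard-library-only diagnostic can print each file's first 16 bytes and flag prefixes that resemble a PBM header or MP3 data. This is a sketch under stated assumptions: the class name HeaderSniffer and both matching rules are hypothetical simplifications for illustration, not Tika's actual mime-type definitions, and it needs Java 11+ for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic: prints the first bytes of each file and flags
 * prefixes a magic-byte detector might mistake for a binary format.
 * Both rules are simplified assumptions, NOT Tika's real definitions.
 */
public class HeaderSniffer {

    // Assumed rule: a PBM header starts with "P1" or "P4" followed by whitespace.
    static boolean looksLikePbm(byte[] h) {
        return h.length >= 3 && h[0] == 'P' && (h[1] == '1' || h[1] == '4')
                && Character.isWhitespace((char) h[2]);
    }

    // Assumed rule: MP3 data starts with an "ID3" tag or an 11-bit frame sync (0xFFE).
    static boolean looksLikeMp3(byte[] h) {
        if (h.length >= 3 && h[0] == 'I' && h[1] == 'D' && h[2] == '3') {
            return true;
        }
        return h.length >= 2 && (h[0] & 0xFF) == 0xFF && (h[1] & 0xE0) == 0xE0;
    }

    // Reads up to n leading bytes of a file (fewer if the file is shorter).
    static byte[] head(Path p, int n) throws IOException {
        try (InputStream in = Files.newInputStream(p)) {
            return in.readNBytes(n);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) continue;
                byte[] h = head(f, 16);
                // ISO-8859-1 maps every byte to a char; ASCII control chars shown as '.'
                String shown = new String(h, StandardCharsets.ISO_8859_1)
                        .replaceAll("\\p{Cntrl}", ".");
                System.out.printf("%s: [%s] pbm=%b mp3=%b%n",
                        f.getFileName(), shown, looksLikePbm(h), looksLikeMp3(h));
            }
        }
    }
}
```

Run it with the corpus directory as the first argument, e.g. java HeaderSniffer.java /Users/-/work/jate/experiment/bugged_corpus. A true flag only shows that a header resembles one of those formats under the simplified rules; it does not prove that this is why Tika misdetected the file.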