Jeremy B. Merrill created TIKA-1771:
---------------------------------------

             Summary: lower xhtml magic priority to ensure emails 
detected as message/rfc822
                 Key: TIKA-1771
                 URL: https://issues.apache.org/jira/browse/TIKA-1771
             Project: Tika
          Issue Type: Improvement
          Components: detector
            Reporter: Jeremy B. Merrill
            Priority: Critical


Some emails I have (happy to share them if you want) contain XHTML as one part 
of a multipart message. Prior to this pull request, the priority of the 
application/xhtml+xml magic detector was 50, equal to the priority of the 
message/rfc822 detector. Because of the relative position of the two detectors 
in tika-mimetypes.xml, the emails were incorrectly detected as XHTML documents.

This PR lowers the priority of application/xhtml+xml to 40 so that the 
more-sensitive email magic detectors take precedence and the emails are 
properly detected as message/rfc822.
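
To illustrate the tie and the effect of the change, here is a minimal Python 
sketch. It is a toy model, not Tika's actual detection code: the MagicRule 
class, the detect() function, the byte patterns, and the sample message are all 
hypothetical, and it assumes that rules with equal priority are tried in the 
order they appear in the file, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagicRule:
    mime_type: str
    priority: int
    pattern: bytes  # hypothetical pattern, not Tika's real magic definition


def detect(data: bytes, rules: list[MagicRule]) -> str:
    # sorted() is stable, so rules with equal priority keep their file order
    # (an assumption about tie-breaking taken from the description above).
    ordered = sorted(rules, key=lambda r: -r.priority)
    for rule in ordered:
        if rule.pattern in data[:8192]:
            return rule.mime_type
    return "application/octet-stream"


# A made-up multipart email whose HTML part happens to be XHTML.
email = (
    b"From: sender@example.com\r\n"
    b"Content-Type: multipart/alternative; boundary=xyz\r\n"
    b"\r\n"
    b"--xyz\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>hi</body></html>\r\n'
    b"--xyz--\r\n"
)

# Before: both rules at priority 50, XHTML listed first.
before = [
    MagicRule("application/xhtml+xml", 50, b"<html xmlns="),
    MagicRule("message/rfc822", 50, b"From: "),
]

# After: XHTML lowered to 40, so the email rule is tried first.
after = [
    MagicRule("application/xhtml+xml", 40, b"<html xmlns="),
    MagicRule("message/rfc822", 50, b"From: "),
]

print(detect(email, before))  # application/xhtml+xml
print(detect(email, after))   # message/rfc822
```

In this model the only thing that changes between the two runs is the XHTML 
priority, which is enough to flip the result for a message that matches both 
patterns.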

I have not run this through the govdocs tester or anything other than my own 
documents, so, full disclosure, this could cause false-negative XHTML 
detections elsewhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)