Better detection of plain text versus binary formats with a text header
-----------------------------------------------------------------------

Key: TIKA-154
URL: https://issues.apache.org/jira/browse/TIKA-154
Project: Tika
Issue Type: Improvement
Components: mime
Reporter: Jukka Zitting
Priority: Minor

Antoni Mylka noted on the mailing list:

  Many binary formats begin with magic byte sequences composed of ASCII
  characters, e.g.

    ZIP files begin with PK
    PDFs begin with %PDF-
    CHM help files begin with ITSF
    etc.

Tika should do a better job of detecting such cases.
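
For illustration only, here is a minimal Java sketch of the idea: check a few known ASCII magic prefixes before falling back to a plain-text guess. It is not Tika's detector and uses no Tika API. The class name MagicSniffer, the method detect, the prefix table, and the control-byte heuristic are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: sniffs a type from leading bytes.
class MagicSniffer {
    // ASCII magic prefixes named in the issue, mapped to common media types.
    private static final Map<String, String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("PK", "application/zip");
        MAGIC.put("%PDF-", "application/pdf");
        MAGIC.put("ITSF", "application/vnd.ms-htmlhelp");
    }

    // Returns a media type guess for the first bytes of a document.
    static String detect(byte[] head) {
        // Magic prefixes win over the text heuristic, even if all bytes are ASCII.
        for (Map.Entry<String, String> e : MAGIC.entrySet()) {
            byte[] prefix = e.getKey().getBytes(StandardCharsets.US_ASCII);
            if (startsWith(head, prefix)) {
                return e.getValue();
            }
        }
        return looksLikeText(head) ? "text/plain" : "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Assumed heuristic: any control byte other than tab, LF, or CR means "not text".
    private static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            int c = b & 0xFF;
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] note = "Plain notes\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdf));
        System.out.println(detect(note));
    }
}
```

A short prefix such as PK can also open an ordinary text file, so a sketch like this trades false "text/plain" results for occasional false binary matches. Matching a longer signature where the format defines one would reduce that risk.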