[ https://issues.apache.org/jira/browse/TIKA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter May updated TIKA-847:
---------------------------
    Attachment: regex_support.patch

Patch updating MagicDetector and its associated unit tests to incorporate regular expression support in the signature file (it does not support EOF regular expressions). This required a slight extension to the freedesktop.org mime-info format to support a type="regex" attribute on the "match" element. Do you have an XML schema anywhere for mime-info, as this would also need updating?

I also noted (what I consider) a minor bug in the while loop at line 315 of MagicDetector (https://github.com/apache/tika/blob/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java#L315), where the offset is not incremented by the number of bytes read. I have corrected that in this patch, but I can extract it into a separate issue if preferred.

> Add regular expression support to the MagicDetector
> ---------------------------------------------------
>
>                 Key: TIKA-847
>                 URL: https://issues.apache.org/jira/browse/TIKA-847
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Andrew Jackson
>              Labels: detection, format
>         Attachments: regex_support.patch
>
>
> Following on from TIKA-86, we would like to add support for regular
> expressions to the MagicDetector. This would allow more signatures to be
> re-used from more sources (e.g. the file(1) command). As part of the SCAPE
> Project, we have added this functionality to our own Tika branch (e.g.
> https://github.com/openplanets/tika/commit/b8de9e77c8b432788e3f73a4dbccca8ea044b503)
> and are working to tidy this up to make a clean patch we can submit here.
> BTW, are there any patch submission guidelines or coding standards we should
> check our work against first?
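
For illustration, the two points raised in the comment above (advancing the buffer offset by the number of bytes each read returns, and matching a regular expression against the leading bytes of a stream) can be sketched roughly as follows. This is not the code from regex_support.patch or from MagicDetector: the class name RegexMagicSketch, the methods readPrefix and matches, the 16-byte window, and the choice to decode bytes as ISO-8859-1 before matching are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika code: names and structure are illustrative only.
public class RegexMagicSketch {

    /** Reads up to {@code length} bytes, tolerating reads that return fewer bytes than requested. */
    static byte[] readPrefix(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(buffer, offset, length - offset);
            if (n == -1) {
                break; // stream ended before the window was full
            }
            offset += n; // advance by the bytes actually read, so the next read appends
        }
        return Arrays.copyOf(buffer, offset);
    }

    /** Returns true if {@code pattern} is found within the first {@code length} bytes of the stream. */
    static boolean matches(InputStream in, Pattern pattern, int length) throws IOException {
        byte[] prefix = readPrefix(in, length);
        // ISO-8859-1 maps every byte to exactly one char, so byte values survive decoding.
        String text = new String(prefix, StandardCharsets.ISO_8859_1);
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) throws IOException {
        // SequenceInputStream hands back the two parts in separate reads,
        // which exercises the short-read path in readPrefix.
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream("%PD".getBytes(StandardCharsets.ISO_8859_1)),
                new ByteArrayInputStream("F-1.4".getBytes(StandardCharsets.ISO_8859_1)));
        Pattern pdfHeader = Pattern.compile("^%PDF-\\d\\.\\d");
        System.out.println(matches(in, pdfHeader, 16)); // prints "true"
    }
}
```

Decoding as ISO-8859-1 is only one way to run java.util.regex over raw bytes; it keeps a one-to-one mapping between bytes and chars, so a signature can use escapes such as \x00 for arbitrary byte values.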