[ 
https://issues.apache.org/jira/browse/TIKA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter May updated TIKA-847:
---------------------------

    Attachment: regex_support.patch

This patch updates MagicDetector and its associated unit tests to support 
regular expressions in the signature file. It does not support EOF regular 
expressions.

This required a slight extension to the freedesktop.org mime-info format to 
support a type="regex" attribute on the "match" element.  Is there an XML 
schema for mime-info anywhere? If so, it would also need updating.
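
To illustrate the idea (this is not the code in the attached patch), here is a 
minimal sketch of how a regex signature could be tested against a window of 
the byte prefix. The class name RegexMagicSketch and the method matches are 
hypothetical, and decoding the bytes as ISO-8859-1 is an assumption made so 
that each byte maps to exactly one char.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the MagicDetector implementation.
public class RegexMagicSketch {

    /**
     * Returns true if the pattern is found anywhere in prefix[start, end).
     * The range is clamped to the bounds of the prefix array.
     */
    static boolean matches(Pattern pattern, byte[] prefix, int start, int end) {
        int from = Math.max(0, start);
        int to = Math.min(prefix.length, end);
        if (from >= to) {
            return false; // empty window, nothing to match
        }
        // Assumption: ISO-8859-1 maps every byte to a single char,
        // so char positions in the window equal byte positions.
        String window = new String(prefix, from, to - from,
                StandardCharsets.ISO_8859_1);
        return pattern.matcher(window).find();
    }
}
```

For example, a pattern such as "%PDF-\d\.\d" would match a prefix starting 
with "%PDF-1.4" when the window begins at offset 0, but not when it begins at 
offset 1.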

I also noticed what I consider a minor bug in the while loop at line 315 
(https://github.com/apache/tika/blob/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java#L315)
 of MagicDetector: the offset is not incremented by the number of bytes 
read.  I have corrected that in this patch, but I can split it out into a 
separate issue if preferred.
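
For reference, the general pattern being described looks roughly like the 
sketch below. This is not the MagicDetector code itself; the class name 
ReadFullySketch and the method readUpTo are hypothetical. The point is that 
InputStream.read may return fewer bytes than requested, so the offset has to 
advance by the count actually read on each pass.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the MagicDetector implementation.
public class ReadFullySketch {

    /**
     * Reads from the stream until the buffer is full or the stream ends.
     * Returns the number of bytes actually placed in the buffer.
     */
    static int readUpTo(InputStream in, byte[] buffer) throws IOException {
        int offset = 0;
        while (offset < buffer.length) {
            int n = in.read(buffer, offset, buffer.length - offset);
            if (n == -1) {
                break; // end of stream reached before the buffer was full
            }
            // Advance by the bytes read; without this, the next read
            // would overwrite the data just written at the same position.
            offset += n;
        }
        return offset;
    }
}
```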
                
> Add regular expression support to the MagicDetector
> ---------------------------------------------------
>
>                 Key: TIKA-847
>                 URL: https://issues.apache.org/jira/browse/TIKA-847
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Andrew Jackson
>              Labels: detection, format
>         Attachments: regex_support.patch
>
>
> Following on from TIKA-86, we would like to add support for regular 
> expressions to the MagicDetector. This would allow more signatures to be 
> re-used from more sources (e.g. the file(1) command). As part of the SCAPE 
> Project, we have added this functionality to our own Tika branch (e.g. 
> https://github.com/openplanets/tika/commit/b8de9e77c8b432788e3f73a4dbccca8ea044b503)
>  and are working to tidy this up to make a clean patch we can submit here.
> BTW, are there any patch submission guidelines or coding standards we should 
> check our work against first?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
