[jira] Updated: (TIKA-76) Need to add test documents with wrong extensions.

Jukka Zitting (JIRA) Wed, 17 Oct 2007 15:05:12 -0700

     [ https://issues.apache.org/jira/browse/TIKA-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jukka Zitting updated TIKA-76:
------------------------------

    Attachment: TIKA-76.patch

Instead of making real copies of the documents, we could always just feed an 
incorrect file name with the original resource stream.

See the attached patch for an example of how this could work with 
AutoDetectParserTest. The patch uses the AutoDetectParser on all the current 
test documents in the following configurations:

    * correct name and type hints
    * correct name but no type hint
    * correct name but incorrect type hint
    * incorrect type and no name hint
    * correct type but no name hint
    * correct type but incorrect name hint
    * incorrect name and no type hint
    * incorrect name and type hints
    * no name or type hints
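
As a rough illustration of the nine configurations above, the hint pairs can 
be generated by crossing three name options with three type options. This is 
only a sketch and not the code in the attached patch: the HintMatrix and Hints 
names are made up for the example, testHTML.doc is a hypothetical wrong name, 
and a null value stands for "no hint".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika or of the attached patch.
public class HintMatrix {

    /** One name/type hint pair; null means that no hint is supplied. */
    public static final class Hints {
        public final String name;
        public final String type;

        public Hints(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return "name=" + name + ", type=" + type;
        }
    }

    /**
     * Crosses {correct, incorrect, none} for the name hint with the same
     * three options for the type hint, giving nine combinations.
     */
    public static List<Hints> combinations(String goodName, String badName,
                                           String goodType, String badType) {
        List<Hints> result = new ArrayList<>();
        for (String name : Arrays.asList(goodName, badName, null)) {
            for (String type : Arrays.asList(goodType, badType, null)) {
                result.add(new Hints(name, type));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Hints h : combinations("testHTML.html", "testHTML.doc",
                                    "text/html", "application/msword")) {
            // A real test would reopen the original resource stream here and
            // hand it to the parser together with these hints.
            System.out.println(h);
        }
    }
}
```

Each test document would then be parsed once per combination, always from the 
same original stream, so no renamed copies of the files are needed.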

It seems we currently need MIME magic tests for Excel, PowerPoint, RTF, plain 
text, Word, and XML.

> Need to add test documents with wrong extensions.
> -------------------------------------------------
>
>                 Key: TIKA-76
>                 URL: https://issues.apache.org/jira/browse/TIKA-76
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-76.patch
>
>
> We need to add test documents with misleading extensions to verify that the 
> file header MIME type determination is taking precedence over the file name 
> approach.
> I suggest copying existing files such as:
> cp testHTML.html testReallyHTML.doc

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
