[ https://issues.apache.org/jira/browse/TIKA-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting updated TIKA-76:
------------------------------
Attachment: TIKA-76.patch
Instead of making real copies of the documents, we could simply pass an
incorrect file name along with the original resource stream.
See the attached patch for an example of how this could work with
AutoDetectParserTest. The patch uses the AutoDetectParser on all the current
test documents in the following configurations:
* correct name and type hints
* correct name but no type hint
* correct name but incorrect type hint
* incorrect type and no name hint
* correct type but no name hint
* correct type but incorrect name hint
* incorrect name and no type hint
* incorrect name and type hints
* no name or type hints
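
The nine configurations listed above are the cross product of three states
(correct, incorrect, none) for each of the two hints. A minimal sketch of
how such a matrix could be enumerated follows. It is not the code in the
attached patch: HintMatrix, Hint, Config, and the sample names and types
are all hypothetical, and the step of actually passing the hints to the
parser is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Tika or of the attached patch. */
final class HintMatrix {

    /** State of one hint (resource name or content type) given to the parser. */
    enum Hint { CORRECT, INCORRECT, NONE }

    /** One test configuration: a name hint state paired with a type hint state. */
    static final class Config {
        final Hint name;
        final Hint type;

        Config(Hint name, Hint type) {
            this.name = name;
            this.type = type;
        }

        /** Resolves the name hint to a value, or null when no hint is given. */
        String nameHint(String correctName, String wrongName) {
            return pick(name, correctName, wrongName);
        }

        /** Resolves the type hint to a value, or null when no hint is given. */
        String typeHint(String correctType, String wrongType) {
            return pick(type, correctType, wrongType);
        }

        private static String pick(Hint hint, String correct, String wrong) {
            switch (hint) {
                case CORRECT: return correct;
                case INCORRECT: return wrong;
                default: return null;
            }
        }

        @Override
        public String toString() {
            return "name=" + name + ", type=" + type;
        }
    }

    /** Returns all 3 x 3 = 9 combinations of name and type hint states. */
    static List<Config> all() {
        List<Config> configs = new ArrayList<>();
        for (Hint name : Hint.values()) {
            for (Hint type : Hint.values()) {
                configs.add(new Config(name, type));
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        // Hypothetical sample values; the wrong ones mimic a mislabeled document.
        for (Config c : all()) {
            System.out.println(c
                    + " -> name hint: " + c.nameHint("testHTML.html", "testHTML.doc")
                    + ", type hint: " + c.typeHint("text/html", "application/msword"));
        }
    }
}
```

Each Config would then drive one parse of the same, unmodified resource
stream, so no mislabeled copies of the test documents need to exist on disk.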
It seems we currently need MIME magic tests for Excel, PowerPoint, RTF, plain
text, Word, and XML.
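
To illustrate what a MIME magic test means here, the following is a rough
sketch of header-first detection for two of the listed formats, RTF and
XML, with the file name used only as a fallback. It is not Tika's
detector: MagicFirstDetector and its methods are hypothetical, the
extension table is deliberately tiny, and the XML check is a
simplification that ignores byte order marks and leading whitespace.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch; not Tika's MIME detection code. */
final class MagicFirstDetector {

    // Leading bytes of RTF and XML documents (ASCII-compatible encodings only).
    private static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] XML = "<?xml".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data begins with the given prefix. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Checks the header bytes first and consults the file name only when
     * no magic prefix matches, so a misleading extension cannot win.
     */
    static String detect(byte[] header, String fileName) {
        if (startsWith(header, RTF)) {
            return "application/rtf";
        }
        if (startsWith(header, XML)) {
            return "application/xml";
        }
        if (fileName != null) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".doc")) {
                return "application/msword";
            }
            if (lower.endsWith(".html")) {
                return "text/html";
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] rtf = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
        // An RTF stream fed with a misleading .doc name.
        System.out.println(detect(rtf, "testReallyRTF.doc"));
    }
}
```

Excel, PowerPoint, and Word documents share a common container format, so
a simple prefix check like this one would not be enough to tell them apart.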
> Need to add test documents with wrong extensions.
> -------------------------------------------------
>
> Key: TIKA-76
> URL: https://issues.apache.org/jira/browse/TIKA-76
> Project: Tika
> Issue Type: Improvement
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Fix For: 0.1-incubator
>
> Attachments: TIKA-76.patch
>
>
> We need to add test documents with misleading extensions to verify that
> MIME type detection based on the file header takes precedence over
> detection based on the file name.
> I suggest copying existing files such as:
> cp testHTML.html testReallyHTML.doc