[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-683:
Attachment: testRTFJapanese.rtf
Add test file. Based on Jp_euc-jp_rtf1.rtf from
http://mail-archives.apache.
Cristian Vat updated TIKA-683:
Attachment: testUnicodeUCNControlWordCharacterDoubling.rtf
Test file for \ucN control word character doub
Cristian Vat updated TIKA-683:
Attachment: TIKA-683.patch
Patch with reduced test file and new test for character doubling in
RTFParser
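
For readers unfamiliar with the \ucN control word: in RTF, \ucN declares how many fallback characters follow each \uN Unicode character (the default is 1), and a Unicode-aware reader is expected to drop them. If they are not dropped, the text comes out with extra characters. The sketch below is only an illustration of that skip rule, not the code in the attached patch or in Tika's RTFParser; the class name UcSkipSketch and the method decode are hypothetical, and it assumes a tiny RTF subset (no groups, no destinations, \'hh bytes treated as Latin-1).

```java
// Hypothetical illustration only; not Tika's RTFParser.
final class UcSkipSketch {

    /**
     * Decodes a tiny RTF subset: plain text, the "uc" and "u" control words,
     * and backslash-quote hex escapes. Groups and other control words are ignored.
     */
    static String decode(String rtf) {
        StringBuilder out = new StringBuilder();
        int ucSkip = 1;       // RTF default: one fallback character per Unicode character
        int pendingSkip = 0;  // fallback characters still to be dropped
        int len = rtf.length();
        int i = 0;
        while (i < len) {
            char c = rtf.charAt(i);
            if (c != '\\') {
                // Plain character: drop it if it is a pending fallback, else emit it.
                if (pendingSkip > 0) pendingSkip--; else out.append(c);
                i++;
                continue;
            }
            // A hex escape (backslash, quote, two hex digits) counts as ONE fallback character.
            if (i + 3 < len && rtf.charAt(i + 1) == '\'') {
                int b = Integer.parseInt(rtf.substring(i + 2, i + 4), 16);
                // Assumption for brevity: emit the byte as Latin-1 instead of using the document charset.
                if (pendingSkip > 0) pendingSkip--; else out.append((char) b);
                i += 4;
                continue;
            }
            // Control word: ASCII lowercase letters, optional signed number, optional one-space delimiter.
            int j = i + 1;
            while (j < len && rtf.charAt(j) >= 'a' && rtf.charAt(j) <= 'z') j++;
            String word = rtf.substring(i + 1, j);
            int k = j;
            if (k + 1 < len && rtf.charAt(k) == '-' && Character.isDigit(rtf.charAt(k + 1))) k++;
            while (k < len && Character.isDigit(rtf.charAt(k))) k++;
            String num = rtf.substring(j, k);
            if (k < len && rtf.charAt(k) == ' ') k++; // the delimiter space belongs to the control word
            i = k;
            if (word.equals("uc") && !num.isEmpty()) {
                ucSkip = Integer.parseInt(num);
            } else if (word.equals("u") && !num.isEmpty()) {
                int code = Integer.parseInt(num);
                if (code < 0) code += 65536; // values above 32767 are written as negative numbers
                out.append((char) code);
                pendingSkip = ucSkip;         // the next ucSkip fallback characters must be dropped
            }
            // Any other control word is ignored in this sketch.
        }
        return out.toString();
    }
}
```

With \uc1, each "?" fallback after a \uN is dropped; with \uc2, two \'hh bytes are dropped per Unicode character. A reader that skips the wrong number of fallbacks would emit them alongside the Unicode character.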
Michael McCandless updated TIKA-683:
Attachment: TIKA-683-unicode-testcase.patch
I was curious/nervous whether the RTFParser (and
Michael McCandless updated TIKA-683:
Attachment: testWORD_bold_character_runs2.docx
testWORD_bold_character_runs.do
Michael McCandless updated TIKA-683:
Attachment: TIKA-683.patch
Attached patch, with a first cut at using a simple (shallow) token
Michael McCandless updated TIKA-683:
Attachment: TIKA-683.patch
New patch; I think it's ready! Changes from last patch:
- Fact