[jira] [Updated] (TIKA-683) RTF Parser issues with non european characters

2011-07-15 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch updated TIKA-683:
Attachment: testRTFJapanese.rtf
Add test file. Based on Jp_euc-jp_rtf1.rtf from http://mail-archives.apache.
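A Japanese test file exercises a common source of garbled non-European text in RTF. Bytes outside ASCII are usually written as `\'hh` hex escapes in the document's code page, and a double-byte character spans two consecutive escapes, so decoding each escape on its own cannot recover it. The Java sketch below illustrates that general idea only; it is not Tika's RTFParser code. The class `HexEscapeSketch` and method `decodeHexRun` are hypothetical, and the sketch assumes the code page is already known (for example from an `\ansicpg932` header, Microsoft's Shift_JIS variant) and that the input holds only literal characters and `\'hh` escapes. It buffers each run of consecutive escapes and decodes the run in one call.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical illustration, not Tika's implementation.
final class HexEscapeSketch {
    private HexEscapeSketch() {}

    // Decodes text in which non-ASCII bytes appear as hex escapes (backslash,
    // apostrophe, two hex digits). Runs of consecutive escapes are buffered and
    // decoded in one call, so a double-byte character is not split in two.
    // Assumes well-formed input with no other RTF control words.
    static String decodeHexRun(String rtf, Charset codePage) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < rtf.length()) {
            if (i + 3 < rtf.length() && rtf.charAt(i) == '\\' && rtf.charAt(i + 1) == '\'') {
                // Collect the raw byte; decoding waits until the run ends.
                pending.write(Integer.parseInt(rtf.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                flush(pending, codePage, out); // a literal character ends the byte run
                out.append(rtf.charAt(i));
                i++;
            }
        }
        flush(pending, codePage, out);
        return out.toString();
    }

    // Decodes any buffered bytes with the code page and clears the buffer.
    private static void flush(ByteArrayOutputStream pending, Charset codePage, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), codePage));
            pending.reset();
        }
    }
}
```

For the RTF text `\'93\'fa` decoded as Shift_JIS, the sketch returns 日 (U+65E5); the same character in EUC-JP is `\'c6\'fc`. Taking the charset as a parameter keeps the byte buffering separate from the question of which code page a given file declares.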

2011-08-06 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cristian Vat updated TIKA-683:
Attachment: testUnicodeUCNControlWordCharacterDoubling.rtf
Test file for \ucN control word character doub
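In RTF, a `\uN` control word carries a Unicode character (values above 32767 are written as negative numbers) and is followed by fallback text for older readers; `\ucN` sets how many fallback units follow (one by default). A reader that emits the Unicode character and then the fallback as well produces doubled characters. The Java sketch below shows the skip logic under stated simplifications and is not Tika's implementation: the class `UcSkipSketch` is hypothetical, group scoping of `\ucN` is ignored (skipping just stops at any brace), control words other than `\u` and `\uc` are dropped, and an unskipped `\'hh` byte is appended as a Latin-1 character.

```java
// Hypothetical illustration, not Tika's implementation.
final class UcSkipSketch {
    private UcSkipSketch() {}

    private static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    // Extracts text from a small RTF fragment, honouring the uc skip count.
    static String decode(String rtf) {
        StringBuilder out = new StringBuilder();
        int ucSkip = 1;      // default fallback count defined by the RTF spec
        int pendingSkip = 0; // fallback units still to drop after a Unicode escape
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}') {
                pendingSkip = 0; // simplification: a group boundary ends skipping
                i++;
            } else if (c == '\r' || c == '\n') {
                i++; // raw line breaks are not document text in RTF
            } else if (c != '\\') {
                if (pendingSkip > 0) pendingSkip--; else out.append(c);
                i++;
            } else if (i + 1 < n && isAsciiLetter(rtf.charAt(i + 1))) {
                // Control word: letters, optional signed integer, optional space delimiter.
                int j = i + 1;
                while (j < n && isAsciiLetter(rtf.charAt(j))) j++;
                String word = rtf.substring(i + 1, j);
                int k = j;
                int sign = 1;
                if (k < n && rtf.charAt(k) == '-') { sign = -1; k++; }
                int digitsStart = k;
                while (k < n && Character.isDigit(rtf.charAt(k))) k++;
                boolean hasParam = k > digitsStart;
                int param = hasParam ? sign * Integer.parseInt(rtf.substring(digitsStart, k)) : 0;
                if (hasParam) j = k;
                if (j < n && rtf.charAt(j) == ' ') j++; // delimiting space belongs to the word
                i = j;
                if (word.equals("uc") && hasParam) {
                    ucSkip = param;
                } else if (word.equals("u") && hasParam) {
                    // Negative parameters encode code units above 32767.
                    out.append((char) (param < 0 ? param + 65536 : param));
                    pendingSkip = ucSkip;
                } else if (pendingSkip > 0) {
                    pendingSkip--; // any other control word counts as one fallback unit
                }
            } else if (i + 3 < n && rtf.charAt(i + 1) == '\'') {
                // Hex escape: one byte, one fallback unit.
                int b = Integer.parseInt(rtf.substring(i + 2, i + 4), 16);
                if (pendingSkip > 0) pendingSkip--; else out.append((char) b); // Latin-1 simplification
                i += 4;
            } else if (i + 1 < n) {
                // Control symbol such as an escaped backslash or brace.
                char sym = rtf.charAt(i + 1);
                if (pendingSkip > 0) {
                    pendingSkip--;
                } else if (sym == '\\' || sym == '{' || sym == '}') {
                    out.append(sym);
                }
                i += 2;
            } else {
                i++; // lone trailing backslash
            }
        }
        return out.toString();
    }
}
```

For `\uc2\u26085\'93\'fa` the sketch returns only 日, because both Shift_JIS fallback bytes are skipped. Without the `\uc2`, the default skip count of 1 leaves `\'fa` in the output, which is the kind of doubling a test file like this is meant to catch.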

2011-08-07 Thread Cristian Vat (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cristian Vat updated TIKA-683:
Attachment: TIKA-683.patch
Patch with reduced test file and new test for character doubling in RTFParser

2011-08-17 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-683:
Attachment: TIKA-683-unicode-testcase.patch
I was curious/nervous whether the RTFParser (and

2011-08-22 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-683:
Attachment: testWORD_bold_character_runs2.docx
            testWORD_bold_character_runs.do

2011-09-01 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-683:
Attachment: TIKA-683.patch
Attached patch, with a first cut at using a simple (shallow) token

2011-09-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-683:
Attachment: TIKA-683.patch
New patch; I think it's ready!
Changes from last patch:
- Fact