[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-422:
--
Attachment: RTFParser.patch
Attached updated version of the patch.
Changed isUnicode to include characte
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920365#action_12920365
]
Cristian Vat commented on TIKA-422:
---
Clarification:
The previous patch added some extra sp
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-422:
--
Attachment: RTFParser.patch
Attached new patch.
Added test for checking space after umlaut/encoded chara
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-422:
--
Attachment: RTFParser.patch
Attached new patch.
Previous patch broke TIKA-392 when previous cell ended i
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921110#action_12921110
]
Cristian Vat commented on TIKA-422:
---
Anyone mind looking over the patch so far?
It seems to
[
https://issues.apache.org/jira/browse/TIKA-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-422:
--
Attachment: RTFParser.patch
New patch.
- Fixed curly brackets exception.
- Added handling also for "\afN
[
https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995489#comment-12995489
]
Cristian Vat commented on TIKA-469:
---
Possible this is a problem with the PDF or installed/
[
https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995489#comment-12995489
]
Cristian Vat edited comment on TIKA-469 at 2/16/11 8:05 PM:
Poss
[
https://issues.apache.org/jira/browse/TIKA-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995516#comment-12995516
]
Cristian Vat commented on TIKA-469:
---
Also tested the Word file parsing and it looks ok.
Di
[
https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036337#comment-13036337
]
Cristian Vat commented on TIKA-642:
---
For the example file it seems like there's only extra
[
https://issues.apache.org/jira/browse/TIKA-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080449#comment-13080449
]
Cristian Vat commented on TIKA-632:
---
Tika uses RTFEditorKit from javax.swing.text.rtf for
[
https://issues.apache.org/jira/browse/TIKA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080459#comment-13080459
]
Cristian Vat commented on TIKA-642:
---
RTF file format starts with "{\rtf1" and the file end
[
https://issues.apache.org/jira/browse/TIKA-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080470#comment-13080470
]
Cristian Vat commented on TIKA-666:
---
I checked the error in more detail, mostly to check t
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-683:
--
Attachment: testUnicodeUCNControlWordCharacterDoubling.rtf
Test file for \ucN control word character doub
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080488#comment-13080488
]
Cristian Vat commented on TIKA-683:
---
I managed to take the original file and slim it down
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-683:
--
Attachment: TIKA-683.patch
Patch with reduced test file and new test for character doubling in RTFParser
[
https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087367#comment-13087367
]
Cristian Vat commented on TIKA-683:
---
Thanks Mike for looking into the issues. I also know
Cristian Vat created TIKA-2837:
--
Summary: Performance/Stability problem in ToHTMLContentHandler
Key: TIKA-2837
URL: https://issues.apache.org/jira/browse/TIKA-2837
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cristian Vat updated TIKA-2837:
---
Description:
I got a StackOverflowError while parsing a large PDF file using
ToHTMLContentHandler. Tr
Cristian Vat created TIKA-3008:
--
Summary: Word Doc/Docx Formatting Extraction - Superscript/Subscript
Key: TIKA-3008
URL: https://issues.apache.org/jira/browse/TIKA-3008
Project: Tika
Issue Typ
[
https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993418#comment-16993418
]
Cristian Vat commented on TIKA-3008:
Work-in-progress branch at [https://github.com/de
[
https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993842#comment-16993842
]
Cristian Vat commented on TIKA-3008:
Added parser test and sample documents to my bran
[
https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043776#comment-17043776
]
Cristian Vat commented on TIKA-2837:
I guess this could be closed?
It serves as do
[
https://issues.apache.org/jira/browse/TIKA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135122#comment-17135122
]
Cristian Vat commented on TIKA-3008:
Opened PR with handling for basic use-cases and s
24 matches