Hi devs,
I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a
number of documents now fail during parsing that previously passed.
Many of these failures seem related to image processing. For example:
Caused by: org.apache.tika.exception.TikaException: Can't read JPEG
metadat
[
https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906555#action_12906555
]
Nick Burch commented on TIKA-484:
-
I've just tried this file with Tika-App (which passes the
[
https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-451.
-
Fix Version/s: 0.8
Resolution: Fixed
I think all the key metadata keys are now defined as Date Prope
[
https://issues.apache.org/jira/browse/TIKA-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-485.
-
Assignee: Nick Burch
Fix Version/s: 0.8
Resolution: Fixed
Thanks for the patch, applied wit
[
https://issues.apache.org/jira/browse/TIKA-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-486.
-
Assignee: Nick Burch
Fix Version/s: 0.8
Resolution: Fixed
Thanks for the sample files. I've
[
https://issues.apache.org/jira/browse/TIKA-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906524#action_12906524
]
Nick Burch commented on TIKA-486:
-
Thinking about it some more, these non-Microsoft files whi
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-461:
---
Issue Type: New Feature (was: Bug)
Changed from bug to new feature.
> RFC822 messages not parsed
> ---
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated TIKA-461:
---
Attachment: TIKA-461.patch
This patch contains an initial version of the RFC822Parser which uses
apach
[
https://issues.apache.org/jira/browse/TIKA-482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906483#action_12906483
]
Nick Burch commented on TIKA-482:
-
I couldn't include ImageMetadataExtractorTest as it uses n
On Wed, 1 Sep 2010, Nick Burch wrote:
I've been thinking about extracting files from container formats (e.g.
images in a .docx, PDFs in a zip file, etc.).
I've been pondering the various feedback over the weekend, and hopefully
now have a more detailed idea.
Firstly, the new service needs to wor
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906468#action_12906468
]
Julien Nioche commented on TIKA-461:
I'll have a look at mime4j and try to use it in Tika.
[
https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned TIKA-461:
--
Assignee: Julien Nioche
> RFC822 messages not parsed
> --
>
>
set sortByPosition option by default
Key: TIKA-505
URL: https://issues.apache.org/jira/browse/TIKA-505
Project: Tika
Issue Type: Improvement
Components: parser
Environment: Win 7, C#, T