Jpeg parsing issues

2010-09-06 Thread Ken Krugler
Hi devs, I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a number of documents now fail during parsing that previously passed. Many of these failures seem related to image processing. For example: Caused by: org.apache.tika.exception.TikaException: Can't read JPEG metadat

[jira] Commented: (TIKA-484) xlsx files created with open office are detected as application/zip

2010-09-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906555#action_12906555 ] Nick Burch commented on TIKA-484: I've just tried this file with Tika-App (which passes the

[jira] Resolved: (TIKA-451) Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED

2010-09-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-451. Fix Version/s: 0.8; Resolution: Fixed. I think all the key metadata keys are now defined as Date Prope

[jira] Resolved: (TIKA-485) ContainerAwareDetector doesn't support truncated POI files

2010-09-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-485. Assignee: Nick Burch; Fix Version/s: 0.8; Resolution: Fixed. Thanks for the patch, applied wit

[jira] Resolved: (TIKA-486) ContainerAwareDetector doesn't support non-MSOffice files which use the same magic

2010-09-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-486. Assignee: Nick Burch; Fix Version/s: 0.8; Resolution: Fixed. Thanks for the sample files. I've

[jira] Commented: (TIKA-486) ContainerAwareDetector doesn't support non-MSOffice files which use the same magic

2010-09-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906524#action_12906524 ] Nick Burch commented on TIKA-486: Thinking about it some more, these non-Microsoft files whi

[jira] Updated: (TIKA-461) RFC822 messages not parsed

2010-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-461: Issue Type: New Feature (was: Bug). Changed from bug to new feature.

[jira] Updated: (TIKA-461) RFC822 messages not parsed

2010-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated TIKA-461: Attachment: TIKA-461.patch. This patch contains an initial version of the RFC822Parser which uses apach

[jira] Commented: (TIKA-482) Refactor image and jpeg parsers for access to MetadataExtractor API

2010-09-06 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906483#action_12906483 ] Nick Burch commented on TIKA-482: I couldn't include ImageMetadataExtractorTest as it uses n

Re: Container Extractor?

2010-09-06 Thread Nick Burch
On Wed, 1 Sep 2010, Nick Burch wrote: I've been thinking about extracting files from container formats (e.g. images in a .docx, PDFs in a zip file, etc.). I've been pondering the various feedback over the weekend, and hopefully now have a more detailed idea. Firstly, the new service needs to wor

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906468#action_12906468 ] Julien Nioche commented on TIKA-461: I'll have a look at mime4j and try to use it in Tika

[jira] Assigned: (TIKA-461) RFC822 messages not parsed

2010-09-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned TIKA-461: Assignee: Julien Nioche

[jira] Created: (TIKA-505) set sortByPosition option by default

2010-09-06 Thread Sandor Dj (JIRA)
set sortByPosition option by default Key: TIKA-505 URL: https://issues.apache.org/jira/browse/TIKA-505 Project: Tika Issue Type: Improvement Components: parser Environment: Win 7, C#, T