[jira] Updated: (TIKA-561) Support EMLX file detection

2010-11-25 Thread Antoni Mylka (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-561: -- Attachment: tika-561.patch A patch which contains the modifications and the test file. It overlaps with m

[jira] Created: (TIKA-561) Support EMLX file detection

2010-11-25 Thread Antoni Mylka (JIRA)
Support EMLX file detection --- Key: TIKA-561 URL: https://issues.apache.org/jira/browse/TIKA-561 Project: Tika Issue Type: Improvement Reporter: Antoni Mylka Apple Mail generates email files in .emlx fo

[jira] Updated: (TIKA-560) Improve detection of .mht, Foxmail, and OOXML files

2010-11-25 Thread Antoni Mylka (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-560: -- Attachment: test-documents.zip tika-560.patch A patch with my solution proposal, and the

[jira] Created: (TIKA-560) Improve detection of .mht, Foxmail, and OOXML files

2010-11-25 Thread Antoni Mylka (JIRA)
Improve detection of .mht, Foxmail, and OOXML files --- Key: TIKA-560 URL: https://issues.apache.org/jira/browse/TIKA-560 Project: Tika Issue Type: Improvement Reporter: Antoni Mylk

Re: Furthering Along TIKA-461

2010-11-25 Thread Julien Nioche
Hi Ben, Great! I still haven't found the time to work on Nick's suggestions but you can definitely work on the tests if you want to and add some of the emails you mentioned. Having some cases of multipart with HTML and txt content + images and attachments would be good. Thanks Julien On 25 Nove

Furthering Along TIKA-461

2010-11-25 Thread Benjamin Douglas
Hello, I am working on a project with rfc-822 email messages and ran into the problem discussed in TIKA-461. I'd be interested in helping this story along, if there is anything more to be done. In particular, I have a pile of public domain emails that might be useful for testing. Thanks, Ben D

[jira] Commented: (TIKA-559) [PDF Parser] New paragraph not taken into account sometime

2010-11-25 Thread Staffan Olsson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935788#action_12935788 ] Staffan Olsson commented on TIKA-559: - Isn't this a duplicate of TIKA-548? Try trunk. > [

[jira] Updated: (TIKA-559) [PDF Parser] New paragraph not taken into account sometime

2010-11-25 Thread Antoine L. (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine L. updated TIKA-559: Attachment: partition.pdf > [PDF Parser] New paragraph not taken into account sometime >

[jira] Created: (TIKA-559) [PDF Parser] New paragraph not taken into account sometime

2010-11-25 Thread Antoine L. (JIRA)
[PDF Parser] New paragraph not taken into account sometime -- Key: TIKA-559 URL: https://issues.apache.org/jira/browse/TIKA-559 Project: Tika Issue Type: Bug Components: parse

[jira] Commented: (TIKA-557) Extract text file PDF error

2010-11-25 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935773#action_12935773 ] Ken Krugler commented on TIKA-557: -- By default the WriteOutContentHandler has a limit of 100

[jira] Resolved: (TIKA-558) Problems/inconsistency with jar edu.ucar:netcdf:4.2 used by Tika 0.8

2010-11-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-558. - Resolution: Duplicate Duplicate of TIKA-556 > Problems/inconsistency with jar edu.ucar:netcdf:4.2 used by

[jira] Updated: (TIKA-558) Problems/inconsistency with jar edu.ucar:netcdf:4.2 used by Tika 0.8

2010-11-25 Thread Guest (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guest updated TIKA-558: --- Description: I use Maven to build my application. Part of this application is Tika. I previously used Tika 0.4 with no

[jira] Created: (TIKA-558) Problems/inconsistency with jar edu.ucar:netcdf:4.2 used by Tika 0.8

2010-11-25 Thread Guest (JIRA)
Problems/inconsistency with jar edu.ucar:netcdf:4.2 used by Tika 0.8 Key: TIKA-558 URL: https://issues.apache.org/jira/browse/TIKA-558 Project: Tika Issue Type: Bug

[jira] Resolved: (TIKA-557) Extract text file PDF error

2010-11-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-557. - Resolution: Invalid You've set a Write Limit on your ContentHandler, and the text in your PDF is too big