[jira] [Updated] (TIKA-820) Locator is unset for HTML parser

2011-12-20 Thread Daniel Bonniot de Ruisselet (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Bonniot de Ruisselet updated TIKA-820: Attachment: text-locator.patch Fix+test patch. > Locator i

[jira] [Created] (TIKA-820) Locator is unset for HTML parser

2011-12-20 Thread Daniel Bonniot de Ruisselet (Created) (JIRA)
Locator is unset for HTML parser Key: TIKA-820 URL: https://issues.apache.org/jira/browse/TIKA-820 Project: Tika Issue Type: Bug Components: general, parser Reporter: Daniel Bonniot de R

[jira] [Commented] (TIKA-820) Locator is unset for HTML parser

2011-12-20 Thread Daniel Bonniot de Ruisselet (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173046#comment-13173046 ] Daniel Bonniot de Ruisselet commented on TIKA-820: Note that the exact val

[jira] [Commented] (TIKA-686) Split tika-parsers into separate components

2011-12-20 Thread Antoni Mylka (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173147#comment-13173147 ] Antoni Mylka commented on TIKA-686: Why keep this issue open? PdfParser appeared in PdfB

[jira] [Commented] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2011-12-20 Thread Albert L. (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173225#comment-13173225 ] Albert L. commented on TIKA-819: Oh, I see. Could this be a command-line option when using

[jira] [Created] (TIKA-821) Support detecting old Microsoft Works Word Processor formats

2011-12-20 Thread Antoni Mylka (Created) (JIRA)
Support detecting old Microsoft Works Word Processor formats Key: TIKA-821 URL: https://issues.apache.org/jira/browse/TIKA-821 Project: Tika Issue Type: Improvement Compo

[jira] [Commented] (TIKA-821) Support detecting old Microsoft Works Word Processor formats

2011-12-20 Thread Antoni Mylka (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173267#comment-13173267 ] Antoni Mylka commented on TIKA-821: Committed in r1221323 > Support dete

[jira] [Created] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Created) (JIRA)
MediaType fails to parse charset that has quoted value Key: TIKA-822 URL: https://issues.apache.org/jira/browse/TIKA-822 Project: Tika Issue Type: Bug Components: mime Affe

[jira] [Created] (TIKA-823) Detect StarOffice files

2011-12-20 Thread Antoni Mylka (Created) (JIRA)
Detect StarOffice files Key: TIKA-823 URL: https://issues.apache.org/jira/browse/TIKA-823 Project: Tika Issue Type: Improvement Affects Versions: 1.1 Reporter: Antoni Mylka I would like both MimeType

[jira] [Updated] (TIKA-823) Detect StarOffice files

2011-12-20 Thread Antoni Mylka (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoni Mylka updated TIKA-823: Attachment: testStarOffice-5.2-write.sdw testStarOffice-5.2-impress.sdd te

Re: svn commit: r1221323 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/ tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ tika-parsers/src/test/java/org/apache/tika/de

2011-12-20 Thread Nick Burch
On 20/12/11 15:55, amy...@apache.org wrote: --- tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java (original) +++ tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java Tue Dec

[jira] [Commented] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

2011-12-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173715#comment-13173715 ] Nick Burch commented on TIKA-819: Probably. Can you think up a suitable short and long form

[jira] [Commented] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173763#comment-13173763 ] Nick Burch commented on TIKA-822: Should we handle single quotes too? I don't think they're

[jira] [Updated] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] peter royal updated TIKA-822: Comment: was deleted (was: the rfc for mime isn't clear on whether single quotes make a valid quoted strin

[jira] [Commented] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173769#comment-13173769 ] peter royal commented on TIKA-822: the rfc for mime isn't clear on whether single quotes m

[jira] [Commented] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173768#comment-13173768 ] peter royal commented on TIKA-822: the rfc for mime isn't clear on whether single quotes m

[jira] [Commented] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173796#comment-13173796 ] Nick Burch commented on TIKA-822: OK, thanks for the info and the patch. I've added it, alo

[jira] [Resolved] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread Nick Burch (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-822. Resolution: Fixed Fix Version/s: 1.1 > MediaType fails to parse charset that has quoted value >

[jira] [Commented] (TIKA-823) Detect StarOffice files

2011-12-20 Thread Nick Burch (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173820#comment-13173820 ] Nick Burch commented on TIKA-823: Note that it looks like the strings are prefixed with a 4

[jira] [Commented] (TIKA-822) MediaType fails to parse charset that has quoted value

2011-12-20 Thread peter royal (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173833#comment-13173833 ] peter royal commented on TIKA-822: thanks! > MediaType fails to parse ch