[jira] [Comment Edited] (TIKA-2559) Expose language metadata from PDF documents

2018-01-30 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346160#comment-16346160 ] Matt Sheppard edited comment on TIKA-2559 at 1/31/18 2:36 AM

[jira] [Updated] (TIKA-2559) Expose language metadata from PDF documents

2018-01-30 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-2559: Attachment: acrobat-xi-pdf-accessibility-overview.pdf

[jira] [Commented] (TIKA-2559) Expose language metadata from PDF documents

2018-01-30 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346160#comment-16346160 ] Matt Sheppard commented on TIKA-2559: Ah, [http://itaccessibility.arizona.edu/sites/i

[jira] [Created] (TIKA-2559) Expose language metadata from PDF documents

2018-01-30 Matt Sheppard (JIRA)
Matt Sheppard created TIKA-2559:
Summary: Expose language metadata from PDF documents
Key: TIKA-2559
URL: https://issues.apache.org/jira/browse/TIKA-2559
Project: Tika
Issue Type: Improvement

[jira] [Commented] (TIKA-1730) Excel to HTML filtering seems to produce some font setting gibberish in output

2015-09-06 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733239#comment-14733239 ] Matt Sheppard commented on TIKA-1730: File in question can be downloaded from https:/

[jira] [Updated] (TIKA-1730) Excel to HTML filtering seems to produce some font setting gibberish in output

2015-09-06 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-1730: Description: Noticed while upgrading from Tika 1.8 to 1.10. An .xls file linked below, which used t

[jira] [Created] (TIKA-1730) Excel to HTML filtering seems to produce some font setting gibberish in output

2015-09-06 Matt Sheppard (JIRA)
Matt Sheppard created TIKA-1730:
Summary: Excel to HTML filtering seems to produce some font setting gibberish in output
Key: TIKA-1730
URL: https://issues.apache.org/jira/browse/TIKA-1730
Project: Tik

[jira] [Commented] (TIKA-1590) A particular PDF seems to trigger an infinite loop when being converted to HTML

2015-04-01 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392196#comment-14392196 ] Matt Sheppard commented on TIKA-1590: Great, thanks. Looking forward to 1.8!

[jira] [Updated] (TIKA-1590) A particular PDF seems to trigger an infinite loop when being converted to HTML

2015-03-31 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-1590: Attachment: National_Audit_tool_CTH_Audit_Report_PDF,_292_KB.pdf and jstack.txt. Attached

[jira] [Created] (TIKA-1590) A particular PDF seems to trigger an infinite loop when being converted to HTML

2015-03-31 Matt Sheppard (JIRA)
Matt Sheppard created TIKA-1590:
Summary: A particular PDF seems to trigger an infinite loop when being converted to HTML
Key: TIKA-1590
URL: https://issues.apache.org/jira/browse/TIKA-1590
Project: Ti

[jira] [Commented] (TIKA-1174) Invalid characters in filtered PDF output

2013-09-20 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772821#comment-13772821 ] Matt Sheppard commented on TIKA-1174: On further investigation, the characters fall in

[jira] [Updated] (TIKA-1174) Invalid characters in filtered PDF output

2013-09-19 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-1174: Attachment: map_sp_1c_a4.pdf Attached a copy of the PDF in question in case the site is changed.

[jira] [Created] (TIKA-1174) Invalid characters in filtered PDF output

2013-09-19 Matt Sheppard (JIRA)
Matt Sheppard created TIKA-1174:
Summary: Invalid characters in filtered PDF output
Key: TIKA-1174
URL: https://issues.apache.org/jira/browse/TIKA-1174
Project: Tika
Issue Type: Bug

[jira] [Updated] (TIKA-911) Converted PDF document contains question marks in place of spaces and inconsistent case

2012-05-02 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-911: Attachment: Rust Biosecurity Brochure.pdf.html

[jira] [Commented] (TIKA-911) Converted PDF document contains question marks in place of spaces and inconsistent case

2012-05-02 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266502#comment-13266502 ] Matt Sheppard commented on TIKA-911: Confirmed that it still occurs for me on a differen

[jira] [Commented] (TIKA-911) Converted PDF document contains question marks in place of spaces and inconsistent case

2012-05-02 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266486#comment-13266486 ] Matt Sheppard commented on TIKA-911: Interesting. I was running Mac OS 10.7.3. Will con

[jira] [Updated] (TIKA-911) Converted PDF document contains question marks in place of spaces and inconsistent case

2012-05-01 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-911: Attachment: Rust Biosecurity Brochure.pdf Attached the PDF document in case it is removed from the source site

[jira] [Created] (TIKA-911) Converted PDF document contains question marks in place of spaces and inconsistent case

2012-05-01 Matt Sheppard (JIRA)
Matt Sheppard created TIKA-911:
Summary: Converted PDF document contains question marks in place of spaces and inconsistent case
Key: TIKA-911
URL: https://issues.apache.org/jira/browse/TIKA-911

[jira] [Issue Comment Edited] (TIKA-621) RTF parsing fails with Java 7 early access on 64bit platforms

2011-03-24 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011000#comment-13011000 ] Matt Sheppard edited comment on TIKA-621 at 3/24/11 11:26 PM

[jira] [Commented] (TIKA-621) RTF parsing fails with Java 7 early access on 64bit platforms

2011-03-24 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011000#comment-13011000 ] Matt Sheppard commented on TIKA-621: I have reported this issue via http://bugreport.su

[jira] [Updated] (TIKA-621) RTF parsing fails with Java 7 early access on 64bit platforms

2011-03-24 Matt Sheppard (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Sheppard updated TIKA-621: Description: I've run across an RTF document which Tika is failing to convert on 64bit platforms (Win

[jira] [Created] (TIKA-621) RTF parsing fails with Java 7 early access on 64bit platforms

2011-03-24 Matt Sheppard (JIRA)
RTF parsing fails with Java 7 early access on 64bit platforms
Key: TIKA-621
URL: https://issues.apache.org/jira/browse/TIKA-621
Project: Tika
Issue Type: Bug