[jira] [Created] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread David Pilato (JIRA)
David Pilato created TIKA-1548: -- Summary: System property added while catching exception on parsing PDF encrypted doc Key: TIKA-1548 URL: https://issues.apache.org/jira/browse/TIKA-1548 Project: Tika

[jira] [Commented] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322524#comment-14322524 ] David Pilato commented on TIKA-1548: Thanks for fixing this. To answer your question

[jira] [Created] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
David Pilato created TIKA-1555: -- Summary: posix_spawn is not a supported process launch mechanism on this platform Key: TIKA-1555 URL: https://issues.apache.org/jira/browse/TIKA-1555 Project: Tika

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329300#comment-14329300 ] David Pilato commented on TIKA-1555: Thank you Uwe. I don't understand why I was not ab

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329446#comment-14329446 ] David Pilato commented on TIKA-1555: I read the code and it sounds to me like that is t

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329452#comment-14329452 ] David Pilato commented on TIKA-1555: Well I could try but for now I did not manage to r

[jira] [Commented] (TIKA-1557) Create TesseractOCR Option to Never Run

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329509#comment-14329509 ] David Pilato commented on TIKA-1557: Thanks! I'd not qualify it as a bug though. :) >

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-02-23 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333470#comment-14333470 ] David Pilato commented on TIKA-1526: I just ran a test on my machine: With Tika 1.7 an

[jira] [Created] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-07 Thread David Pilato (JIRA)
David Pilato created TIKA-2030: -- Summary: A space is suppressed when parsing Odt file Key: TIKA-2030 URL: https://issues.apache.org/jira/browse/TIKA-2030 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-07 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-2030: --- Attachment: test.docx test.odt ODT and DOCX files > A space is suppressed when parsing

[jira] [Created] (TIKA-1165) Autodetect and parse Asciidoc

2013-08-20 Thread David Pilato (JIRA)
David Pilato created TIKA-1165: -- Summary: Autodetect and parse Asciidoc Key: TIKA-1165 URL: https://issues.apache.org/jira/browse/TIKA-1165 Project: Tika Issue Type: Wish Components: l

[jira] [Commented] (TIKA-1165) Autodetect and parse Asciidoc

2014-03-25 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946836#comment-13946836 ] David Pilato commented on TIKA-1165: Sounds like I never answered your comment! Sham

[jira] [Created] (TIKA-2208) Catch missing libraires

2016-12-14 Thread David Pilato (JIRA)
David Pilato created TIKA-2208: -- Summary: Catch missing libraires Key: TIKA-2208 URL: https://issues.apache.org/jira/browse/TIKA-2208 Project: Tika Issue Type: Improvement Components:

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-14 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748161#comment-15748161 ] David Pilato commented on TIKA-2208: Looks like a good idea. Let me try it and come bac

[jira] [Comment Edited] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ] David Pilato edited comment on TIKA-2208 at 12/16/16 8:16 AM: --

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ] David Pilato commented on TIKA-2208: So I tried this way. Basically I declared `` But

[jira] [Comment Edited] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ] David Pilato edited comment on TIKA-2208 at 12/16/16 8:17 AM: --

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754338#comment-15754338 ] David Pilato commented on TIKA-2208: That is correct. Thanks! > Catch missing libraire

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754454#comment-15754454 ] David Pilato commented on TIKA-2208: I got this document from the user who reported the

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754456#comment-15754456 ] David Pilato commented on TIKA-2208: I did not try with a pure Visio file though. > Ca

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754574#comment-15754574 ] David Pilato commented on TIKA-2208: The original reporter told me that we can reuse it

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754595#comment-15754595 ] David Pilato commented on TIKA-2208: I can confirm that your workaround works perfectly

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754624#comment-15754624 ] David Pilato commented on TIKA-2208: I agree with you on 2). That would give even more

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-18 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758867#comment-15758867 ] David Pilato commented on TIKA-2208: So we now have a regression in Elasticsearch tests

[jira] [Comment Edited] (TIKA-2208) Catch missing libraires

2016-12-18 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758867#comment-15758867 ] David Pilato edited comment on TIKA-2208 at 12/18/16 1:50 PM: --

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-18 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758897#comment-15758897 ] David Pilato commented on TIKA-2208: Adding missing libs {code} compile "com.github.

[jira] [Created] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

2016-12-22 Thread David Pilato (JIRA)
David Pilato created TIKA-2227: -- Summary: Replacement of MSOffice#KEYWORDS for RTF and ODT docs Key: TIKA-2227 URL: https://issues.apache.org/jira/browse/TIKA-2227 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

2016-12-22 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771026#comment-15771026 ] David Pilato commented on TIKA-2227: Sorry. Answer is {{TikaCoreProperties.KEYWORDS}}.

[jira] [Closed] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

2016-12-22 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato closed TIKA-2227. -- Resolution: Not A Problem > Replacement of MSOffice#KEYWORDS for RTF and ODT docs > -

[jira] [Created] (TIKA-2579) Update to PDFBox 2.0.9

2018-02-21 Thread David Pilato (JIRA)
David Pilato created TIKA-2579: -- Summary: Update to PDFBox 2.0.9 Key: TIKA-2579 URL: https://issues.apache.org/jira/browse/TIKA-2579 Project: Tika Issue Type: Improvement Components: p

[jira] [Created] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-08 Thread David Pilato (Jira)
David Pilato created TIKA-3006: -- Summary: Regression in PDF keywords extraction since 1.23 Key: TIKA-3006 URL: https://issues.apache.org/jira/browse/TIKA-3006 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-08 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Description: Hey team. I have a unit test which is not passing anymore with Tika 1.23. Code is [h

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-08 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Attachment: test.pdf > Regression in PDF keywords extraction since 1.23 >

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-09 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Description: Hey team. I have a unit test which is not passing anymore with Tika 1.23. Code is [h

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-09 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Description: Hey team. I have a unit test which is not passing anymore with Tika 1.23. Code is [h

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-09 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992068#comment-16992068 ] David Pilato commented on TIKA-3006: For whatever reason, the external link is replace

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-12 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035475#comment-17035475 ] David Pilato commented on TIKA-3006: Could you confirm that it is a regression? As I can

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-24 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043830#comment-17043830 ] David Pilato commented on TIKA-3006: Is it possible to get a SNAPSHOT version of this?

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044349#comment-17044349 ] David Pilato commented on TIKA-3006: Well. It does not work as I'd love it seeing work

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044358#comment-17044358 ] David Pilato commented on TIKA-3006: All good for me. I noticed that new meta data ar

[jira] [Comment Edited] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044358#comment-17044358 ] David Pilato edited comment on TIKA-3006 at 2/25/20 11:41 AM: --

[jira] [Comment Edited] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044358#comment-17044358 ] David Pilato edited comment on TIKA-3006 at 2/25/20 11:42 AM: --

[jira] [Created] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-04 Thread David Pilato (Jira)
David Pilato created TIKA-3224: -- Summary: Stackoverflow with Embedded PDF in DOCX document Key: TIKA-3224 URL: https://issues.apache.org/jira/browse/TIKA-3224 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3224: --- Attachment: issue-stackoverflow.docx > Stackoverflow with Embedded PDF in DOCX document >

[jira] [Updated] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3224: --- Description: This issue has been reported by a user on [discuss.elastic.co|https://discuss.elastic.co

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259813#comment-17259813 ] David Pilato commented on TIKA-3258: I really like having {{auto}} as the default mode

[jira] [Created] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
David Pilato created TIKA-3364: -- Summary: PDF Content is extracted twice Key: TIKA-3364 URL: https://issues.apache.org/jira/browse/TIKA-3364 Project: Tika Issue Type: Bug Components: p

[jira] [Commented] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824 ] David Pilato commented on TIKA-3364: So I tried this: {code:java} PDFPars

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 2:38 PM: --

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 2:39 PM: --

[jira] [Commented] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato commented on TIKA-3364: Oh my god! I'm feeling stupid. Anyway, I was not

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:03 PM: --

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:03 PM: --

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:04 PM: --

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:05 PM: --

[jira] [Commented] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2021-07-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384284#comment-17384284 ] David Pilato commented on TIKA-3224: I just tested Tika 1.27 with PDFBox 2.0.24 and it

[jira] [Commented] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2021-07-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384332#comment-17384332 ] David Pilato commented on TIKA-3224: Oh I was confused. PDFBox 2.0.24 is in Tika 1.27.

[jira] [Resolved] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2021-07-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato resolved TIKA-3224. Fix Version/s: 1.27 Resolution: Fixed > Stackoverflow with Embedded PDF in DOCX document > --

[jira] [Created] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
David Pilato created TIKA-3493: -- Summary: dcterms:created date depends on the current TimeZone in RTF documents Key: TIKA-3493 URL: https://issues.apache.org/jira/browse/TIKA-3493 Project: Tika

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: Test_case_to_demo_the_change_with_Tika_1_x.patch > dcterms:created date depends on the cur

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: (was: Test_case_to_demo_the_change_with_Tika_1_x.patch) > dcterms:created date depends

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: Test_case_to_demo_the_change_with_Tika_1_x1.patch > dcterms:created date depends on the cu

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385445#comment-17385445 ] David Pilato commented on TIKA-3493: I attached a patch which adds a unit test. It i

[jira] [Comment Edited] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385445#comment-17385445 ] David Pilato edited comment on TIKA-3493 at 7/22/21, 11:59 AM: -

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385505#comment-17385505 ] David Pilato commented on TIKA-3493: {quote}It doesn't look like the RTF specifies a t

[jira] [Created] (TIKA-3610) Emit errors to a specific emitter

2021-12-02 Thread David Pilato (Jira)
David Pilato created TIKA-3610: -- Summary: Emit errors to a specific emitter Key: TIKA-3610 URL: https://issues.apache.org/jira/browse/TIKA-3610 Project: Tika Issue Type: New Feature Co

[jira] [Commented] (TIKA-3610) Emit errors to a specific emitter

2021-12-15 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1746#comment-1746 ] David Pilato commented on TIKA-3610: That's very good. So I believe we are all set and

[jira] [Created] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
David Pilato created TIKA-3629: -- Summary: Keywords are not extracted anymore from PDF documents Key: TIKA-3629 URL: https://issues.apache.org/jira/browse/TIKA-3629 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462555#comment-17462555 ] David Pilato commented on TIKA-3629: Update: I can see the keywords available in: *

[jira] [Comment Edited] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462555#comment-17462555 ] David Pilato edited comment on TIKA-3629 at 12/20/21, 11:32 AM:

[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462833#comment-17462833 ] David Pilato commented on TIKA-3629: I'm not sure I got it. Is {{Office.KEYWORDS}

[jira] [Commented] (TIKA-3659) SMB/NFS support

2022-01-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480437#comment-17480437 ] David Pilato commented on TIKA-3659: AFAIK it's not part of the FTP Client. I believe

[jira] [Commented] (TIKA-3863) Add a pipes reporter for OpenSearch

2022-09-28 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610516#comment-17610516 ] David Pilato commented on TIKA-3863: Amazing! That's a good addition. > Add a pipes

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612617#comment-17612617 ] David Pilato commented on TIKA-3812: I'm still having issues with 2.5.0. Basically my

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612645#comment-17612645 ] David Pilato commented on TIKA-3812: [~tallison] I always had {{tika-parsers-scientifi

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612667#comment-17612667 ] David Pilato commented on TIKA-3812: I'm totally fine modifying the code on my side to

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-05 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612937#comment-17612937 ] David Pilato commented on TIKA-3812: When excluding {{GDALParser}} from the {{{}Defaul

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-05 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612949#comment-17612949 ] David Pilato commented on TIKA-3812: Amazing! That helps! I definitely want to read t

[jira] [Comment Edited] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627609#comment-17627609 ] David Pilato edited comment on TIKA-2536 at 11/2/22 10:42 AM: --

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627609#comment-17627609 ] David Pilato commented on TIKA-2536: Hey team netcdf 4.5.5 depends on cdm 4.5.5 which

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627646#comment-17627646 ] David Pilato commented on TIKA-2536: Ha right! Thanks for pointing this out [~tallison

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627656#comment-17627656 ] David Pilato commented on TIKA-2536: That's weird... I'm not seeing the same thing...

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627657#comment-17627657 ] David Pilato commented on TIKA-2536: But wait, it's shaded now??? So I should not have

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-07 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629794#comment-17629794 ] David Pilato commented on TIKA-2536: For future readers, the workaround to depend on T

[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2023-12-19 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798538#comment-17798538 ] David Pilato commented on TIKA-3629: I was looking at this one today. I guess we did n