[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2023-12-19 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798538#comment-17798538 ] David Pilato commented on TIKA-3629: I was looking at this one today. I guess we

API doc not available

2023-10-24 Thread David Pilato
Hey team, it looks like https://tika.apache.org/2.9.1/api/ is not available. HTH David

Re: [DISCUSS] Release planning for 3.x and 2.x's EOL

2023-09-14 Thread David Pilato
I'm all for Java 17 on 3.x. (A) David Pilato da...@pilato.fr 06 13 03 08 41 On 13 Sept. 2023 at 21:26 +0200, Tim Allison wrote: > We seem to have consensus on Java 11 for 3.x and keep Java 8 for 2.x for one > more year.  I've started the branches and started making some

Re: [DISCUSS] Release planning for 3.x and 2.x's EOL

2023-09-12 Thread David Pilato
The sooner, the better IMHO. +1 to dropping Java 8 support. David -- David Pilato, elastic.co Developer | Evangelist, On 12 Sept. 2023 at 16:50 +0200, Tim Allison wrote: > >If Tika users will be happy to move on and drop Java 8 and/or javax. Please > >drop them :))) > > Fello

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-07 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629794#comment-17629794 ] David Pilato commented on TIKA-2536: For future readers, the workaround to depen

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627657#comment-17627657 ] David Pilato commented on TIKA-2536: But wait, it's shaded now??? So I s

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627656#comment-17627656 ] David Pilato commented on TIKA-2536: That's weird... I'm not seeing

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627646#comment-17627646 ] David Pilato commented on TIKA-2536: Ha right! Thanks for pointing this

[jira] [Commented] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627609#comment-17627609 ] David Pilato commented on TIKA-2536: Hey team netcdf 4.5.5 depends on cdm 4

[jira] [Comment Edited] (TIKA-2536) Move to later edu.ucar version to avoid EOL dependencies

2022-11-02 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627609#comment-17627609 ] David Pilato edited comment on TIKA-2536 at 11/2/22 10:4

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-05 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612949#comment-17612949 ] David Pilato commented on TIKA-3812: Amazing! That helps! I definitely want to

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-05 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612937#comment-17612937 ] David Pilato commented on TIKA-3812: When excluding {{GDALParser}} from

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612667#comment-17612667 ] David Pilato commented on TIKA-3812: I'm totally fine modifying the code o

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612645#comment-17612645 ] David Pilato commented on TIKA-3812: [~tallison] I always had {{tika-par

[jira] [Commented] (TIKA-3812) Parser Order: image get parsed by GDALParser instead of TesseractOCRParser

2022-10-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612617#comment-17612617 ] David Pilato commented on TIKA-3812: I'm still having issues with 2.5.0.

[jira] [Commented] (TIKA-3863) Add a pipes reporter for OpenSearch

2022-09-28 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610516#comment-17610516 ] David Pilato commented on TIKA-3863: Amazing! That's a good addition.  >

[jira] [Commented] (TIKA-3659) SMB/NFS support

2022-01-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480437#comment-17480437 ] David Pilato commented on TIKA-3659: AFAIK it's not part of the FTP Client.

[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462833#comment-17462833 ] David Pilato commented on TIKA-3629: I'm not sure I got it.    Is {{Office

[jira] [Comment Edited] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462555#comment-17462555 ] David Pilato edited comment on TIKA-3629 at 12/20/21, 11:3

[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462555#comment-17462555 ] David Pilato commented on TIKA-3629: Update: I can see the keywords availabl

[jira] [Created] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2021-12-20 Thread David Pilato (Jira)
David Pilato created TIKA-3629: -- Summary: Keywords are not extracted anymore from PDF documents Key: TIKA-3629 URL: https://issues.apache.org/jira/browse/TIKA-3629 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-3610) Emit errors to a specific emitter

2021-12-15 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1746#comment-1746 ] David Pilato commented on TIKA-3610: That's very good. So I believe we ar

[jira] [Created] (TIKA-3610) Emit errors to a specific emitter

2021-12-02 Thread David Pilato (Jira)
David Pilato created TIKA-3610: -- Summary: Emit errors to a specific emitter Key: TIKA-3610 URL: https://issues.apache.org/jira/browse/TIKA-3610 Project: Tika Issue Type: New Feature

TesseractOCRConfig.setTesseractPath moved to TesseractOCRParser

2021-07-22 Thread David Pilato
parser/ocr/TesseractOCRParser.java#L76-L87 David -- David Pilato, elastic.co Developer | Evangelist,

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385505#comment-17385505 ] David Pilato commented on TIKA-3493: {quote}It doesn't look like the RTF sp

[jira] [Comment Edited] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385445#comment-17385445 ] David Pilato edited comment on TIKA-3493 at 7/22/21, 11:5

[jira] [Commented] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385445#comment-17385445 ] David Pilato commented on TIKA-3493: I attached a patch which adds a unit test. 

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: Test_case_to_demo_the_change_with_Tika_1_x1.patch > dcterms:created date depends on

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: (was: Test_case_to_demo_the_change_with_Tika_1_x.patch) > dcterms:created d

[jira] [Updated] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3493: --- Attachment: Test_case_to_demo_the_change_with_Tika_1_x.patch > dcterms:created date depends on

[jira] [Created] (TIKA-3493) dcterms:created date depends on the current TimeZone in RTF documents

2021-07-22 Thread David Pilato (Jira)
David Pilato created TIKA-3493: -- Summary: dcterms:created date depends on the current TimeZone in RTF documents Key: TIKA-3493 URL: https://issues.apache.org/jira/browse/TIKA-3493 Project: Tika

Access to Tika Wiki

2021-07-22 Thread David Pilato
Hey, as I'm moving my project to Tika 2.0.0, I would like to edit the Migrating to Tika 2.0.0 page (https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0). My username on the wiki site is dadoonet. Could you give me write access? David -- David Pilato, elast

[jira] [Resolved] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2021-07-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato resolved TIKA-3224. Fix Version/s: 1.27 Resolution: Fixed > Stackoverflow with Embedded PDF in DOCX docum

[jira] [Commented] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2021-07-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384332#comment-17384332 ] David Pilato commented on TIKA-3224: Oh I was confused. PDFBox 2.0.24 is in Tika

[jira] [Commented] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2021-07-20 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384284#comment-17384284 ] David Pilato commented on TIKA-3224: I just tested Tika 1.27 with PDFBox 2.0.24

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:0

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:0

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:0

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 4:0

[jira] [Commented] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330882#comment-17330882 ] David Pilato commented on TIKA-3364: Oh my god! I'm feeling stupid. Anyw

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 2:3

[jira] [Commented] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824 ] David Pilato commented on TIKA-3364: So I tried this: {code:

[jira] [Comment Edited] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330824#comment-17330824 ] David Pilato edited comment on TIKA-3364 at 4/23/21, 2:3

[jira] [Created] (TIKA-3364) PDF Content is extracted twice

2021-04-23 Thread David Pilato (Jira)
David Pilato created TIKA-3364: -- Summary: PDF Content is extracted twice Key: TIKA-3364 URL: https://issues.apache.org/jira/browse/TIKA-3364 Project: Tika Issue Type: Bug Components

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259813#comment-17259813 ] David Pilato commented on TIKA-3258: I really like having {{auto}} as the def

[jira] [Updated] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3224: --- Description: This issue has been reported by a user on [discuss.elastic.co|https

[jira] [Updated] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-04 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3224: --- Attachment: issue-stackoverflow.docx > Stackoverflow with Embedded PDF in DOCX docum

[jira] [Created] (TIKA-3224) Stackoverflow with Embedded PDF in DOCX document

2020-11-04 Thread David Pilato (Jira)
David Pilato created TIKA-3224: -- Summary: Stackoverflow with Embedded PDF in DOCX document Key: TIKA-3224 URL: https://issues.apache.org/jira/browse/TIKA-3224 Project: Tika Issue Type: Bug

Re: [VOTE] Release Apache Tika 1.24.1 Candidate #1

2020-04-20 Thread David Pilato
> [X] +1 Release this package as Apache Tika 1.24.1 Tested. All good for me. On 17 Apr. 2020 at 23:38 +0200, Tim Allison wrote: > > A candidate for the Tika 1.24.1 release is available at: >   https://dist.apache.org/repos/dist/dev/tika/ > > The release candidate is a zip archive of the sourc

Re: [VOTE] Release Apache Tika 1.24 Candidate #3

2020-03-12 Thread David Pilato
Here is my +1. On 11 Mar. 2020 at 20:02 +0100, Tim Allison wrote: > > A candidate for the Tika 1.24 release is available at: >   https://dist.apache.org/repos/dist/dev/tika/ > > The release candidate is a zip archive of the sources in: >   https://github.com/apache/tika/tree/1.24-rc3/ > > The

[jira] [Comment Edited] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044358#comment-17044358 ] David Pilato edited comment on TIKA-3006 at 2/25/20 11:4

[jira] [Comment Edited] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044358#comment-17044358 ] David Pilato edited comment on TIKA-3006 at 2/25/20 11:4

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044358#comment-17044358 ] David Pilato commented on TIKA-3006: All good for me. I noticed that new meta

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-25 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044349#comment-17044349 ] David Pilato commented on TIKA-3006: Well. It does not work as I'd love

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-24 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043830#comment-17043830 ] David Pilato commented on TIKA-3006: Is it possible to get a SNAPSHOT version of

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-12 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035475#comment-17035475 ] David Pilato commented on TIKA-3006: Could you confirm that it is a regression?

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-09 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992068#comment-16992068 ] David Pilato commented on TIKA-3006: For whatever reason, the external lin

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-09 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Description: Hey team.   I have a unit test which is not passing anymore with Tika 1.23. Code is

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-09 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Description: Hey team.   I have a unit test which is not passing anymore with Tika 1.23. Code is

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-08 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Attachment: test.pdf > Regression in PDF keywords extraction since 1

[jira] [Updated] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-08 Thread David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-3006: --- Description: Hey team.   I have a unit test which is not passing anymore with Tika 1.23. Code is

[jira] [Created] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2019-12-08 Thread David Pilato (Jira)
David Pilato created TIKA-3006: -- Summary: Regression in PDF keywords extraction since 1.23 Key: TIKA-3006 URL: https://issues.apache.org/jira/browse/TIKA-3006 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-2579) Update to PDFBox 2.0.9

2018-02-21 Thread David Pilato (JIRA)
David Pilato created TIKA-2579: -- Summary: Update to PDFBox 2.0.9 Key: TIKA-2579 URL: https://issues.apache.org/jira/browse/TIKA-2579 Project: Tika Issue Type: Improvement Components

[jira] [Closed] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

2016-12-22 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato closed TIKA-2227. -- Resolution: Not A Problem > Replacement of MSOffice#KEYWORDS for RTF and ODT d

[jira] [Commented] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

2016-12-22 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771026#comment-15771026 ] David Pilato commented on TIKA-2227: Sorry. Answer is {{TikaCoreProperties.KEYW

[jira] [Created] (TIKA-2227) Replacement of MSOffice#KEYWORDS for RTF and ODT docs

2016-12-22 Thread David Pilato (JIRA)
David Pilato created TIKA-2227: -- Summary: Replacement of MSOffice#KEYWORDS for RTF and ODT docs Key: TIKA-2227 URL: https://issues.apache.org/jira/browse/TIKA-2227 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-18 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758897#comment-15758897 ] David Pilato commented on TIKA-2208: Adding missing libs {code} com

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-18 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758867#comment-15758867 ] David Pilato commented on TIKA-2208: So we now have a regression in Elasticse

[jira] [Comment Edited] (TIKA-2208) Catch missing libraires

2016-12-18 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758867#comment-15758867 ] David Pilato edited comment on TIKA-2208 at 12/18/16 1:5

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754624#comment-15754624 ] David Pilato commented on TIKA-2208: I agree with you on 2). That would give even

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754595#comment-15754595 ] David Pilato commented on TIKA-2208: I can confirm that your workaround w

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754574#comment-15754574 ] David Pilato commented on TIKA-2208: The original reporter told me that we can r

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754456#comment-15754456 ] David Pilato commented on TIKA-2208: I did not try with a pure Visio file th

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754454#comment-15754454 ] David Pilato commented on TIKA-2208: I got this document from the user who repo

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754338#comment-15754338 ] David Pilato commented on TIKA-2208: That is correct. Thanks! > Catch

[jira] [Comment Edited] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ] David Pilato edited comment on TIKA-2208 at 12/16/16 8:1

[jira] [Comment Edited] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ] David Pilato edited comment on TIKA-2208 at 12/16/16 8:1

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753790#comment-15753790 ] David Pilato commented on TIKA-2208: So I tried this way. Basically I decl

[jira] [Commented] (TIKA-2208) Catch missing libraires

2016-12-14 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748161#comment-15748161 ] David Pilato commented on TIKA-2208: Looks like a good idea. Let me try it and

[jira] [Created] (TIKA-2208) Catch missing libraires

2016-12-14 Thread David Pilato (JIRA)
David Pilato created TIKA-2208: -- Summary: Catch missing libraires Key: TIKA-2208 URL: https://issues.apache.org/jira/browse/TIKA-2208 Project: Tika Issue Type: Improvement Components

[jira] [Updated] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-07 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Pilato updated TIKA-2030: --- Attachment: test.docx test.odt ODT and DOCX files > A space is suppressed w

[jira] [Created] (TIKA-2030) A space is suppressed when parsing Odt file

2016-07-07 Thread David Pilato (JIRA)
David Pilato created TIKA-2030: -- Summary: A space is suppressed when parsing Odt file Key: TIKA-2030 URL: https://issues.apache.org/jira/browse/TIKA-2030 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-02-23 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333470#comment-14333470 ] David Pilato commented on TIKA-1526: I just ran a test on my machine: With Tika

[jira] [Commented] (TIKA-1557) Create TesseractOCR Option to Never Run

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329509#comment-14329509 ] David Pilato commented on TIKA-1557: Thanks! I'd not qualify it as a b

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329452#comment-14329452 ] David Pilato commented on TIKA-1555: Well I could try but for now I did not manag

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329446#comment-14329446 ] David Pilato commented on TIKA-1555: I read the code and it sounds like to me tha

[jira] [Commented] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329300#comment-14329300 ] David Pilato commented on TIKA-1555: Thank you Uwe. I don't understand why

[jira] [Created] (TIKA-1555) posix_spawn is not a supported process launch mechanism on this platform

2015-02-20 Thread David Pilato (JIRA)
David Pilato created TIKA-1555: -- Summary: posix_spawn is not a supported process launch mechanism on this platform Key: TIKA-1555 URL: https://issues.apache.org/jira/browse/TIKA-1555 Project: Tika

[jira] [Commented] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-16 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322524#comment-14322524 ] David Pilato commented on TIKA-1548: Thanks for fixing this. To answer to

[jira] [Created] (TIKA-1548) System property added while catching exception on parsing PDF encrypted doc

2015-02-11 Thread David Pilato (JIRA)
David Pilato created TIKA-1548: -- Summary: System property added while catching exception on parsing PDF encrypted doc Key: TIKA-1548 URL: https://issues.apache.org/jira/browse/TIKA-1548 Project: Tika

[jira] [Commented] (TIKA-1165) Autodetect and parse Asciidoc

2014-03-25 Thread David Pilato (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946836#comment-13946836 ] David Pilato commented on TIKA-1165: Sounds like I never answered to your com

[jira] [Created] (TIKA-1165) Autodetect and parse Asciidoc

2013-08-20 Thread David Pilato (JIRA)
David Pilato created TIKA-1165: -- Summary: Autodetect and parse Asciidoc Key: TIKA-1165 URL: https://issues.apache.org/jira/browse/TIKA-1165 Project: Tika Issue Type: Wish Components