[GitHub] [tika] THausherr merged pull request #696: Bump aws.version from 1.12.301 to 1.12.303

2022-09-14 Thread GitBox
THausherr merged PR #696: URL: https://github.com/apache/tika/pull/696 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

[GitHub] [tika] THausherr merged pull request #697: Bump maven-shade-plugin from 3.3.0 to 3.4.0

2022-09-14 Thread GitBox
THausherr merged PR #697: URL: https://github.com/apache/tika/pull/697

[GitHub] [tika] THausherr merged pull request #695: Bump protobuf-java from 3.21.5 to 3.21.6

2022-09-14 Thread GitBox
THausherr merged PR #695: URL: https://github.com/apache/tika/pull/695

[GitHub] [tika] dependabot[bot] opened a new pull request, #697: Bump maven-shade-plugin from 3.3.0 to 3.4.0

2022-09-14 Thread GitBox
dependabot[bot] opened a new pull request, #697: URL: https://github.com/apache/tika/pull/697 Bumps [maven-shade-plugin](https://github.com/apache/maven-shade-plugin) from 3.3.0 to 3.4.0. Commits https://github.com/apache/maven-shade-plugin/commit/885de678577573111568e80b45869a

[GitHub] [tika] dependabot[bot] opened a new pull request, #696: Bump aws.version from 1.12.301 to 1.12.303

2022-09-14 Thread GitBox
dependabot[bot] opened a new pull request, #696: URL: https://github.com/apache/tika/pull/696 Bumps `aws.version` from 1.12.301 to 1.12.303. Updates `aws-java-sdk-transcribe` from 1.12.301 to 1.12.303 Changelog Sourced from https://github.com/aws/aws-sdk-java/blob/master/CHANGELO

[GitHub] [tika] dependabot[bot] opened a new pull request, #695: Bump protobuf-java from 3.21.5 to 3.21.6

2022-09-14 Thread GitBox
dependabot[bot] opened a new pull request, #695: URL: https://github.com/apache/tika/pull/695 Bumps [protobuf-java](https://github.com/protocolbuffers/protobuf) from 3.21.5 to 3.21.6. Commits https://github.com/protocolbuffers/protobuf/commit/24487dd1045c7f3d64a21f38a3f0c06cc4c

[jira] [Commented] (TIKA-3854) Bump main's development version to 2.5.0-SNAPSHOT

2022-09-14 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604949#comment-17604949 ] Hudson commented on TIKA-3854: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

[jira] [Commented] (TIKA-3848) IllegalArgumentException in DBFColumnHeader.setType()

2022-09-14 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604948#comment-17604948 ] Hudson commented on TIKA-3848: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

Re: Next releases?

2022-09-14 Thread Tim Allison
Sorry, of course. Thank you. Just took care of that one. On Wed, Sep 14, 2022 at 2:09 PM Tilman Hausherr wrote: > > On 14.09.2022 19:48, Tim Allison wrote: > > All, > > > > I'm going to start preliminary regression tests for 2.5.0 shortly. > > I'll also test with the new POI rc. We should wait

[jira] [Commented] (TIKA-3850) Spanish text is incorrectly detected as Galician

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604903#comment-17604903 ] Tim Allison commented on TIKA-3850: --- I concur with Nick. For kicks, I ran this with our

[jira] [Commented] (TIKA-3764) Add unit tests for Solr 9

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604899#comment-17604899 ] Tim Allison commented on TIKA-3764: --- Intentionally leaving this open until we have time

[jira] [Resolved] (TIKA-3767) Use junit's @TempDir where possible

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3767. --- Fix Version/s: 2.5.0 Resolution: Fixed I think we've done what we can on this one. > Use jun

[jira] [Resolved] (TIKA-3793) General upgrades for 1.28.5

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3793. --- Resolution: Fixed 1.28.5 released today > General upgrades for 1.28.5 > --- >

[jira] [Updated] (TIKA-3795) General upgrades for 2.5.0

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3795: -- Summary: General upgrades for 2.5.0 (was: General upgrades for 2.4.2) > General upgrades for 2.5.0 > --

[jira] [Commented] (TIKA-3826) Helm: use appVersion from Charts.yaml instead of images.tag

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604894#comment-17604894 ] Tim Allison commented on TIKA-3826: --- [~lewismc] any chance you can take a look at this?

[jira] [Resolved] (TIKA-3848) IllegalArgumentException in DBFColumnHeader.setType()

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3848. --- Fix Version/s: 2.5.0 Resolution: Fixed > IllegalArgumentException in DBFColumnHeader.setType()

[jira] [Resolved] (TIKA-3846) Improve JDBC emitter to handle attachments and batch updates

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3846. --- Fix Version/s: 2.5.0 Resolution: Fixed > Improve JDBC emitter to handle attachments and batch u

[jira] [Resolved] (TIKA-3843) use commons-io byte array streams

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3843. --- Fix Version/s: 2.5.0 Resolution: Fixed Thank you [~pj.fanning]! > use commons-io byte array st

[jira] [Commented] (TIKA-3843) use commons-io byte array streams

2022-09-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604888#comment-17604888 ] ASF GitHub Bot commented on TIKA-3843: -- tballison merged PR #671: URL: https://github

[GitHub] [tika] tballison merged pull request #671: [TIKA-3843] use commons-io byte streams

2022-09-14 Thread GitBox
tballison merged PR #671: URL: https://github.com/apache/tika/pull/671

Re: Next releases?

2022-09-14 Thread Tilman Hausherr
On 14.09.2022 19:48, Tim Allison wrote: All, I'm going to start preliminary regression tests for 2.5.0 shortly. I'll also test with the new POI rc. We should wait for the new jempbox before the release, I think. Let me know if there's anything else we need to get into 2.5.0. This one: https

[jira] [Commented] (TIKA-3853) Enable configuring digests via autodetectparserconfig

2022-09-14 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604885#comment-17604885 ] Hudson commented on TIKA-3853: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

[jira] [Commented] (TIKA-3852) Extract signature info from PDFs

2022-09-14 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604884#comment-17604884 ] Hudson commented on TIKA-3852: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #

[jira] [Resolved] (TIKA-3854) Bump main's development version to 2.5.0-SNAPSHOT

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3854. --- Fix Version/s: 2.5.0 Resolution: Fixed > Bump main's development version to 2.5.0-SNAPSHOT > --

[jira] [Created] (TIKA-3854) Bump main's development version to 2.5.0-SNAPSHOT

2022-09-14 Thread Tim Allison (Jira)
Tim Allison created TIKA-3854: - Summary: Bump main's development version to 2.5.0-SNAPSHOT Key: TIKA-3854 URL: https://issues.apache.org/jira/browse/TIKA-3854 Project: Tika Issue Type: Task

Re: Next releases?

2022-09-14 Thread Tim Allison
All, I'm going to start preliminary regression tests for 2.5.0 shortly. I'll also test with the new POI rc. We should wait for the new jempbox before the release, I think. Let me know if there's anything else we need to get into 2.5.0. Cheers, Tim On Wed, Sep 7, 2022 at 2:06 PM Tim

[jira] [Resolved] (TIKA-3853) Enable configuring digests via autodetectparserconfig

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3853. --- Fix Version/s: 2.4.2 Resolution: Fixed > Enable configuring digests via autodetectparserconfig

[jira] [Commented] (TIKA-3852) Extract signature info from PDFs

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604836#comment-17604836 ] Tim Allison commented on TIKA-3852: --- Thank you [~tilman]! > Extract signature info from

[jira] [Resolved] (TIKA-3852) Extract signature info from PDFs

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3852. --- Fix Version/s: 2.4.2 Resolution: Fixed > Extract signature info from PDFs > ---

[jira] [Commented] (TIKA-3852) Extract signature info from PDFs

2022-09-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604815#comment-17604815 ] Tilman Hausherr commented on TIKA-3852: --- {{getSignatureDictionaries()}} calls {{getS

[jira] [Commented] (TIKA-3816) Tika cannot parse the text in the table(Microsoft word)

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604805#comment-17604805 ] Tim Allison commented on TIKA-3816: --- I opened: https://bz.apache.org/bugzilla/show_bug.c

[jira] [Updated] (TIKA-3816) Tika cannot parse the text in the table(Microsoft word)

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3816: -- Fix Version/s: (was: 2.4.2) > Tika cannot parse the text in the table(Microsoft word) >

[jira] [Commented] (TIKA-3816) Tika cannot parse the text in the table(Microsoft word)

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604803#comment-17604803 ] Tim Allison commented on TIKA-3816: --- One workaround is to use the SAX docx parser, which

[jira] [Commented] (TIKA-3816) Tika cannot parse the text in the table(Microsoft word)

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604801#comment-17604801 ] Tim Allison commented on TIKA-3816: --- Just picking this up now. Thank you for submitting

[ANNOUNCE] Apache Tika 1.28.5 released

2022-09-14 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika 1.28.5. The release contents have been pushed out to the main Apache release site and to the Maven Central sync. Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documen

[jira] [Comment Edited] (TIKA-3852) Extract signature info from PDFs

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604722#comment-17604722 ] Tim Allison edited comment on TIKA-3852 at 9/14/22 2:04 PM: Go

[jira] [Commented] (TIKA-3852) Extract signature info from PDFs

2022-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17604722#comment-17604722 ] Tim Allison commented on TIKA-3852: --- Got it. Thank you. Is there any meaningful differ

[jira] [Created] (TIKA-3853) Enable configuring digests via autodetectparserconfig

2022-09-14 Thread Tim Allison (Jira)
Tim Allison created TIKA-3853: - Summary: Enable configuring digests via autodetectparserconfig Key: TIKA-3853 URL: https://issues.apache.org/jira/browse/TIKA-3853 Project: Tika Issue Type: Wish