[GitHub] [tika] THausherr merged pull request #966: Bump rome from 1.18.0 to 1.19.0

2023-02-15 Thread via GitHub
THausherr merged PR #966: URL: https://github.com/apache/tika/pull/966 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

[GitHub] [tika] THausherr merged pull request #967: Bump maven-javadoc-plugin from 3.4.1 to 3.5.0

2023-02-15 Thread via GitHub
THausherr merged PR #967: URL: https://github.com/apache/tika/pull/967

[GitHub] [tika] THausherr merged pull request #968: Bump aws.version from 1.12.407 to 1.12.408

2023-02-15 Thread via GitHub
THausherr merged PR #968: URL: https://github.com/apache/tika/pull/968

[GitHub] [tika] dependabot[bot] opened a new pull request, #968: Bump aws.version from 1.12.407 to 1.12.408

2023-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #968: URL: https://github.com/apache/tika/pull/968 Bumps `aws.version` from 1.12.407 to 1.12.408. Updates `aws-java-sdk-s3` from 1.12.407 to 1.12.408 Changelog Sourced from

[GitHub] [tika] dependabot[bot] opened a new pull request, #967: Bump maven-javadoc-plugin from 3.4.1 to 3.5.0

2023-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #967: URL: https://github.com/apache/tika/pull/967 Bumps [maven-javadoc-plugin](https://github.com/apache/maven-javadoc-plugin) from 3.4.1 to 3.5.0. Release notes Sourced from

[GitHub] [tika] dependabot[bot] opened a new pull request, #966: Bump rome from 1.18.0 to 1.19.0

2023-02-15 Thread via GitHub
dependabot[bot] opened a new pull request, #966: URL: https://github.com/apache/tika/pull/966 Bumps [rome](https://github.com/rometools/rome) from 1.18.0 to 1.19.0. Release notes Sourced from rome's releases (https://github.com/rometools/rome/releases). 1.19.0 What's

[jira] [Commented] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689400#comment-17689400 ] Hudson commented on TIKA-3972: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #1024 (See

[jira] [Commented] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689364#comment-17689364 ] Tim Allison commented on TIKA-3972: --- It looks like the RTFParser is actually ending the element.

[jira] [Updated] (TIKA-3452) java.nio.file.FileSystemException Read-only file system

2023-02-15 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-3452: --- Summary: java.nio.file.FileSystemException Read-only file system (was:

[jira] [Comment Edited] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689337#comment-17689337 ] Tim Allison edited comment on TIKA-3972 at 2/15/23 8:50 PM: -It looks like our

[jira] [Commented] (TIKA-3972) Parsing RTF sample with hyperlink and ToXMLContentHandler returns malformed XHTML from toString method call

2023-02-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689337#comment-17689337 ] Tim Allison commented on TIKA-3972: --- It looks like our parser requires {{fldrslt}} as a hint that the

[jira] [Commented] (TIKA-3452) java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA tika-docker

2023-02-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689313#comment-17689313 ] ASF GitHub Bot commented on TIKA-3452: -- lewismc commented on PR #4: URL:

[GitHub] [tika-helm] lewismc commented on pull request #4: TIKA-3452 java.nio.file.FileSystemException Read-only file system in 2.0.0-BETA tika-docker

2023-02-15 Thread via GitHub
lewismc commented on PR #4: URL: https://github.com/apache/tika-helm/pull/4#issuecomment-1431890535 @frascu please try this out and let me know how you get on. Thanks

[GitHub] [tika-helm] tballison commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub
tballison commented on PR #8: URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431871189 +1 Note that we've slightly modified the Docker numbering (e.g. 2.7.0.1 is the first Docker release for Tika 2.7.0). We may want to do something similar here?

Re: Adding arguments to configure tika from the rest calls

2023-02-15 Thread Tim Allison
Here's a first attempt at documentation: https://cwiki.apache.org/confluence/display/TIKA/Configuring+Parsers+At+Parse+Time+in+tika-server Please let me know if you have any questions or want write access to improve the documentation! On Wed, Feb 15, 2023 at 11:07 AM Julien Massiera wrote:

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689288#comment-17689288 ] Tim Allison commented on TIKA-3973: --- Y. > Content of Ogg file with Opus encoded content not correctly

[GitHub] [tika-helm] lewismc commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub
lewismc commented on PR #8: URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431700881 Should we release a new version of the Helm Chart?

[GitHub] [tika-helm] lewismc merged pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub
lewismc merged PR #8: URL: https://github.com/apache/tika-helm/pull/8

[GitHub] [tika-helm] lewismc commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub
lewismc commented on PR #8: URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431699844 Thanks for your patience @frascu

RE: Adding arguments to configure tika from the rest calls

2023-02-15 Thread Julien Massiera
Hi Tim, following up on our mail thread, could you share more documentation on how to use the header to configure the PDFParser on the fly? Thanks, Julien -Original Message- From: Julien Massiera Sent: Friday, February 3, 2023 13:08 To: dev@tika.apache.org Subject: RE: Adding

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689207#comment-17689207 ] Adam Bialas commented on TIKA-3973: --- You mean this: {code:java} org.apache.tika

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689203#comment-17689203 ] Tim Allison commented on TIKA-3973: --- To emphasize Nick's point... if you need detection of other

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689199#comment-17689199 ] Nick Burch commented on TIKA-3973: -- If you only care about container-aware detection for Ogg based

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689183#comment-17689183 ] Adam Bialas commented on TIKA-3973: --- So I need those dependencies: {code:java} implementation

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689176#comment-17689176 ] Nick Burch commented on TIKA-3973: -- For all container formats you want {{tika-parsers}} or

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689170#comment-17689170 ] Adam Bialas commented on TIKA-3973: --- Which jar should I include also? > Content of Ogg file with Opus

[jira] [Comment Edited] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689161#comment-17689161 ] Nick Burch edited comment on TIKA-3973 at 2/15/23 2:38 PM: --- For container-based

[jira] [Commented] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689161#comment-17689161 ] Nick Burch commented on TIKA-3973: -- For container-based detection (such as the Ogg container format), you

[jira] [Created] (TIKA-3973) Content of Ogg file with Opus encoded content not correctly recognized

2023-02-15 Thread Adam Bialas (Jira)
Adam Bialas created TIKA-3973: - Summary: Content of Ogg file with Opus encoded content not correctly recognized Key: TIKA-3973 URL: https://issues.apache.org/jira/browse/TIKA-3973 Project: Tika

[jira] [Commented] (TIKA-3970) Certain OneNote documents produce duplicate text

2023-02-15 Thread David Avant (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689131#comment-17689131 ] David Avant commented on TIKA-3970: --- Sadly, I am not aware of any free, non-Microsoft viewers. But I

[GitHub] [tika-helm] frascu commented on pull request #8: Fix invalid yaml document when helm lint is executed

2023-02-15 Thread via GitHub
frascu commented on PR #8: URL: https://github.com/apache/tika-helm/pull/8#issuecomment-1431018451 Hi @lewismc Could you please review this pull request? I need a minor version of tika-helm to fix the issue.