[jira] [Commented] (TIKA-3510) tika-parser-scientific-module seems to embbed many dependencies

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397680#comment-17397680 ] Hudson commented on TIKA-3510: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #308 (See

[jira] [Commented] (TIKA-3510) tika-parser-scientific-module seems to embbed many dependencies

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397641#comment-17397641 ] Tim Allison commented on TIKA-3510: --- [~tmortagne], please take a look and see if this will meet your

[jira] [Commented] (TIKA-3502) General upgrades for 2.0.1

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397535#comment-17397535 ] Hudson commented on TIKA-3502: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #306 (See

[jira] [Commented] (TIKA-3522) Reduce calls to TikaConfig.getDefaultConfig

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397534#comment-17397534 ] Hudson commented on TIKA-3522: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #306 (See

[jira] [Commented] (TIKA-3521) Move checkActive out of fetchemitworkers within AsyncProcessor

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397533#comment-17397533 ] Hudson commented on TIKA-3521: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #306 (See

[jira] [Commented] (TIKA-3515) Tika CLI -t should use UTF-8 as default output encoding

2021-08-11 Thread Luís Filipe Nassif (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397523#comment-17397523 ] Luís Filipe Nassif commented on TIKA-3515: -- ??I think we should also deprecate the initialization

[jira] [Commented] (TIKA-3489) Robots.txt files frequently identified as message/rfc822

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397511#comment-17397511 ] Hudson commented on TIKA-3489: -- UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #144 (See

[jira] [Resolved] (TIKA-3521) Move checkActive out of fetchemitworkers within AsyncProcessor

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3521. --- Fix Version/s: 2.0.1 Assignee: Tim Allison Resolution: Fixed > Move checkActive out

[jira] [Created] (TIKA-3522) Reduce calls to TikaConfig.getDefaultConfig

2021-08-11 Thread Tim Allison (Jira)
Tim Allison created TIKA-3522: - Summary: Reduce calls to TikaConfig.getDefaultConfig Key: TIKA-3522 URL: https://issues.apache.org/jira/browse/TIKA-3522 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-3515) Tika CLI -t should use UTF-8 as default output encoding

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397490#comment-17397490 ] Hudson commented on TIKA-3515: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #305 (See

[jira] [Commented] (TIKA-3510) tika-parser-scientific-module seems to embbed many dependencies

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397488#comment-17397488 ] Hudson commented on TIKA-3510: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #305 (See

[jira] [Commented] (TIKA-3520) Revert rendering only non-text elements in auto mode for PDFs

2021-08-11 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397489#comment-17397489 ] Hudson commented on TIKA-3520: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #305 (See

Re: versions?

2021-08-11 Thread Nick Burch
On Wed, 11 Aug 2021, Tim Allison wrote: A) I think we should maintain the 1.x branch and continue to put out bug fixes for a bit. Any objections to nominally calling the next release 1.27.1 on JIRA at least? I agree we should probably try to keep 1.x going for at least a few months, to

versions?

2021-08-11 Thread Tim Allison
All, Two questions: A) I think we should maintain the 1.x branch and continue to put out bug fixes for a bit. Any objections to nominally calling the next release 1.27.1 on JIRA at least? B) We've made quite a few changes in the main branch since the release of 2.0.0. Would there be any

[jira] [Resolved] (TIKA-3489) Robots.txt files frequently identified as message/rfc822

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3489. --- Fix Version/s: 2.0.1 1.27.1 Assignee: Tim Allison Resolution: Fixed

[jira] [Resolved] (TIKA-3520) Revert rendering only non-text elements in auto mode for PDFs

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3520. --- Fix Version/s: 2.0.1 Assignee: Tim Allison Resolution: Fixed > Revert rendering only

[jira] [Resolved] (TIKA-3515) Tika CLI -t should use UTF-8 as default output encoding

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3515. --- Fix Version/s: 2.0.1 Assignee: Tim Allison Resolution: Fixed > Tika CLI -t should use

[jira] [Created] (TIKA-3521) Move checkActive out of fetchemitworkers within AsyncProcessor

2021-08-11 Thread Tim Allison (Jira)
Tim Allison created TIKA-3521: - Summary: Move checkActive out of fetchemitworkers within AsyncProcessor Key: TIKA-3521 URL: https://issues.apache.org/jira/browse/TIKA-3521 Project: Tika Issue

[jira] [Resolved] (TIKA-3483) Implement a network policy for Helm Chart

2021-08-11 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved TIKA-3483. Resolution: Fixed > Implement a network policy for Helm Chart >

[jira] [Commented] (TIKA-3519) Wonder if you can add a feature for Tika parser to stop reading metadata and body content if certain amount of memory or body content has reached

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397440#comment-17397440 ] Tim Allison commented on TIKA-3519: --- Can you share an example file with me? > Wonder if you can add a

[jira] [Updated] (TIKA-3483) Implement a network policy for Helm Chart

2021-08-11 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-3483: --- Fix Version/s: (was: 2.0.0-BETA) 2.0.1 > Implement a network

[jira] [Commented] (TIKA-3483) Implement a network policy for Helm Chart

2021-08-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397439#comment-17397439 ] ASF GitHub Bot commented on TIKA-3483: -- lewismc merged pull request #5: URL:

[GitHub] [tika-helm] lewismc merged pull request #5: [TIKA-3483] Implement a network policy for Helm Chart

2021-08-11 Thread GitBox
lewismc merged pull request #5: URL: https://github.com/apache/tika-helm/pull/5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Commented] (TIKA-3483) Implement a network policy for Helm Chart

2021-08-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397438#comment-17397438 ] ASF GitHub Bot commented on TIKA-3483: -- lewismc commented on pull request #5: URL:

[GitHub] [tika-helm] lewismc commented on pull request #5: [TIKA-3483] Implement a network policy for Helm Chart

2021-08-11 Thread GitBox
lewismc commented on pull request #5: URL: https://github.com/apache/tika-helm/pull/5#issuecomment-896946612 @bynare apologies I just ended up doing other things... I wasn't ignoring this. Thanks for your patience. LGTM -- This is an automated message from the Apache Git Service. To

[jira] [Commented] (TIKA-3519) Wonder if you can add a feature for Tika parser to stop reading metadata and body content if certain amount of memory or body content has reached

2021-08-11 Thread Xiaohong Yang (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397427#comment-17397427 ] Xiaohong Yang commented on TIKA-3519: - Can you check if you can catch the above mentioned 

[jira] [Created] (TIKA-3520) Revert rendering only non-text elements in auto mode for PDFs

2021-08-11 Thread Tim Allison (Jira)
Tim Allison created TIKA-3520: - Summary: Revert rendering only non-text elements in auto mode for PDFs Key: TIKA-3520 URL: https://issues.apache.org/jira/browse/TIKA-3520 Project: Tika Issue

[jira] [Updated] (TIKA-3515) Tika CLI -t should use UTF-8 as default output encoding

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3515: -- Affects Version/s: (was: 2.0.0-BETA) 2.0.0 > Tika CLI -t should use UTF-8 as

[jira] [Commented] (TIKA-3515) Tika CLI -t should use UTF-8 as default output encoding

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397391#comment-17397391 ] Tim Allison commented on TIKA-3515: --- If we're going to make this change in tika-app, I think we should

[jira] [Commented] (TIKA-3519) Wonder if you can add a feature for Tika parser to stop reading metadata and body content if certain amount of memory or body content has reached

2021-08-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397365#comment-17397365 ] Tim Allison commented on TIKA-3519: --- If the underlying parser (Apache POI in this case) writes content

[jira] [Commented] (TIKA-3519) Wonder if you can add a feature for Tika parser to stop reading metadata and body content if certain amount of memory or body content has reached

2021-08-11 Thread Xiaohong Yang (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17397345#comment-17397345 ] Xiaohong Yang commented on TIKA-3519: - I tried org.apache.tika.sax.WriteOutContentHandler with