[GitHub] [tika] THausherr commented on pull request #332: Fix can't del tmp file in windows

2020-07-30 Thread GitBox
THausherr commented on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-04910 I agree that it shouldn't stop the process. Suggestion: output a log message, because the cause is usually a programming oversight, so that it can be reported and fixed.
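
The suggestion above (do not stop the process, but log the failure so it can be reported) could look roughly like the following. This is a minimal sketch, not the code from the pull request: the class name `TempFileCleanup` and the method `deleteQuietly` are hypothetical, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Tika: illustrates "log instead of fail".
final class TempFileCleanup {
    private static final Logger LOG =
            Logger.getLogger(TempFileCleanup.class.getName());

    private TempFileCleanup() {
    }

    /**
     * Tries to delete a temporary file without propagating I/O failures.
     *
     * @return true if the file is gone after the call (deleted now or
     *         already absent), false if deletion failed and was logged
     */
    static boolean deleteQuietly(Path tmp) {
        try {
            Files.deleteIfExists(tmp);
            return true;
        } catch (IOException e) {
            // On Windows this typically means a stream on the file is still
            // open, so log it so that the leak can be reported and fixed.
            LOG.log(Level.WARNING, "Could not delete temporary file " + tmp, e);
            return false;
        }
    }
}
```

A caller can ignore the return value when cleanup is best-effort; the warning in the log still points at the resource that was left open.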

[GitHub] [tika] THausherr edited a comment on pull request #332: Fix can't del tmp file in windows

2020-07-30 Thread GitBox
THausherr edited a comment on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-04910 I agree that it shouldn't stop the process. Suggestion: also output a log message, because the cause is usually a programming oversight, so that it can be reported and

[GitHub] [tika] keithrbennett commented on a change in pull request #334: Tika-3141 : add empty environment variable handle

2020-07-30 Thread GitBox
keithrbennett commented on a change in pull request #334: URL: https://github.com/apache/tika/pull/334#discussion_r463103079 ## File path: tika-core/src/main/java/org/apache/tika/config/TikaConfig.java ## @@ -249,11 +249,11 @@ public TikaConfig(ClassLoader loader) public

[jira] [Commented] (TIKA-3141) LINUX - Tika shouldn't throw an exception for an empty TIKA_CONFIG environment variable value

2020-07-30 Thread Keith Bennett (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167998#comment-17167998 ] Keith Bennett commented on TIKA-3141: - I agree that this should be handled, if it's not a

PRs on github need reviews

2020-07-30 Thread Peter Lee
Hi all, I've been using Tika recently and find it fascinating! I pushed some PRs on GitHub, but it seems no one is reviewing them (the same goes for some other PRs on GitHub). Maybe somebody could give me a hand? Here are the PRs: https://github.com/apache/tika/pull/334

[GitHub] [tika] PeterAlfredLee commented on pull request #332: Fix can't del tmp file in windows

2020-07-30 Thread GitBox
PeterAlfredLee commented on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-666331912 Hi @THausherr , sorry for the late reply. I think the fix in [TIKA-3135](https://issues.apache.org/jira/browse/TIKA-3135) is trying to avoid occupying the file, therefore

[GitHub] [tika] PeterAlfredLee opened a new pull request #334: Tika-3141 : add empty environment variable handle

2020-07-30 Thread GitBox
PeterAlfredLee opened a new pull request #334: URL: https://github.com/apache/tika/pull/334 Trying to fix Tika-3141 with an empty string check in `TikaConfig`
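
An empty-string check of the kind described here might be sketched as follows. This is an illustration under stated assumptions, not the change made in the pull request: the class `ConfigLocation` and its methods are hypothetical, only the variable name `TIKA_CONFIG` comes from the issue title, and treating a whitespace-only value as empty is an extra assumption.

```java
import java.util.Optional;

// Hypothetical helper, not the actual TikaConfig code.
final class ConfigLocation {
    private ConfigLocation() {
    }

    /**
     * Returns the trimmed value, or Optional.empty() when the input is
     * null, empty, or whitespace only (the last case is an assumption).
     */
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? Optional.empty() : Optional.of(trimmed);
    }

    /** Reads TIKA_CONFIG and treats an empty value like an unset variable. */
    static Optional<String> fromEnvironment() {
        return normalize(System.getenv("TIKA_CONFIG"));
    }
}
```

With this shape, the caller falls back to the default configuration whenever `fromEnvironment()` is empty, instead of trying to open a file with an empty name.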

[jira] [Commented] (TIKA-3141) LINUX - Tika shouldn't throw an exception for an empty TIKA_CONFIG environment variable value

2020-07-30 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167845#comment-17167845 ] Peter Lee commented on TIKA-3141: Hi [~nick], I've been working on Tika recently and I'm interested in this.

[jira] [Commented] (TIKA-3144) Detecting hprof memory dump files exported from Android Studio

2020-07-30 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167819#comment-17167819 ] Nick Burch commented on TIKA-3144: -- Generally you need to use the {{x-}} prefix on the subtype to mark it

Re: PDFBox regression tests?

2020-07-30 Thread Tilman Hausherr
On 28.07.2020 at 23:51, Tim Allison wrote: Reports are here: https://corpora.tika.apache.org/base/reports/pdfbox-2.0.21-SNAPSHOT.tgz Thank you. Besides the exceptions, there are a few cases in content extraction where "TOP_10_MORE_IN_B" is empty and "TOP_10_MORE_IN_A" has meaningful