Re: Windows build errors

2020-08-19 Thread Peter Lee
Hi Tilman, > expected: but was: charset=[windows-1252]> I think this problem is caused by the charset detection strategy being based on the line separator (CRLF or LF) and the git autocrlf config. I also ran into this problem and solved it like this: Set autocrlf false by git config --global core.autocrlf
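
As a rough illustration of why the checkout setting matters, the Python sketch below classifies the line endings found in a byte buffer. It is not Tika's detection code, and the function name `line_ending_style` is hypothetical. With autocrlf enabled on Windows, git can rewrite LF as CRLF on checkout, so the same test file may contain different bytes on different machines.

```python
def line_ending_style(data: bytes) -> str:
    """Classify the line endings in a byte buffer (illustrative helper only)."""
    crlf = data.count(b"\r\n")          # Windows-style line endings
    bare_lf = data.count(b"\n") - crlf  # LF not preceded by CR
    if crlf and bare_lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if bare_lf:
        return "LF"
    return "none"


if __name__ == "__main__":
    # The same two-line text as it might look after an LF and a CRLF checkout.
    print(line_ending_style(b"first\nsecond\n"))
    print(line_ending_style(b"first\r\nsecond\r\n"))
```

Running a check like this over a test resource in two checkouts is one way to confirm whether line-ending conversion changed the file.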

[jira] [Commented] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180814#comment-17180814 ] Tim Allison commented on TIKA-3129: --- Are we not capturing that in the {{status:}} element in the

[jira] [Comment Edited] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180814#comment-17180814 ] Tim Allison edited comment on TIKA-3129 at 8/19/20, 8:43 PM: - W00t! Great to

[jira] [Comment Edited] (TIKA-3154) Exception while extracting msg files

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180782#comment-17180782 ] Akash edited comment on TIKA-3154 at 8/19/20, 7:57 PM: --- Tried with the config below.

[jira] [Commented] (TIKA-3154) Exception while extracting msg files

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180782#comment-17180782 ] Akash commented on TIKA-3154: - Tried with the config below. It did not help. {code:java} /

[jira] [Comment Edited] (TIKA-3154) Exception while extracting msg files

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180782#comment-17180782 ] Akash edited comment on TIKA-3154 at 8/19/20, 7:56 PM: --- Tried with the config below.

Windows build errors

2020-08-19 Thread Tilman Hausherr
After many weeks I checked out the "main" branch and got these build errors: Failures: TestMimeTypes.testArchiveDetection:395->assertTypeByData:1275 expected:<[application/x-archive]> but was:<[text/plain]>

[jira] [Comment Edited] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-19 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698 ] Nicholas DiPiazza edited comment on TIKA-3129 at 8/19/20, 5:14 PM: --- With

[jira] [Commented] (TIKA-3173) Tika server with spawnChild - server does not recover from OOM until an additional file comes in

2020-08-19 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180703#comment-17180703 ] Nicholas DiPiazza commented on TIKA-3173: - I'll play around with that idea and create a PR if it

[jira] [Comment Edited] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-19 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698 ] Nicholas DiPiazza edited comment on TIKA-3129 at 8/19/20, 5:11 PM: --- With

[jira] [Commented] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-19 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698 ] Nicholas DiPiazza commented on TIKA-3129: - With respect to "last parsed" offset, I put this

[jira] [Comment Edited] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-19 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180698#comment-17180698 ] Nicholas DiPiazza edited comment on TIKA-3129 at 8/19/20, 5:10 PM: --- With

[jira] [Commented] (TIKA-3174) When Tika parses an OFD document, extra numbers appear in addition to the body text.

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180550#comment-17180550 ] Tim Allison commented on TIKA-3174: --- Can you attach an example file? > When Tika parses an OFD document, extra numbers appear in addition to the body text.

[jira] [Commented] (TIKA-3154) Exception while extracting msg files

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180548#comment-17180548 ] Tim Allison commented on TIKA-3154: --- https://cwiki.apache.org/confluence/display/TIKA/MSOfficeParsers

[jira] [Commented] (TIKA-3154) Exception while extracting msg files

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180544#comment-17180544 ] Tim Allison commented on TIKA-3154: --- Turns out it is (or should be):

[jira] [Commented] (TIKA-3154) Exception while extracting msg files

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180512#comment-17180512 ] Akash commented on TIKA-3154: - Can we make this value configurable? > Exception while extracting msg files >

[jira] [Created] (TIKA-3174) When Tika parses an OFD document, extra numbers appear in addition to the body text.

2020-08-19 Thread Jira
天空 created TIKA-3174: Summary: When Tika parses an OFD document, extra numbers appear in addition to the body text. Key: TIKA-3174 URL: https://issues.apache.org/jira/browse/TIKA-3174 Project: Tika Issue Type: Bug Reporter: 天空

Re: [EXTERNAL] Tika 2.0 modularization

2020-08-19 Thread Sergey Beryozkin
Hi Tim, it looks good. Perfect. Do you plan to have tika-parsers reuse the new modules as its dependencies? Cheers, Sergey On Tue, Aug 18, 2020 at 3:41 PM Tim Allison wrote: > If anyone has any time, please take a look here: > https://github.com/apache/tika/tree/branch_2x/tika-parser-modules

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180449#comment-17180449 ] Akash commented on TIKA-3172: - Thanks [~tallison] for the clarification. > PDF Parser configuration enable

[jira] [Commented] (TIKA-3173) Tika server with spawnChild - server does not recover from OOM until an additional file comes in

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180425#comment-17180425 ] Tim Allison commented on TIKA-3173: --- This is our OOM test:

Re: builds move to ci-builds

2020-08-19 Thread Tim Allison
I've disabled our old builds and configured 4 on ci-builds: main -- Java 8 and 11; branch_1x -- Java 8 and 11. Thank you, again, Tilman and Uwe! On Fri, Aug 14, 2020 at 9:08 AM Tim Allison wrote: > All, > Tilman pointed out that on the builds list there was an announcement > that Aug 15 is

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180406#comment-17180406 ] Tim Allison commented on TIKA-3172: --- [~akki1607], yes, that's expected, because you're wrapping the

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180375#comment-17180375 ] Hudson commented on TIKA-3172: -- SUCCESS: Integrated in Jenkins build tika-main-jdk11 #1022 (See

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180356#comment-17180356 ] Hudson commented on TIKA-3172: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #3 (See

tika-main-jdk14 - Build # 12 - Failure

2020-08-19 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-main-jdk14 (build #12) Status: Failure Check console output at https://builds.apache.org/job/tika-main-jdk14/12/ to view the results.

[jira] [Comment Edited] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180343#comment-17180343 ] Akash edited comment on TIKA-3172 at 8/19/20, 7:46 AM: --- [~tallison]  If we use

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180343#comment-17180343 ] Akash commented on TIKA-3172: - [~tallison] If we use the above-mentioned tika config file to extract, the parser

[jira] [Commented] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17180333#comment-17180333 ] Tim Allison commented on TIKA-3172: --- Thank you, [~tilman]. I'm fixing this now. > PDF Parser