Re: 1.28.2 regression results

2022-04-28 Thread Tim Allison
Tilman, Thank you for looking carefully at the reports! > commoncrawl3/OR/ORTIXLZEFH4QC5RJTV3L5XBNOVW42KPH 1Sonig is what we're getting in 2.3.0 and in the 2.4.0-soon-to-be-candidate, and it looks correct based on the underlying xml and when I open it in LibreOffice. It looks like it was incorr

Re: 1.28.2 regression results

2022-04-27 Thread Tilman Hausherr
On 28.04.2022 at 00:25, Tim Allison wrote: Are available here: https://corpora.tika.apache.org/base/reports/tika-1.28.2-reports-20220427.tgz I haven't taken a look yet. Let me know if you find anything. commoncrawl3/OR/ORTIXLZEFH4QC5RJTV3L5XBNOVW42KPH this is minor and is related to supers

1.28.2 regression results

2022-04-27 Thread Tim Allison
Are available here: https://corpora.tika.apache.org/base/reports/tika-1.28.2-reports-20220427.tgz I haven't taken a look yet. Let me know if you find anything. Best, Tim

Re: 1.28.2 regression results

2022-04-26 Thread Tilman Hausherr
On 26.04.2022 at 21:45, Tim Allison wrote: I should clarify that I fixed the two regressions that I had identified in the release candidate. The regression results that I shared were run with 1.x before those fixes. Ah ok, but then the tests should be run again after the fixes in case someth

Re: 1.28.2 regression results

2022-04-26 Thread Tim Allison
I should clarify that I fixed the two regressions that I had identified in the release candidate. The regression results that I shared were run with 1.x before those fixes. Still, let's fix the dependency convergence, and please let me know if there's anything else you find in the regression repo

Re: 1.28.2 regression results

2022-04-26 Thread Tim Allison
Hi Tilman, Thank you for raising this. 3X4JRZZ4TQ2GK4QQDQEXMFCVLM3FM5I4 is not related to TIKA-3734. The updated junrar (7.5.0) is swallowing a (new) exception on this file and stopping the parse without throwing an exception. The earlier version of junrar (7.4.1) did not find a problem with t

Re: 1.28.2 regression results

2022-04-26 Thread Tilman Hausherr
On 26.04.2022 at 13:07, Tim Allison wrote: Reports are here: https://corpora.tika.apache.org/base/reports/reports-tika-1.28.2-SNAPSHOT.tgz I found two issues that should be fixed (TIKA-3733 and TIKA-3734). I think both are related to the underlying parsers being stricter (which is good), but w

Re: 1.28.2 regression results

2022-04-26 Thread Tilman Hausherr
Let me know if you see anything else. The jdk11 and 17 builds fail because of a dependency convergence error. I don't know if this is really relevant, i.e., would the jdk8 build still be OK for people using Tika on jdk11 and 17? Tilman

1.28.2 regression results

2022-04-26 Thread Tim Allison
Reports are here: https://corpora.tika.apache.org/base/reports/reports-tika-1.28.2-SNAPSHOT.tgz I found two issues that should be fixed (TIKA-3733 and TIKA-3734). I think both are related to the underlying parsers being stricter (which is good), but we need to change our code to handle these case