Tilman,
Thank you for looking carefully at the reports!
> commoncrawl3/OR/ORTIXLZEFH4QC5RJTV3L5XBNOVW42KPH
1Sonig is what we're getting in 2.3.0 and in the
2.4.0-soon-to-be-candidate, and it looks correct based on the
underlying xml and when I open it in LibreOffice. It looks like it
was incorr
On 28.04.2022 at 00:25, Tim Allison wrote:
Are available here:
https://corpora.tika.apache.org/base/reports/tika-1.28.2-reports-20220427.tgz
I haven't taken a look yet.
Let me know if you find anything.
commoncrawl3/OR/ORTIXLZEFH4QC5RJTV3L5XBNOVW42KPH
this is minor and is related to supers
Are available here:
https://corpora.tika.apache.org/base/reports/tika-1.28.2-reports-20220427.tgz
I haven't taken a look yet.
Let me know if you find anything.
Best,
Tim
On 26.04.2022 at 21:45, Tim Allison wrote:
I should clarify that I fixed the two regressions that I had
identified in the release candidate. The regression results that I
shared were run with 1.x before those fixes.
Ah ok, but then the tests should be run again after the fixes in case
someth
I should clarify that I fixed the two regressions that I had
identified in the release candidate. The regression results that I
shared were run with 1.x before those fixes.
Still, let's fix the dependency convergence, and please let me know if
there's anything else you find in the regression repo
Hi Tilman,
Thank you for raising this. 3X4JRZZ4TQ2GK4QQDQEXMFCVLM3FM5I4 is not
related to TIKA-3734. The updated junrar (7.5.0) is swallowing a
(new) exception on this file and stopping the parse without throwing
an exception. The earlier version of junrar (7.4.1) did not find a
problem with t
On 26.04.2022 at 13:07, Tim Allison wrote:
Reports are here:
https://corpora.tika.apache.org/base/reports/reports-tika-1.28.2-SNAPSHOT.tgz
I found two issues that should be fixed (TIKA-3733 and TIKA-3734). I
think both are related to the underlying parsers being stricter (which
is good), but w
Let me know if you see anything else.
The JDK 11 and 17 builds fail because of a dependency convergence error.
I don't know if this is really relevant, i.e. would the JDK 8 build still
be OK for people using Tika on JDK 11 and 17?
Tilman
Reports are here:
https://corpora.tika.apache.org/base/reports/reports-tika-1.28.2-SNAPSHOT.tgz
I found two issues that should be fixed (TIKA-3733 and TIKA-3734). I
think both are related to the underlying parsers being stricter (which
is good), but we need to change our code to handle these case