On 23.03.22 at 05:28, Tilman Hausherr wrote:
I have created two issues on parsing exceptions, and it's not
PDFBOX-5283. Maybe it's the same issue, maybe not. Re text extraction: I
looked at one of the files (414724.pdf) and there's also a parsing
warning, so maybe that is related too, so let's just wait.
Tilman
On 22.03.2022 at 18:21, Ti[...] wrote:
I don't have much time right now, but I just tested 077867.pdf and
392443.pdf, and it's definitely a regression. I wonder if it was
PDFBOX-5283.
The files in content_diffs_no_exceptions.xls where the T column is
non-empty are suspicious and need more investigation.
Tilman
On 22.03.2022 at [...]
Reports are here:
https://corpora.tika.apache.org/base/reports/tika-2.3-vs-2.4-pdfs.tgz
It looks like there are no significant changes. There are some diffs on a
few files, but this was run on ~800k PDFs.
There are a couple of cases where a file is now being detected as
rfc822 instead of PDF. We have to fix that o
On 21.03.22 at 12:21, Tim Allison wrote:
I'm happy to run the tests today if that would be of any interest.
Yes, please.
TIA
Andreas
I'm happy to run the tests today if that would be of any interest.
On Sun, Mar 20, 2022 at 5:01 PM Andreas Lehmkuehler wrote:
On 13.03.22 at 14:20, Tim Allison wrote:
From Tika's perspective, there's no rush. We're waiting for a bug fix
in POI (TIKA-3699).
Please let me know if/when I should run the regression tests.
Thanks for the offer. Do we need to run the tests before cutting the release?
Most of the tickets a
Due to a possible issue in ToUnicodeWriter.writeTo (see dev@), I'm going to
postpone the release for a week. I'd like to have a look at the issue and the
proposed solution first. IMHO we should solve that issue ASAP to ensure that
PDFs created with PDFBox follow the specs.
Andreas
On 10.03.22 [...]
From Tika's perspective, there's no rush. We're waiting for a bug fix
in POI (TIKA-3699).
Please let me know if/when I should run the regression tests.
Thank you, all!
Cheers,
Tim
On Sat, Mar 12, 2022 at 5:29 AM Andreas Lehmkuehler wrote:
>
> On 11.03.22 at 08:30, Tilman H[...] wrote:
On 11.03.22 at 08:30, Tilman Hausherr wrote:
On 11.03.2022 at 08:19, Andreas Lehmkuehler wrote:
On 10.03.22 at 20:16, Tilman Hausherr wrote:
I'd agree, but that might mean PDFBOX-5384 wouldn't be fixed.
It's been there for quite some time, and it seems to be a rare corner
case. IMHO it can wait if we don't find a solution before Monday.
No
On Friday, 11.03.2022 at 08:19 +0100, Andreas Lehmkuehler wrote:
On 10.03.22 at 20:16, Tilman Hausherr wrote:
I'd agree, but that might mean PDFBOX-5384 wouldn't be fixed.
It's been there for quite some time, and it seems to be a rare corner case. IMHO it
can wait if we don't find a solution before Monday.
WDYT?
Andreas
I'd agree, but that might mean PDFBOX-5384 wouldn't be fixed.
Tilman
On 10.03.2022 at 19:05, Andreas Lehmkuehler wrote:
On 09.03.22 at 17:07, Tim Allison wrote:
All,
I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.
Are there plans for a 2.0.26 release? We're probably a few weeks out
How about cutting the release next Monday?
Andreas
All,
I've been out of the office for a bit and haven't caught up yet.
Apologies if I've missed the discussion.
Are there plans for a 2.0.26 release? We're probably a few weeks out
from starting our next 1.x and 2.x releases on Tika, and it would be
great to incorporate 2.0.26. No problem at all
On 24.02.2022 at 07:41, Andreas Lehmkuehler wrote:
On 22.02.22 at 07:49, Andreas Lehmkuehler wrote:
Hi,
I'm planning to cut a new JBIG2 release next week. There aren't that many
changes, but I think the fixes are worth releasing. [1]
I'm going to cut the release next weekend, if nobody objects.
Once it is done we should think about a 2.0
+1
Tilman
On 22.02.2022 at 07:49, Andreas Lehmkuehler wrote:
Hi,
I'm planning to cut a new JBIG2 release next week. There aren't that many
changes, but I think the fixes are worth releasing. [1]
WDYT?
Andreas
[1]
https://issues.apache.org/jira/issues/?jql=project%20%3D%2012310760%20AND%20fixVersion%20%3D%2012346618%20ORDER%20BY%20priority%20DESC%