Thanks Tim,

I am testing Tika 2.8.0 with StormCrawler.

Apart from a lot of warnings about missing classes, such as
*Caused by: java.lang.ClassNotFoundException:
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream*
I am also getting a failed test when trying to extract text from an
embedded document.
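
For anyone trying to reproduce the warnings, a small probe along these lines might help confirm whether the class is visible on the classpath at all. This is only a sketch: the class name ClasspathProbe and the helper isPresent are made up for illustration and are not part of Tika or StormCrawler, and it says nothing about why the class would be missing.

```java
public class ClasspathProbe {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this probe.
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not loadable (e.g. a dependency of it is missing)
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the warning above
        String name = "org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream";
        System.out.println(name + " present: " + isPresent(name));
    }
}
```

It would need to be run with the same classpath as the failing topology or test for the result to mean anything.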

I can't see anything related in the release notes, except perhaps

   * Improve extraction of embedded file names in .docx (TIKA-3968).

I've created a branch for it in SC ->
https://github.com/DigitalPebble/storm-crawler/tree/tika2.8
in case anyone has the time and inclination to try to reproduce the issue.

I'll see if I can find the source of the problem.

Julien


On Tue, 9 May 2023 at 17:40, Tim Allison <talli...@apache.org> wrote:

> A candidate for the Tika 2.8.0 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/2.8.0
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/2.8.0-rc1/
>
> The SHA-512 checksum of the archive is
>
> 6b514a45b87013c566e57af2b6a526bce0b3bf02a1dabefe998068aa49672ec4a7ec2ecfa538a84aca719607f339a44341caeaab1ca313fc1c161154ec095bbb.
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1093/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 2.8.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 2.8.0
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Best,
>
>         Tim
>


-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
