+1 from me -- builds, tests pass, sanity check files parse, and sums look
good. But, I get a warning that the signature is not certified with a
trusted signature.
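
In case it helps anyone else checking the sums, here is a minimal Python sketch of the SHA1 comparison. It assumes the source archive has already been downloaded locally. The expected digest is the one quoted in the vote email below; the function names and the file name in the usage note are my own placeholders, not anything from the release.

```python
import hashlib

# SHA1 checksum quoted in the vote email below.
EXPECTED_SHA1 = "d0dde7b3a4f1a2fb6ccd741552ea180dddab630a"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected=EXPECTED_SHA1):
    """Compare the file's digest to `expected`, ignoring case and surrounding whitespace."""
    return sha1_of_file(path) == expected.strip().lower()
```

Calling checksum_matches("tika-1.11-src.zip") (hypothetical local file name) should return True for an intact download. Note that this only checks integrity; it says nothing about whether the signing key is trusted, which is the separate warning mentioned above.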

Tyler

On Wed, Oct 21, 2015 at 6:43 AM Allison, Timothy B. <[email protected]>
wrote:

> +0 (some regressions in ppt content)
>
> I just finished the batch comparison run on ~1.8 million files in our
> govdocs1 and commoncrawl corpora comparing Tika 1.10 to 1.11-rc1.  As a
> caveat, the eval code is still in development and there may be bugs in the
> reports.
>
> Results are here:
> https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_10_vs_1_11-rc1.zip
>
> Key reports:
> contents/content_diffs.csv (file had one corrupt row when viewing in
> Excel...manually deleted offending content)
> exceptions/newExceptionsInBByMimeTypeByStackTrace.csv (small handful)
> exceptions/fixedExceptionsInBByMimeType.csv  (none!)
> mimes/mime_diffs_A_to_B.csv
>
> On the positive side:
> From "mime_diffs_A_to_B.csv", it looks like we are catching more pdfs as
> pdfs (rather than text/xhtml) than we were...great!  We're identifying more
> files as images (jpeg, pict) than as xhtml, and, from a quick look, this
> appears to be an improvement.  We have at least 9 new x-hwp-v5 (great!).
>
> On the negative side:
>
> 1) We have a few regressions in ppt exceptions (six instances of the same
> ArrayIndexOutOfBoundsException).
> 2) We have regressions in ppt content (it looks like we're not adding a
> new line/word break where we need to).  The regressions are small per file,
> but they affect ~220 ppts out of ~1500 (~15%).
>
> Other than the regressions in ppt content, I'd be +1, but I don't think
> this is severe enough to warrant a re-spin.  Happy to look into a fix,
> though, if we want a re-spin...and even if we don't, I'll start looking
> into this asap.
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:[email protected]]
> Sent: Monday, October 19, 2015 10:23 AM
> To: [email protected]
> Cc: [email protected]
> Subject: [VOTE] Apache Tika 1.11 Release Candidate #1
>
> Hi Folks,
>
> A first candidate for the Tika 1.11 release is available at:
>
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.11-rc1/
>
> The SHA1 checksum of the archive is
> d0dde7b3a4f1a2fb6ccd741552ea180dddab630a
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1014/
>
>
> Please vote on releasing this package as Apache Tika 1.11.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.11
> [ ] -1 Do not release this package because…
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
