+1 from me -- builds, tests pass, sanity check files parse, and sums look good. But, I get a warning that the signature is not certified with a trusted signature.
Tyler

On Wed, Oct 21, 2015 at 6:43 AM Allison, Timothy B. <[email protected]> wrote:

> +0 (some regressions in ppt content)
>
> I just finished the batch comparison run on ~1.8 million files in our
> govdocs1 and commoncrawl corpora comparing Tika 1.10 to 1.11-rc1. As a
> caveat, the eval code is still in development and there may be bugs in the
> reports.
>
> Results are here:
> https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_10_vs_1_11-rc1.zip
>
> Key reports:
> contents/content_diffs.csv (file had one corrupt row when viewing in
> Excel...manually deleted offending content)
> exceptions/newExceptionsInBByMimeTypeByStackTrace.csv (small handful)
> exceptions/fixedExceptionsInBByMimeType.csv (none!)
> mimes/mime_diffs_A_to_B.csv
>
> On the positive side:
> From "mime_diffs_A_to_B.csv", it looks like we are catching more pdfs as
> pdfs (rather than text/xhtml) than we were...great! We're identifying more
> files as images (jpeg, pict) than as xhtml, and, from a quick look, this
> appears to be an improvement. We have at least 9 new x-hwp-v5 (great!).
>
> On the negative side:
>
> 1) We have a few regressions in ppt exceptions (six of the same aioobe).
> 2) We have regressions in ppt content (it looks like we're not adding a
> new line/word break where we need to). The regressions are small per file,
> but they affect ~220 ppts out of ~1500 (~15%).
>
> Other than the regressions in ppt content, I'd be +1, but I don't think
> this is severe enough to warrant a re-spin. Happy to look into a fix,
> though, if we want a re-spin...and even if we don't, I'll start looking
> into this asap.
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:[email protected]]
> Sent: Monday, October 19, 2015 10:23 AM
> To: [email protected]
> Cc: [email protected]
> Subject: [VOTE] Apache Tika 1.11 Release Candidate #1
>
> Hi Folks,
>
> A first candidate for the Tika 1.11 release is available at:
>
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.11-rc1/
>
> The SHA1 checksum of the archive is
> d0dde7b3a4f1a2fb6ccd741552ea180dddab630a
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1014/
>
> Please vote on releasing this package as Apache Tika 1.11.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.11
> [ ] -1 Do not release this package because...
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
