nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1445165563
I ran a few hundred results and I see literally no difference in performance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to Git
nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1445166101
Never mind, I was testing the wrong format. We are now concerned only with the
alternative packaging format, and my previous test didn't exercise it.
100 documents parsed previously in 10252 ms
no
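For reference, 100 documents in 10252 ms works out to about 102.5 ms per document. Below is a minimal timing sketch for a comparison like this one. It assumes the parse step is passed in as a callback. `ParseTimer`, `timeAll` and `averageMs` are hypothetical names, not Tika APIs or the benchmark used in this PR. In a real run, the callback would wrap a Tika parser call on one file.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of Tika): times a parse callback over a batch
// of inputs so "before" and "after" runs can be compared on the same files.
class ParseTimer {

    /** Runs parseOne on every input and returns total wall-clock time in ms. */
    static <T> long timeAll(List<T> inputs, Consumer<T> parseOne) {
        long start = System.nanoTime();
        for (T input : inputs) {
            parseOne.accept(input); // assumption: the callback parses one document
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Average per-document time, e.g. 10252 ms / 100 docs = 102.52 ms. */
    static double averageMs(long totalMs, int docCount) {
        return (double) totalMs / docCount;
    }

    public static void main(String[] args) {
        // Placeholder inputs; a real run would use paths to the test documents.
        List<String> docs = List.of("doc-a", "doc-b", "doc-c");
        long total = timeAll(docs, d -> { /* parse d here */ });
        System.out.printf("%d docs in %d ms (%.2f ms/doc)%n",
                docs.size(), total, averageMs(total, docs.size()));
    }
}
```

Run it once against the old parser and once against the new one, using the same document set. Only the alternative-format files should be included, because the other files would dilute any difference.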
nddipiazza commented on PR #985:
URL: https://github.com/apache/tika/pull/985#issuecomment-1446882008
Yes, that is because the OneNote parser for the alternative format was just
printing some general header information before. Now it's actually parsing it
(slowly, due to the bug), which should now