Address TODOs when we upgrade to next POI release (3.8 beta 5)
--
Key: TIKA-757
URL: https://issues.apache.org/jira/browse/TIKA-757
Project: Tika
Issue Type: Improvement
Address TODOs when we upgrade to next PDFBox release
--
Key: TIKA-758
URL: https://issues.apache.org/jira/browse/TIKA-758
Project: Tika
Issue Type: Improvement
Reporter: Michael
Improve performance when parsing embedded Office docs
-
Key: TIKA-753
URL: https://issues.apache.org/jira/browse/TIKA-753
Project: Tika
Issue Type: Improvement
Components: parser
Small improvements to how embedded docs are parsed in
AbstractPOIFSExtractor.handleEmbeddedOfficeDoc
Key: TIKA-751
URL:
PDF2XHTML fails to insert a <p> or a space around the page marker
--
Key: TIKA-742
URL: https://issues.apache.org/jira/browse/TIKA-742
Project: Tika
Issue Type: Bug
Components: parser
Tika fails to extract text from PDF annotations
---
Key: TIKA-738
URL: https://issues.apache.org/jira/browse/TIKA-738
Project: Tika
Issue Type: Bug
Components: parser
OpenOffice parser: master footer text isn't extracted
-
Key: TIKA-736
URL: https://issues.apache.org/jira/browse/TIKA-736
Project: Tika
Issue Type: Bug
Components: parser