[GitHub] tika pull request #136: TIKA-2106. Need to lowercase the output file to matc...

2016-09-30 Thread epugh
GitHub user epugh opened a pull request: https://github.com/apache/tika/pull/136 TIKA-2106. Need to lowercase the output file to match the format passed to tesseract cmd line. You can merge this pull request into a Git repository by running: $ git pull https
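This is a rough illustration of the mismatch TIKA-2106 describes, not the code from the PR. Suppose the output type is held as an upper-case enum constant but passed to Tesseract in lower case. Tesseract then writes `<outputBase>.hocr`, so the file the parser reads back needs the same lower-case extension. The names `OcrOutputFile`, `OutputType` and `expectedOutputFile` are hypothetical. The sketch also assumes a Tesseract version that writes `.hocr` files for the `hocr` config (some older releases wrote `.html` instead).

```java
import java.io.File;
import java.util.Locale;

// Hypothetical sketch: names are illustrative, not Tika's actual TesseractParser code.
public class OcrOutputFile {

    // Hypothetical output types; enum constant names are upper-case by convention.
    enum OutputType { TXT, HOCR }

    // Assumption: Tesseract writes "<outputBase>.<type>", where <type> is the
    // lower-case config name given on the command line (e.g. "hocr").
    static File expectedOutputFile(String outputBase, OutputType type) {
        // Lowercase with Locale.ROOT so the extension does not depend on the JVM's default locale.
        String ext = type.name().toLowerCase(Locale.ROOT);
        return new File(outputBase + "." + ext);
    }

    public static void main(String[] args) {
        // Using HOCR.name() directly would look for "ocr-out.HOCR", which is not the file Tesseract wrote
        // on case-sensitive file systems.
        System.out.println(expectedOutputFile("ocr-out", OutputType.HOCR).getPath());
    }
}
```

The sketch uses `Locale.ROOT` rather than the default locale. With the default locale, a Turkish default would turn `TXT` into `txt` with a dotless `ı`, and the extension would then fail to match.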

[GitHub] tika pull request #133: add hOCR output format to TesseractParser TIKA-2093

2016-09-22 Thread epugh
GitHub user epugh opened a pull request: https://github.com/apache/tika/pull/133 add hOCR output format to TesseractParser TIKA-2093 Small change to Tesseract OCR code to add the hOCR outputType. In the future we can add `pdf` and `tsv` as output types as well. First
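This is a minimal sketch of how an hOCR output type can be passed to the Tesseract command line, not the PR's implementation. Tesseract accepts a trailing config name such as `hocr` after `tesseract <image> <outputBase> [options]`, and that config selects the output format. The class `TesseractCommand`, the method `buildCommand` and its parameters are hypothetical. The sketch also assumes the installed Tesseract ships both a `txt` and an `hocr` config.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: assembles a Tesseract argument list; it does not run the process.
public class TesseractCommand {

    // Hypothetical output types; per the PR description, `pdf` and `tsv` could be added later.
    enum OutputType { TXT, HOCR }

    static List<String> buildCommand(String tesseractPath, String inputImage,
                                     String outputBase, String language, OutputType type) {
        List<String> cmd = new ArrayList<>();
        cmd.add(tesseractPath);   // path to the tesseract executable
        cmd.add(inputImage);      // image to OCR
        cmd.add(outputBase);      // Tesseract appends the extension itself
        cmd.add("-l");            // recognition language flag
        cmd.add(language);
        // Trailing config name selecting the output format, e.g. "hocr" (lower case).
        cmd.add(type.name().toLowerCase(Locale.ROOT));
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("tesseract", "page.png", "ocr-out", "eng", OutputType.HOCR));
    }
}
```

A list like this could be handed to `java.lang.ProcessBuilder` to run the command. The sketch stops at building the arguments so that it runs without Tesseract installed.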