[ 
https://issues.apache.org/jira/browse/SOLR-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett updated SOLR-11773:
-------------------------------------
    Component/s: contrib - Solr Cell (Tika extraction)

> configurable language config for tesseract ocr
> ----------------------------------------------
>
>                 Key: SOLR-11773
>                 URL: https://issues.apache.org/jira/browse/SOLR-11773
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 7.1
>            Reporter: Advokat
>            Priority: Minor
>
> Currently, to change the language for Tesseract I have to edit 
> \org\apache\tika\parser\ocr\TesseractOCRConfig.properties in 
> tika-parsers-1.16.jar.
> There is no way to set the language in solrconfig.xml or on each 
> request to the ExtractingRequestHandler.
> If someone has documents in different languages, it is impossible to configure 
> this. Without the correct language set, Tesseract does not work as well as it could.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
