Re: Problem with cyrillics letters through Tika OCR indexing

2017-02-10 Thread Игорь Абрашин
I have the same problem. So it is probably the first case: how do I force the Tika parser to recognize Cyrillic characters as required? For me, it tries to recognize Russian text as if it were English transliteration, so the result shows the Russian text written using only the Latin alphabet. On 10 Feb 2017, 17:55, user "Alexandre Rafalovitch"

Re: Problem with cyrillics letters through Tika OCR indexing

2017-02-10 Thread Alexandre Rafalovitch
At what level exactly is this a problem? Are you looking for a way for Solr to pass the -L rus flag to Tika? Or are you saying that whatever OCR is used here is bad? In the second case, this is probably not a question for Solr or even Tika, but for whatever the underlying OCR library is. The stack is deep

Problem with cyrillics letters through Tika OCR indexing

2017-02-09 Thread Абрашин, Игорь Олегович
Hello, everyone. I've encountered the error mentioned in the title. The original image is attached, and the recognized text is below: 3ApaBCTyI7ITe 9| )KVIBy xopomo Has anyone faced something similar? I should mention that tesseract recognizes it more correctly with the -l rus option. Thanks in advance! Best regards,
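
For anyone who wants to reproduce the comparison outside Solr and Tika, here is a minimal sketch of calling the tesseract command line directly with the -l rus option mentioned above. The helper names build_tesseract_command and run_ocr and the file name scan.png are hypothetical, and the sketch assumes the tesseract binary and the Russian language data are installed. It does not show how Solr or Tika forward the language setting.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="rus"):
    """Return the argument list for a Tesseract run that prints text to stdout.

    Hypothetical helper: only assembles the command, does not run it.
    """
    # Using "stdout" as the output base makes Tesseract write the recognized
    # text to standard output instead of a file; "-l" selects the language.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_ocr(image_path, lang="rus"):
    """Run Tesseract on image_path and return the recognized text.

    Assumes the tesseract binary and the requested language data exist.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    # Print the command that would be executed for a hypothetical image file.
    print(" ".join(build_tesseract_command("scan.png")))
```

Running run_ocr on the same image once with lang="eng" and once with lang="rus" should show whether the Latin look-alike output comes from the language setting rather than from the image quality.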