It depends on your needs. There are also the fast traineddata files, which trade some accuracy for speed:

https://github.com/tesseract-ocr/tessdata_fast

It looks like many languages are covered there.
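
If you want to compare the two sets on your own samples, here is a minimal sketch that builds the same command you used, pointed at a chosen model folder through the --tessdata-dir option. The folder names (tessdata_best, tessdata_fast), the output names, and the helper function are assumptions for illustration; use whatever paths you saved the spa.traineddata files under.

```python
def build_tesseract_command(image, output_base, lang, tessdata_dir, exe="tesseract"):
    """Return the argument list for one OCR run against a chosen model folder."""
    return [
        exe, image, output_base,
        "--tessdata-dir", tessdata_dir,  # folder that holds <lang>.traineddata
        "--oem", "1",                    # LSTM engine, as in the original command
        "-l", lang,
    ]


if __name__ == "__main__":
    # Hypothetical folder names: wherever the downloaded models were saved.
    for model_dir in ("tessdata_best", "tessdata_fast"):
        cmd = build_tesseract_command(
            "ImageDoc.png", "output_" + model_dir, "spa", model_dir
        )
        # Print the command line; pass `cmd` to subprocess.run to execute it.
        print(" ".join(cmd))
```

Running each printed command on the same image gives one text file per model set, so the accuracy and the run time can be compared side by side.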

On Saturday, September 23, 2017 at 12:38:46 PM UTC-5, Subrato Namata wrote:
>
> Thanks Quan Nguyen. My initial results show that the issue is gone. Let me 
> try with a few more samples.
> Additionally, is this the best trained data available for Tesseract for 
> all the other languages as well, and should we use only these?
>
>
>
> On Saturday, 23 September 2017 00:02:51 UTC+5:30, Quan Nguyen wrote:
>>
>> Try best traineddata:
>>
>> https://github.com/tesseract-ocr/tessdata_best
>>
>> On Friday, September 22, 2017 at 2:24:08 AM UTC-5, Subrato Namata wrote:
>>>
>>> Environment
>>>
>>> Windows Setup: tesseract-ocr-setup-4.0.0-alpha.20170804.exe
>>> Spanish Trained Data: 
>>> https://github.com/tesseract-ocr/tessdata/raw/4.00/spa.traineddata
>>> Command Used to OCR:
>>> tesseract.exe ImageDoc.png output --oem 1 -l spa
>>> where ImageDoc.png is a scanned Spanish document and
>>> output is the base name of the text file that receives the OCR text
>>>
>>>    - Tesseract Version: 4.0
>>>    - Platform: Windows version 64 Bit
>>>
>>> Current Behavior:
>>>
>>> In Spanish, the character ‘o’ is recognized incorrectly as a round 
>>> symbol. The input file ImageDoc.png and a screenshot of the error are attached.
>>>
>>> [image: spanish] 
>>> <https://user-images.githubusercontent.com/12831051/30733359-45541566-9f94-11e7-8bb1-e8027c2efc0e.png>
>>> [image: imagedoc] 
>>> <https://user-images.githubusercontent.com/12831051/30733369-4d785ab8-9f94-11e7-9ff4-7f594f72a8dc.png>
>>>
>>>
>>>
>>>
>>> Expected Behavior:
>>>
>>> Character ‘o’ should be recognized correctly.
>>>
>>
