to tesseract-ocr  Is version
On Friday, July 19, 2024 at 12:32:25 AM UTC+7 [email protected] wrote:

> Hello, may I ask which version you are using? Would you mind sharing your chi_sim and chi_sim_vert files?
>
> On Sunday, March 17, 2024 at 00:41:13 UTC+8, <[email protected]> wrote:
>
>> Hello, 
>>
>> I am making a transcript of YouTube videos using Tesseract. 
>> The images I feed to Tesseract look like this:
>> [image: aftercut29.0.jpg]
>>
>> The output is mostly correct, but sometimes the same character is 
>> recognized differently.
>> Example: 
>> Input:
>> [image: aftercut3.0.jpg]
>> Output: 大*叔*中文 - CORRECT
>>
>> Input:
>> [image: aftercut10.5.jpg] 
>> Output: 今天不是3位 大*档* - INCORRECT
>>
>> To prepare the images I apply:
>>
>>    - *dilation*,
>>    - *cropping* to the area of the image containing characters,
>>    - adding *borders*.
>>
>>  For dilation I use a 2x2 kernel, and the border is 2 px thick.
>>  For the page segmentation mode I am currently experimenting with *--psm 7* and 
>> *--psm 13*. --psm 7 seems to give slightly better results. Of course the 
>> language is *lang='chi_sim'*.
>>
>> Could you give me any advice on how to improve the robustness of the output?
>>
>> Thank you in advance,
>> Jan
>>
>
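
For reference, the preprocessing described in the quoted message (dilation with a 2x2 kernel, cropping to the characters, a 2 px border) can be sketched roughly as below. This is only an illustration under assumptions, not Jan's actual code: the image is modelled as a list of rows of 0/1 with 1 marking a text pixel, and the function names are made up for this example. In practice a library such as OpenCV would do the same operations on real image arrays.

```python
# Hypothetical, dependency-free sketch of the three preprocessing steps.
# Assumption: a binary image is a list of rows, 1 = text pixel, 0 = background.

def dilate_2x2(img):
    """Binary dilation with a 2x2 kernel: every text pixel also marks its
    right, lower, and lower-right neighbours (where they exist)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in (0, 1):
                    for dx in (0, 1):
                        if y + dy < h and x + dx < w:
                            out[y + dy][x + dx] = 1
    return out


def crop_to_content(img):
    """Crop to the bounding box of all text pixels; return a copy of the
    input unchanged if it contains no text pixels."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return [row[:] for row in img]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def add_border(img, px=2, value=0):
    """Surround the image with a border of `px` background pixels."""
    width = len(img[0]) + 2 * px
    blank = [[value] * width for _ in range(px)]
    body = [[value] * px + row[:] + [value] * px for row in img]
    return [r[:] for r in blank] + body + [r[:] for r in blank]


def prepare(img):
    """Apply the steps in the order listed: dilate, crop, add a 2 px border."""
    return add_border(crop_to_content(dilate_2x2(img)), px=2)


def tesseract_config(psm=7):
    """Build the page segmentation option string, e.g. '--psm 7'."""
    return f"--psm {psm}"


# If pytesseract is used (an assumption), the prepared image would then be
# passed along the lines of:
#   pytesseract.image_to_string(image, lang="chi_sim", config=tesseract_config(7))
```

Keeping each step in its own function makes it easy to switch one off, or to change the kernel size or border width, and compare the OCR output on the frames that are currently misread.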
