to tesseract-ocr

Is version

On Friday, July 19, 2024 at 12:32:25 AM UTC+7 [email protected] wrote:
> Hello, which version are you using? Would you mind sharing your chi_sim and chi_sim_vert files?
>
> On Sunday, March 17, 2024 at 00:41:13 UTC+8, <[email protected]> wrote:
>
>> Hello,
>>
>> I am making a transcript of YouTube videos using Tesseract.
>> The images I pass to Tesseract look like this:
>> [image: aftercut29.0.jpg]
>>
>> The output is mostly correct, but sometimes the same character is
>> recognized differently from one image to the next.
>> Example:
>> Input:
>> [image: aftercut3.0.jpg]
>> Output: 大*叔*中文 - CORRECT
>>
>> Input:
>> [image: aftercut10.5.jpg]
>> Output: 今天不是3位 大*档* - INCORRECT
>>
>> To prepare the images I use:
>>
>> - *dilation*,
>> - *cropping* to the area of the image containing the characters,
>> - adding *borders*.
>>
>> For dilation I use a 2x2 kernel, and the border is 2 px thick.
>> For the page segmentation mode I am currently experimenting with *--psm 7*
>> and *--psm 13*; --psm 7 seems to give slightly better results. Of course,
>> the language is *lang='chi_sim'*.
>>
>> Could you give me any advice on how to improve the robustness of the output?
>>
>> Thank you in advance,
>> Jan
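
For readers following the thread, the preprocessing described in the quoted message (dilation with a 2x2 kernel, then a 2 px border) and the single-line segmentation mode can be sketched as below. This is only an illustration under stated assumptions, not Jan's actual code: the images are modelled as nested lists of grayscale values so the example has no dependencies, and the helper names `dilate_2x2`, `add_border`, `tesseract_args`, and `run_ocr` are hypothetical. In a real pipeline the same steps would more likely use OpenCV's `cv2.dilate` and `cv2.copyMakeBorder`. The white border value (255) and the placement of the 2x2 window are also assumptions.

```python
import subprocess


def dilate_2x2(img):
    """Grayscale dilation with a 2x2 window (assumed placement: each pixel
    becomes the max of itself and its right, lower, and lower-right
    neighbours, clipped at the image edges)."""
    h, w = len(img), len(img[0])
    return [
        [
            max(
                img[yy][xx]
                for yy in range(y, min(y + 2, h))
                for xx in range(x, min(x + 2, w))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def add_border(img, thickness=2, value=255):
    """Pad the image on all four sides with a constant-valued border."""
    width = len(img[0]) + 2 * thickness
    top = [[value] * width for _ in range(thickness)]
    bottom = [[value] * width for _ in range(thickness)]
    body = [[value] * thickness + list(row) + [value] * thickness for row in img]
    return top + body + bottom


def tesseract_args(image_path, psm=7, lang="chi_sim"):
    """Build the Tesseract CLI call: single text line (--psm 7) by default,
    Simplified Chinese model, recognized text written to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, psm=7, lang="chi_sim"):
    """Run Tesseract on an already preprocessed image file and return the text.
    Requires the tesseract binary and the chi_sim traineddata to be installed."""
    result = subprocess.run(
        tesseract_args(image_path, psm, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()
```

Note that dilation takes a local maximum, so on dark text over a light background it thins the strokes rather than thickening them. Whether that helps or hurts for a given video is worth checking by comparing `run_ocr` output with and without the step, and with `psm=7` versus `psm=13`.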

