Never mind, the config --oem 3 --psm 6 extracts the text well, but on an 
image like the one below it merges two paragraphs into one. So I switched to 
--oem 3 --psm 4, which handles that case well but skips a lot of text on the 
page. The problem I have now is that the images I read sometimes contain 
both kinds of text:
-Text read from left to right
-Text read from top to bottom

How can I detect which direction is present so I can switch between tessdata 
models? (If I remember correctly, jpn is used for left-to-right text and 
jpn_vert for top-to-bottom text.) Thanks
[image: Screen Shot 2023-12-26 at 10.28.28.png]
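
One possible way to approach the question above, as a rough sketch rather than a built-in Tesseract feature: run the same image once with jpn (--psm 6) and once with jpn_vert (--psm 5, a single block of vertically aligned text), then keep whichever result has the higher mean word confidence from pytesseract.image_to_data. The helper names (mean_confidence, ocr_best_direction, CANDIDATES) are made up for illustration, and picking by confidence is a heuristic assumption, not something the Tesseract docs prescribe.

```python
from statistics import mean

# Candidate (lang, config) pairs: horizontal first, then vertical.
# The pairing of jpn_vert with --psm 5 is an assumption for this sketch.
CANDIDATES = [
    ("jpn", "--oem 3 --psm 6"),
    ("jpn_vert", "--oem 3 --psm 5"),
]


def mean_confidence(data):
    """Average confidence over rows that hold actual text.

    `data` is a dict shaped like pytesseract.image_to_data(..., Output.DICT);
    layout rows carry conf == -1 and empty text, so they are skipped.
    """
    confs = [
        float(c)
        for c, t in zip(data["conf"], data["text"])
        if float(c) >= 0 and t.strip()
    ]
    return mean(confs) if confs else 0.0


def _tesseract_data(img, lang, config):
    # Imported lazily so the scoring logic can be tried without Tesseract.
    import pytesseract
    return pytesseract.image_to_data(
        img, lang=lang, config=config, output_type=pytesseract.Output.DICT
    )


def ocr_best_direction(img, run_ocr=None):
    """Return (lang, mean_confidence, text) for the better-scoring candidate."""
    if run_ocr is None:
        run_ocr = _tesseract_data
    results = []
    for lang, config in CANDIDATES:
        data = run_ocr(img, lang, config)
        # Words are joined without spaces (Japanese); line breaks are not kept.
        text = "".join(t for t in data["text"] if t.strip())
        results.append((lang, mean_confidence(data), text))
    # On a tie, max() keeps the first candidate (horizontal).
    return max(results, key=lambda r: r[1])
```

Usage would be something like `lang, score, text = ocr_best_direction(cv2.imread(image_path))`. For a page that mixes both directions, this only picks one direction for the whole image; each region would have to be cropped and passed through separately.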


On Monday, 25 December 2023 at 11:01:18 UTC+7, g...@hobbelt.com wrote:

> See also discussion in mailing list at 
> https://groups.google.com/d/msgid/tesseract-ocr/f86e2d35-4c35-4643-835f-109994e46632n%40googlegroups.com?utm_medium=email&utm_source=footer
>
> Plus https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md, 
> which is the most important documentation page that addresses all kinds of 
> OCR result quality issues such as this.
>
> On Fri, 22 Dec 2023, 05:58 Hoang Pham Huy, <akiray...@gmail.com> wrote:
>
>> Currently I'm trying to read this image in Japanese for translation, but 
>> the result is kind of odd. What should I do to improve it?
>>
>> I'm only using this code to extract text from the image, with the Japanese 
>> tessdata_best <https://github.com/tesseract-ocr/tessdata_best> models and 
>> some others:
>>
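>> ```
>> import cv2
>> import pytesseract
>>
>> def extract_text_from_image(self, image_path):
>>     # Load the image from disk as a NumPy array (BGR)
>>     img = cv2.imread(image_path)
>>     # OCR with several language models combined
>>     text = pytesseract.image_to_string(
>>         img, lang='jpn+jpn_vert+jpn_ver5+eng+osd+equ')
>>     return text.strip()
>> ```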
>>
>>
>> [image: Screen Shot 2023-12-22 at 10.12.00.png]
>>
>> -- 
>>
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/06f86c3c-4b4c-4a99-b2fa-50f38b13d54bn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/06f86c3c-4b4c-4a99-b2fa-50f38b13d54bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

