I'm using pytesseract and Tesseract v5.3.3 to read text from some images,
and I sometimes get weird phantom characters. I've tried some image
preprocessing (increasing the image size, erosion, thresholding, etc.), but
nothing gets rid of this random character that spawns from nothing.
Attached are two example images (the left side is processed; the right is
the original with rect bounding boxes drawn). The blue rectangle to the
right of "KB PNG" is a '_' being detected even though that space is
completely blank. Any ideas on getting rid of this?
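One possible post-processing workaround, sketched under assumptions rather than offered as a known fix: request word-level results with pytesseract's `image_to_data(..., output_type=pytesseract.Output.DICT)` and discard words that have low confidence or consist only of punctuation, such as a lone '_'. The function name `filter_phantom_words`, the 60 confidence cutoff, and the "punctuation-only means noise" rule are hypothetical choices to tune for your own images.

```python
import string

# Assumption: a word made up entirely of these characters is treated as noise.
# string.punctuation already includes "_".
NOISE_CHARS = set(string.punctuation)


def filter_phantom_words(data, min_conf=60.0):
    """Return [(text, (left, top, width, height)), ...] for words that pass.

    `data` is assumed to have the shape of the dict returned by
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    `min_conf` is a hypothetical cutoff, not a Tesseract default.
    """
    kept = []
    for i, raw_text in enumerate(data["text"]):
        text = str(raw_text).strip()
        if not text:
            continue  # skip rows with no text (e.g. page/block/line-level rows)
        conf = float(data["conf"][i])  # float() also handles string confs
        if conf < min_conf:
            continue  # drop low-confidence guesses
        if all(ch in NOISE_CHARS for ch in text):
            continue  # drop punctuation-only tokens such as a lone "_"
        box = (data["left"][i], data["top"][i],
               data["width"][i], data["height"][i])
        kept.append((text, box))
    return kept


# Example usage (assumes Tesseract is installed; "screenshot.png" is a placeholder):
# import pytesseract
# from PIL import Image
# data = pytesseract.image_to_data(Image.open("screenshot.png"),
#                                  output_type=pytesseract.Output.DICT)
# for text, box in filter_phantom_words(data):
#     print(text, box)
```

Dropping punctuation-only tokens also removes legitimate standalone symbols (a lone "-" or "/"), so narrow `NOISE_CHARS` if your text contains them.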

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8800b99f-b92d-4dbf-83b8-d1d3da9c2bf4n%40googlegroups.com.