I'm using pytesseract with Tesseract v5.3.3 to read text from images, and I sometimes get weird phantom characters. I've tried image preprocessing such as upscaling, erosion, and thresholding, but nothing gets rid of this random character that is spawning from nothing. Attached are two example images (the left side is the processed image, the right is the original with bounding rectangles drawn). The blue rectangle to the right of "KB PNG" is a '_' being detected even though that space is completely blank. Any ideas on how to get rid of this?
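For reference, here is a minimal sketch of one possible workaround: dropping low-confidence or whitespace-only detections after OCR. It assumes the boxes come from `pytesseract.image_to_data(..., output_type=Output.DICT)`, which returns parallel lists under keys such as `text`, `conf`, `left`, `top`, `width`, and `height`. The helper name `filter_detections` and the threshold of 60 are illustrative assumptions, not values from the setup above, and this only helps if the phantom '_' actually gets a low confidence score.

```python
def filter_detections(data, min_conf=60.0):
    """Keep only detections with non-blank text and confidence >= min_conf.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT).
    `min_conf` is a hypothetical cutoff; tune it on your own images.
    """
    kept = []
    for i, text in enumerate(data["text"]):
        # conf may be an int, a float, or a string depending on the
        # pytesseract version; non-word rows carry a conf of -1.
        conf = float(data["conf"][i])
        if conf < min_conf or not text.strip():
            continue
        kept.append({
            "text": text,
            "conf": conf,
            "box": (data["left"][i], data["top"][i],
                    data["width"][i], data["height"][i]),
        })
    return kept


# Hand-written sample in the same shape as image_to_data output
# (the numbers are made up for illustration, not real OCR results).
sample = {
    "text": ["", "KB", "PNG", "_"],
    "conf": ["-1", 96, 95.5, 12],
    "left": [0, 10, 40, 90],
    "top": [0, 5, 5, 14],
    "width": [120, 25, 35, 12],
    "height": [30, 12, 12, 3],
}

for det in filter_detections(sample):
    print(det["text"], det["conf"], det["box"])
```

With real images, `sample` would be replaced by the dict returned from `pytesseract.image_to_data`. Printing the confidence of the stray '_' first would show whether a cutoff like this can separate it from the real text.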