That's very cool. Thanks!

On Mon, May 20, 2019 at 3:47 PM Lorenzo Bolzani <[email protected]> wrote:
> I just found this:
> https://www.quora.com/How-do-I-fill-holes-in-image-using-image-processing/answer/V-Sri-Chakra-Kumar
>
> On Wed, May 8, 2019 at 9:57 AM Lorenzo Bolzani <[email protected]> wrote:
>
>> Hi,
>> you can try a few things, but you need to write a small script (Python,
>> etc.) or use ImageMagick. I suggest first experimenting in GIMP to find what
>> works best, and then writing the code. You want dark text on a light
>> background.
>>
>> For white text on red:
>>
>> 1. Invert the image, desaturate it, and increase the contrast.
>>
>> 2. Split the image into its RGB channels and use the one that looks best
>> (probably red). Also try decomposing it into HSV and see whether the S or V
>> channel looks good. In GIMP: Colors -> Components -> Decompose.
>>
>> 3. Invert the image and try thresholding (Otsu, etc.).
>>
>> With a little programming you can identify and isolate the black regions
>> from the white ones, but I do not know if this is something you want to do.
>>
>> Post the image if this does not help.
>>
>> Lorenzo
>>
>> On Wed, May 8, 2019 at 3:07 AM Jason <[email protected]> wrote:
>>
>>> I have a problem with the current version of Tesseract. I have documents
>>> with sections of varying background and text colors. I've read that
>>> Tesseract v3 was white/black invariant, so it didn't matter if I had white
>>> text on a red background. But now it matters. The problem is that other
>>> parts of the same image are black text on a white background. Tesseract 4
>>> fails to identify the white text on the red background at all.
>>>
>>> I have tried inverting the image colors so red (0xFF0000) becomes cyan
>>> (0x00FFFF) and the white text (0xFFFFFF) becomes black (0x000000). I then
>>> take the highest-confidence text for the region. This improves some
>>> scenarios, but it does not work for the white-on-red one.
>>>
>>> Questions:
>>> 1. How can I make the text black and the background white before passing
>>> the image to Tesseract?
>>> 2. Is there a way to configure Tesseract to "just work"?
>>>
>>> I've been trying to figure out how to do this for some time, and I
>>> haven't made any progress.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/0c9cb359-bde4-4c2e-9643-1a9c56b639dc%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
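Step 1 of Lorenzo's suggestions (invert, desaturate, increase contrast) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: images are hypothetical lists of rows of (r, g, b) tuples in 0-255, desaturation uses the ITU-R BT.601 luma weights (GIMP offers several desaturation modes, not necessarily this one), and "increase contrast" is taken to mean a linear min-max stretch. A real script would more likely do the same thing with Pillow, OpenCV or ImageMagick; the function names here are made up for the example.

```python
def invert(img):
    """Invert every RGB pixel: white text on red becomes black text on cyan."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in img]


def desaturate(img):
    """RGB -> grayscale with ITU-R BT.601 luma weights (an assumed choice of weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def stretch_contrast(gray):
    """Linear min-max stretch: darkest pixel maps to 0, brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [list(row) for row in gray]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


if __name__ == "__main__":
    # Idealised 2x2 patch: one white "text" pixel on a pure red background.
    patch = [[(255, 0, 0), (255, 255, 255)], [(255, 0, 0), (255, 0, 0)]]
    print(stretch_contrast(desaturate(invert(patch))))  # -> [[255, 0], [255, 255]]
```

On this idealised patch the red background ends up at 255 and the text at 0, which is the dark-text-on-light-background polarity Lorenzo recommends feeding to Tesseract.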

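Step 2 (split into RGB or HSV channels and keep the one that looks best) can also be automated in a rough way. The sketch below is an assumption-laden illustration, not Lorenzo's code: it uses the standard-library `colorsys` module for HSV, scores each channel by the population standard deviation of its pixel values as a crude stand-in for "looks best", and inverts the winning channel if its background (assumed to be the majority of pixels) is dark. With idealised pure colours, red (255, 0, 0) and white (255, 255, 255) share the same red value, so the red channel carries no contrast there; that is why this sketch scores the channels rather than hard-coding one. All function names are hypothetical.

```python
import colorsys
import statistics

CHANNELS = "RGBHSV"


def channels(img):
    """Split rows of (r, g, b) tuples into R, G, B, H, S, V planes scaled to 0..255."""
    planes = {name: [] for name in CHANNELS}
    for row in img:
        out = {name: [] for name in CHANNELS}
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            values = (r, g, b, round(h * 255), round(s * 255), round(v * 255))
            for name, value in zip(CHANNELS, values):
                out[name].append(value)
        for name in CHANNELS:
            planes[name].append(out[name])
    return planes


def contrast(plane):
    """Crude contrast score: population standard deviation of all pixel values."""
    return statistics.pstdev([v for row in plane for v in row])


def dark_on_light(plane):
    """Invert the plane if its background (assumed majority of pixels) is dark."""
    flat = [v for row in plane for v in row]
    if statistics.median(flat) < 128:
        return [[255 - v for v in row] for row in plane]
    return [list(row) for row in plane]


def best_channel(img):
    """Return (name, plane) for the highest-contrast channel, as dark text on light."""
    planes = channels(img)
    name = max(planes, key=lambda n: contrast(planes[n]))  # first one wins on ties
    return name, dark_on_light(planes[name])


if __name__ == "__main__":
    patch = [[(255, 0, 0), (255, 255, 255)], [(255, 0, 0), (255, 0, 0)]]
    print(best_channel(patch))  # -> ('G', [[255, 0], [255, 255]])
```

On the white-on-red patch the G, B and S channels tie for the highest score (the sketch returns the first of them, G), and each gives dark text on a light background after the polarity check.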

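Step 3 (invert, then threshold with Otsu's method) is also easy to sketch. In practice a library routine such as OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag would be the usual choice; the version below spells Otsu's method out in plain Python so the idea is visible. It builds a 256-bin histogram and picks the threshold t that maximises the between-class variance w0 * w1 * (mu0 - mu1)^2 of the two classes (values <= t and values > t). The grayscale conversion again assumes BT.601 luma weights, and the function names are hypothetical.

```python
def to_gray(img):
    """Rows of (r, g, b) tuples -> grayscale rows of ints (BT.601 luma, an assumption)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def invert_gray(gray):
    return [[255 - v for v in row] for row in gray]


def otsu_threshold(gray):
    """Return the threshold t that maximises between-class variance (dark class: v <= t)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    best_t, best_between = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: not a valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalised between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(gray, t):
    """Pixels above t become white (255), the rest black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    patch = [[(255, 0, 0), (255, 255, 255)], [(255, 0, 0), (255, 0, 0)]]
    gray = invert_gray(to_gray(patch))
    print(binarize(gray, otsu_threshold(gray)))  # -> [[255, 0], [255, 255]]
```

Inverting first means the originally white text becomes the dark class, so the binarized result is black text on a white background, matching what Jason asked for in his first question.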