That's very cool. Thanks!

On Mon, May 20, 2019 at 3:47 PM Lorenzo Bolzani <[email protected]> wrote:

>
> I just found this:
> https://www.quora.com/How-do-I-fill-holes-in-image-using-image-processing/answer/V-Sri-Chakra-Kumar
>
>
> On Wed, May 8, 2019 at 09:57 Lorenzo Bolzani <
> [email protected]> wrote:
>
>> Hi,
>> you can try a few things, but you will need to write a small script (Python,
>> etc.) or use ImageMagick. I suggest trying GIMP first to find what works
>> best, and then writing the code. You want dark text on a light background.
>>
>> For white text on red:
>>
>> 1. Invert the image. Desaturate. Increase contrast.
>>
>> 2. Split the image into RGB channels and use the one that looks best (red
>> probably). Also try decomposing into HSV and see if S or V looks good. In
>> GIMP: Colors -> Components -> Decompose.
>>
>> 3. Invert the image and try thresholding (Otsu, etc.)
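
The steps above can be sketched in a few lines of plain Python. This is only a rough illustration, assuming the image is already loaded as rows of (R, G, B) tuples in the range 0..255; the function names are made up for this example and do not come from any library. Note that for pure red (255, 0, 0) with pure white text the red channel has no contrast at all, before or after inversion, so the sketch uses the green channel; on a real scan, check each channel by eye as suggested.

```python
# Minimal sketch: invert, pick one channel, then binarize with Otsu's method.
# Pixels are (R, G, B) tuples in 0..255; all names here are illustrative.

def invert(img):
    """Return the color-inverted image."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in img]

def channel(img, idx):
    """Extract one channel (0 = R, 1 = G, 2 = B) as a grayscale image."""
    return [[px[idx] for px in row] for row in img]

def otsu_threshold(gray):
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray, t):
    """Map values <= t to black (0) and everything else to white (255)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]

def white_on_red_to_black_on_white(img):
    """Invert, take the green channel, and threshold it."""
    green = channel(invert(img), 1)
    return binarize(green, otsu_threshold(green))
```

For real images the same steps are normally done with an image library rather than nested lists; the point here is only the order of operations.
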
>>
>> With a little programming you can identify and isolate black regions from
>> white ones, but I do not know if this is something you want to do.
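
One simple way to isolate dark regions from light ones, sketched here under strong assumptions: cut the grayscale image into square tiles and invert every tile in which most pixels are dark, on the theory that the background covers more area than the text. The tile size, the cutoff of 128, and the function name are arbitrary choices for illustration; a region boundary that falls inside a tile will not be handled well.

```python
# Sketch: make every region dark-text-on-light by inverting mostly-dark tiles.
# `gray` is a list of rows of integers in 0..255; names are illustrative.

def normalize_polarity(gray, tile=32, cutoff=128):
    """Invert each tile whose pixels are mostly darker than `cutoff`."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    out = [row[:] for row in gray]  # copy so the input is left untouched
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            ys = range(y0, min(y0 + tile, h))
            xs = range(x0, min(x0 + tile, w))
            dark = sum(1 for y in ys for x in xs if gray[y][x] < cutoff)
            # A mostly-dark tile is assumed to be light text on a dark background.
            if 2 * dark > len(ys) * len(xs):
                for y in ys:
                    for x in xs:
                        out[y][x] = 255 - gray[y][x]
    return out
```

The result can then be thresholded and passed to tesseract as a single image, instead of running OCR on both the original and the inverted version.
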
>>
>>
>> Post the image if this does not help.
>>
>>
>> Lorenzo
>>
>> On Wed, May 8, 2019 at 03:07 Jason <[email protected]>
>> wrote:
>>
>>> I have a problem with the current Tesseract. I have documents with
>>> sections of varying background and text colors. I've read that Tesseract v3
>>> was white/black invariant, so it didn't matter if I had white text on a red
>>> background. But now it matters. The problem is that other parts of the same
>>> image are black text on a white background. Tesseract 4 fails to identify the
>>> white text on the red background at all.
>>>
>>> I have tried inverting the image colors so red (0xFF0000) becomes cyan
>>> (0x00FFFF) and the white text (0xFFFFFF) becomes black (0x000000). I then
>>> take the highest-confidence text for the region. This improves some
>>> scenarios, but it does not work for the red/white scenario.
>>>
>>> Questions:
>>> 1. How can I extract the text to be black and the background to be
>>> white, before using tesseract?
>>> 2. Is there a way to configure tesseract to "just work"?
>>>
>>> I've been trying to figure out how to do this for some time, and I
>>> haven't made any progress.
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/0c9cb359-bde4-4c2e-9643-1a9c56b639dc%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/0c9cb359-bde4-4c2e-9643-1a9c56b639dc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>

