[tesseract-ocr] Re: Tesseract OCR not performing well even after data cleaning and transformations on black background data

farhad khalafi Wed, 30 Jan 2019 07:10:11 -0800

A few questions:
Is the image you posted the original, or the result after your processing?
What is the image resolution?
What does the extracted text look like?
Is there any possibility of sharing the original image without redactions?


On Wednesday, January 30, 2019 at 3:36:23 AM UTC-7, [email protected] wrote:
>
>
>
> On Wednesday, January 30, 2019 at 2:49:49 PM UTC+5:30, farhad khalafi 
> wrote:
>>
>> @Smriti: In the latest version (1.3.0) of our free Tesseract Studio 
>> <https://github.com/OpaitSoftware/TesseractStudio.Net>, we have an 
>> experimental routine to detect and fix inverted text blocks (e.g. table 
>> headers with light text on dark background). The proper detection of image 
>> background is not an easy task. Our approach uses histograms in both 
>> horizontal and vertical directions to detect large rectangles that can 
>> potentially be header blocks. I would be curious to find out if the code 
>> works for you. You will need to set the "Fix inverted text" option under 
>> the Image tab. I ran an experiment with a sample PDF file and captured 
>> intermediate images as in the attached document. Your case might not work 
>> the same but no harm in trying.
>>  
>>
>> On Tuesday, January 29, 2019 at 11:35:47 PM UTC-7, [email protected] 
>> wrote:
>>>
>>> I have written some code to extract data from an image using 
>>> Tesseract in Python, i.e. pytesseract. But even after various 
>>> transformations using OpenCV (cv2), I am not getting satisfactory results. The 
>>> data which has a dark background is not being extracted properly even after 
>>> the background has been lightened. I have attached a sample image. The part 
>>> colored in black is being extracted properly, but the parts in blue, yellow 
>>> and red aren't being extracted well. I have put them in a square just so 
>>> that they can be noticed. In the original image, all I have is English words 
>>> and a few numbers (including decimals). Any help would be much appreciated. 
>>>
>>> Regards
>>> Smriti
>>>
>>
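
For anyone who wants to experiment with the histogram idea quoted above, here
is a rough pure-Python sketch. It is not the Tesseract Studio code. The
function names (find_dark_blocks, invert_blocks), the threshold of 128 and the
0.5 "mostly dark" cutoff are all assumptions chosen for illustration. It also
assumes a grayscale image stored as a list of rows of 0-255 values, and that a
dark block covers at least half the width of the rows it occupies.

```python
def runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out


def find_dark_blocks(gray, threshold=128, min_dark=0.5):
    """Find rectangles (top, bottom, left, right) that are mostly dark.

    Both cutoffs are illustrative assumptions, not tuned values.
    """
    width = len(gray[0])
    blocks = []
    # Histogram over rows: fraction of dark pixels in each row.
    row_dark = [sum(p < threshold for p in row) / width for row in gray]
    for top, bottom in runs([f >= min_dark for f in row_dark]):
        band_height = bottom - top
        # Histogram over columns, restricted to the dark band just found.
        col_dark = [
            sum(gray[y][x] < threshold for y in range(top, bottom)) / band_height
            for x in range(width)
        ]
        for left, right in runs([f >= min_dark for f in col_dark]):
            blocks.append((top, bottom, left, right))
    return blocks


def invert_blocks(gray, blocks):
    """Return a copy of the image with each block inverted (light <-> dark)."""
    out = [row[:] for row in gray]
    for top, bottom, left, right in blocks:
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 255 - out[y][x]
    return out
```

On a real scan the same steps would normally be done on NumPy arrays loaded
through OpenCV, and the corrected image would then be handed to pytesseract.
Colored backgrounds (blue, yellow, red) would first need a grayscale
conversion, and the cutoffs will likely need adjusting per document.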

