Re: [tesseract-ocr] Re: Reading image from Rubber

Taresh Chaudhari Tue, 26 Nov 2024 05:02:05 -0800

Thanks, Mahmoud, for sharing. I applied these techniques, but the results are 
still not good, and I am still trying to solve this problem. I'll see how it 
goes.


On Tuesday, 26 November 2024 at 00:31:29 UTC+5:30 [email protected] 
wrote:

> To improve the accuracy of text extraction, you can preprocess the image 
> before passing it to the OCR engine. Preprocessing techniques like 
> converting the image to grayscale, enhancing contrast, or applying filters 
> can help reduce noise and improve readability. Additionally, tweaking the 
> Tesseract options passed through pytesseract, such as the --psm (page 
> segmentation mode) value, may also improve the results.
>
> Here’s an updated version of your code with some preprocessing steps:
> import pytesseract
> from PIL import Image, ImageEnhance, ImageFilter
>
> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>
> # Path to your image
> image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'
>
> def extract_text_from_image(image_path):
>     # Open the image
>     img = Image.open(image_path)
>
>     # Convert the image to grayscale to improve text-background contrast
>     img = img.convert('L')  # Convert image to grayscale
>     img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast
>     img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image
>
>     # Use pytesseract to extract text; PSM 6 assumes a single uniform block of text
>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>     return extracted_text.strip()
>
> # Extract and print text
> text = extract_text_from_image(image_path)
> print(f"Text extracted from {image_path}: {text}")
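The original question (quoted further down) describes characters on a black background, and the preprocessing above does not invert the image. Tesseract's documentation on improving output quality advises that 4.x versions expect dark text on a light background, so inverting light-on-dark text may help. This is a minimal sketch using only Pillow. The helpers `otsu_threshold` and `prepare_light_on_dark` are hypothetical, the 2x upscale factor and the "mostly black pixels means a dark background" rule are assumptions, and Otsu's threshold is computed by hand here only to avoid adding a dependency such as OpenCV.

```python
from PIL import Image, ImageOps


def otsu_threshold(gray):
    """Return an Otsu threshold (0-255) for a grayscale ("L" mode) PIL image.

    Pixels <= threshold form one class and pixels > threshold the other;
    the threshold chosen maximizes the between-class variance."""
    hist = gray.histogram()  # 256 counts for an "L" image
    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))
    sum_bg, weight_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def prepare_light_on_dark(img, scale=2):
    """Hypothetical helper: grayscale, upscale, binarize with Otsu, and invert
    when the background appears dark, so Tesseract sees dark text on white."""
    gray = img.convert("L")
    if scale != 1:
        # Enlarge the glyphs; the default integer factor of 2 is an assumption to tune.
        gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    t = otsu_threshold(gray)
    binary = gray.point(lambda p: 255 if p > t else 0)
    # Heuristic (assumption): if most pixels are black, the background is dark,
    # so invert to get dark text on a light background.
    if binary.histogram()[0] > (binary.width * binary.height) / 2:
        binary = ImageOps.invert(binary)
    return binary
```

For example, with the pytesseract setup and `image_path` from the code above: `pytesseract.image_to_string(prepare_light_on_dark(Image.open(image_path)), config='--psm 6')`. Hard binarization is not always an improvement on textured surfaces, so it is worth comparing the result against the plain inverted grayscale image (`ImageOps.invert(img.convert('L'))`).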
>
> On Monday, 25 November 2024 at 4:12 PM Taresh Chaudhari <[email protected]> 
> wrote:
>
>> Attaching an image for reference.
>>
>> On Monday, 25 November 2024 at 15:52:27 UTC+5:30 Taresh Chaudhari wrote:
>>
>>> Hi,
>>> I am trying to read the characters from an image in which the characters 
>>> are on a black background. Attached is the code I used to extract them; 
>>> currently it gives only partial output. Can you guide me on how to make it 
>>> more accurate?
>>>
>>>
>>> import pytesseract
>>> from PIL import Image
>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>> # Paths to your images
>>> image_paths = [
>>>    'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg']
>>>
>>> # Function to process an image and extract text
>>> def extract_text_from_image(image_path):
>>>     # Open the image
>>>     img = Image.open(image_path)
>>>     
>>>     # Use pytesseract to perform OCR; PSM 6 assumes a single uniform block of text
>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>     return extracted_text.strip()
>>>
>>> # Process all images and print results
>>> for img_path in image_paths:
>>>     text = extract_text_from_image(img_path)
>>>     print(f"Text extracted from {img_path}: {text}")
>>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com
>>
>

