If you can, try preprocessing the image and inverting it so that it is black 
text on a white background. I found that recognition works much better with 
that preprocessing, probably because the models were trained on that kind of 
input.
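
Here is a minimal sketch of that preprocessing with OpenCvSharp, since your 
code already uses it. The class and method names (OcrPreprocess, ShouldInvert, 
ToBlackOnWhite) are made up for illustration, and the "invert when the 
binarized image is mostly dark" rule is my own assumption: it only holds when 
the text covers less area than the background, so check it against your 
images.

```csharp
using OpenCvSharp;

public static class OcrPreprocess
{
    // Hypothetical heuristic: after binarization, a mean below mid-gray means
    // most pixels are black, which is read as light text on a dark background.
    public static bool ShouldInvert(double meanIntensity) => meanIntensity < 127.5;

    // Hypothetical helper: grayscale -> Otsu threshold -> invert if needed,
    // so the saved image is black text on a white background.
    public static void ToBlackOnWhite(string inputPath, string outputPath)
    {
        using (Mat gray = Cv2.ImRead(inputPath, ImreadModes.Grayscale))
        using (Mat binary = new Mat())
        {
            // Otsu picks the threshold itself, so the 0 passed here is ignored.
            Cv2.Threshold(gray, binary, 0, 255,
                ThresholdTypes.Binary | ThresholdTypes.Otsu);

            // Single-channel image, so the mean is in the first Scalar slot.
            if (ShouldInvert(Cv2.Mean(binary).Val0))
            {
                Cv2.BitwiseNot(binary, binary);
            }

            Cv2.ImWrite(outputPath, binary);
        }
    }
}
```

You would call ToBlackOnWhite on the cropped image before handing the file to 
Pix.LoadFromFile, so both the light and the dark backgrounds reach Tesseract 
in the same polarity.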

On Tuesday, July 30, 2024 at 10:45:56 PM UTC+8 allelu...@gmail.com wrote:

> I'm trying to use Tesseract in a project written in C#. I have a problem 
> reading text from part of an image. I'm trying to find these 4 characters 
> (0000 in the example) and the number after "e". For some examples it works 
> perfectly, but for others it prints "Empty page!!!". The difference between 
> the examples is the background color, but the image processing is the same 
> for every attempt. What should I do to minimize the probability of error?
>
>
> This is the image where OCR works correctly:
> [image: working.jpg]
>
> and this is one where it does not work:
>
> [image: not working.jpg]
>
>
>
> Part of the code in C#:
>
>
> using System;
> using System.Drawing;
> using OpenCvSharp;
> using Tesseract;
>
> public static class Sign
> {
>     public static void Verify()
>     {
>         string imagePath = "path.bmp";
>         Mat imageSign = new Mat(imagePath);
>
>         // Crop to the region of interest: 1%-30% of the width,
>         // 60%-90% of the height.
>         int h = imageSign.Rows;
>         int w = imageSign.Cols;
>         int point1 = (int)(0.01 * w);
>         int point2 = (int)(0.6 * h);
>         int point3 = (int)(0.3 * w);
>         int point4 = (int)(0.9 * h);
>         OpenCvSharp.Point start_point = new OpenCvSharp.Point(point1, point2);
>         OpenCvSharp.Point end_point = new OpenCvSharp.Point(point3, point4);
>         imageSign = new Mat(imageSign, new OpenCvSharp.Rect(point1, point2, point3 - point1, point4 - point2));
>
>         // Upscale the crop by 2x in both directions and overwrite the file.
>         Cv2.Resize(imageSign, imageSign, new OpenCvSharp.Size(), 2, 2);
>         imageSign.SaveImage(imagePath);
>
>         // Re-save the crop with its resolution metadata set to 300 DPI.
>         using (Bitmap bitmap = (Bitmap)Image.FromFile(imagePath))
>         {
>             using (Bitmap newBitmap = new Bitmap(bitmap))
>             {
>                 string imagePathA = "2nd image path.bmp";
>                 newBitmap.SetResolution(300, 300);
>                 newBitmap.Save(imagePathA);
>             }
>         }
>
>         string imagePathB = "2nd image path.bmp";
>         var pixFromFile = Pix.LoadFromFile(imagePathB);
>         string customConfig = "--psm 10 --oem 3";
>         using (var engine = new TesseractEngine(@"C:\Program Files\Tesseract-OCR\tessdata", "eng", EngineMode.Default))
>         {
>             // Restrict recognition to digits.
>             engine.SetVariable("tessedit_char_whitelist", "0123456789");
>             using (var page = engine.Process(pixFromFile, customConfig))
>             {
>                 string text = page.GetText();
>                 Console.Write(text);
>
>                 string[] lines = text.Split('\n');
>                 bool linijka = false;
>
>                 // Classify each recognized line by its length.
>                 foreach (string line in lines)
>                 {
>                     if (line.Length == 4 || line.Length == 5)
>                     {
>                         Console.WriteLine("e5 marking: ");
>                         Console.WriteLine(line);
>                         linijka = true;
>                     }
>                     if (line.Length == 1)
>                     {
>                         Console.WriteLine("e_:");
>                         Console.WriteLine(line);
>                     }
>                 }
>
>                 Cv2.ImShow("end", imageSign);
>                 Cv2.WaitKey(0);
>             }
>         }
>     }
> }
>
> I tried cropping the image differently, and for some reason making the crop 
> bigger or smaller than it is now makes the results worse. I also tried some 
> other Tesseract PSM settings and changed the DPI of the image to 300.
>
