If you can, try preprocessing the image and inverting it so that it has black text on a white background. I found that recognition works much better after this preprocessing (probably because the models were trained on that kind of input).
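Here is a minimal sketch of that preprocessing, written in C# to match the project in the question. It works on a plain 8-bit grayscale buffer (0 = black, 255 = white), so it has no dependencies. The class and method names are hypothetical. Binarizing with Otsu's threshold is my own choice for the "preprocessing" step, not something stated above. The inversion decision assumes that the background covers more than half of the cropped region.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class OcrPreprocess
{
    // Otsu's method: choose the threshold t that maximizes the between-class
    // variance when pixels <= t are one class and pixels > t are the other.
    public static int OtsuThreshold(byte[] gray)
    {
        var hist = new long[256];
        foreach (byte v in gray) hist[v]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int t = 0; t < 256; t++) sumAll += (double)t * hist[t];

        double sumLow = 0, bestVariance = -1;
        long wLow = 0;
        int best = 0;
        for (int t = 0; t < 256; t++)
        {
            wLow += hist[t];                 // pixels with value <= t
            if (wLow == 0) continue;
            long wHigh = total - wLow;       // pixels with value > t
            if (wHigh == 0) break;
            sumLow += (double)t * hist[t];
            double meanLow = sumLow / wLow;
            double meanHigh = (sumAll - sumLow) / wHigh;
            double diff = meanLow - meanHigh;
            double between = (double)wLow * wHigh * diff * diff;
            if (between > bestVariance) { bestVariance = between; best = t; }
        }
        return best;
    }

    // Binarize (<= threshold -> 0, otherwise 255), then invert when most
    // pixels are black, assuming the background dominates the crop.
    // The result is always black text on a white background.
    public static byte[] ToBlackTextOnWhite(byte[] gray)
    {
        int threshold = OtsuThreshold(gray);
        var bin = new byte[gray.Length];
        int white = 0;
        for (int i = 0; i < gray.Length; i++)
        {
            bin[i] = gray[i] > threshold ? (byte)255 : (byte)0;
            if (bin[i] == 255) white++;
        }

        if (white * 2 < bin.Length)          // dark background: flip polarity
        {
            for (int i = 0; i < bin.Length; i++) bin[i] = (byte)(255 - bin[i]);
        }
        return bin;
    }
}
```

A light-on-dark crop and a dark-on-light crop of the same marking should both come out as the same black-on-white image. That is the point of the inversion step. In an OpenCvSharp pipeline like the one in the question, the same steps should map onto `Cv2.CvtColor` (to grayscale), `Cv2.Threshold` with `ThresholdTypes.Binary | ThresholdTypes.Otsu`, `Cv2.CountNonZero`, and `Cv2.BitwiseNot`, applied to the crop before it is saved for Tesseract.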
On Tuesday, July 30, 2024 at 10:45:56 PM UTC+8 allelu...@gmail.com wrote:

> I'm trying to use Tesseract in a project written in C#. I have a problem
> reading text from part of an image. I'm trying to find these 4 characters
> (in the example, 0000) and the number after "e". For some examples it
> works perfectly, but for others it prints "Empty page!!!". The difference
> between the examples is the background color, but the image processing is
> the same for every attempt. What should I do to minimize the probability
> of errors?
>
> This is the image where OCR works correctly:
> [image: working.jpg]
>
> and here it does not work:
> [image: not working.jpg]
>
> Part of the C# code:
>
> using System;
> using System.Drawing;
> using OpenCvSharp;
> using Tesseract;
>
> public static class Sign
> {
>     public static void Verify()
>     {
>         string imagePath = "path.bmp";
>         Mat imageSign = new Mat(imagePath);
>
>         // Crop the region that holds the marking, then upscale it 2x
>         int h = imageSign.Rows;
>         int w = imageSign.Cols;
>         int point1 = (int)(0.01 * w);
>         int point2 = (int)(0.6 * h);
>         int point3 = (int)(0.3 * w);
>         int point4 = (int)(0.9 * h);
>         imageSign = new Mat(imageSign, new OpenCvSharp.Rect(point1, point2, point3 - point1, point4 - point2));
>         Cv2.Resize(imageSign, imageSign, new OpenCvSharp.Size(), 2, 2);
>         imageSign.SaveImage(imagePath);
>
>         // Re-save the crop with 300 DPI metadata
>         string imagePathA = "2nd image path.bmp";
>         using (Bitmap bitmap = (Bitmap)Image.FromFile(imagePath))
>         using (Bitmap newBitmap = new Bitmap(bitmap))
>         {
>             newBitmap.SetResolution(300, 300);
>             newBitmap.Save(imagePathA);
>         }
>
>         // EngineMode.Default corresponds to --oem 3
>         using (var engine = new TesseractEngine(@"C:\Program Files\Tesseract-OCR\tessdata", "eng", EngineMode.Default))
>         using (var pixFromFile = Pix.LoadFromFile(imagePathA))
>         {
>             engine.SetVariable("tessedit_char_whitelist", "0123456789");
>             // --psm 10 is passed as a PageSegMode; a CLI-style string in this
>             // position would be taken as the input name, not as a config
>             using (var page = engine.Process(pixFromFile, PageSegMode.SingleChar))
>             {
>                 string text = page.GetText();
>                 Console.Write(text);
>
>                 string[] lines = text.Split('\n');
>                 bool linijka = false;
>
>                 foreach (string line in lines)
>                 {
>                     // A 4- or 5-character line is taken as the marking (e.g. 0000)
>                     if (line.Length == 4 || line.Length == 5)
>                     {
>                         Console.WriteLine("Marking e5: ");
>                         Console.WriteLine(line);
>                         linijka = true;
>                     }
>                     // A single-character line is taken as the number after "e"
>                     if (line.Length == 1)
>                     {
>                         Console.WriteLine("e_:");
>                         Console.WriteLine(line);
>                     }
>                 }
>
>                 Cv2.ImShow("end", imageSign);
>                 Cv2.WaitKey(0);
>             }
>         }
>     }
> }
>
> I tried cropping the image, and for some reason, making the crop bigger or
> smaller than it is now adversely affects the results. I also tried some
> other Tesseract PSM configurations and changed the DPI of the image to
> 300.