Let me give you some general pointers:

- If something does not work the way you expect it to, that does not mean it's broken ;-). Maybe you just misunderstood something, or you expect something that was never promised...
- Before you mark something as broken, you should first prove that it used to work and now does not work anymore. Or you should understand the feature in detail and explain why it does not work ;-)

About the user-patterns:

1. user-patterns "just" extend the DAWG dictionary [1], [2]. So using user-patterns does not mean "interpret the OCR string as this pattern" or "find this pattern in the image".
2. Some years ago (in the age of tesseract 3, a.k.a. the legacy engine) someone measured the influence of dictionaries on OCR results at 10-15%. It would be great if somebody ran such a test for the LSTM engine ;-). But I would not expect a big change, and definitely not a 100% result, when adding a word/pattern to the dictionary.

[1] https://github.com/tesseract-ocr/tesseract/blob/5a36943de4a39d236a9762f6971823c5b7c20404/src/dict/dict.cpp#L263-L279
[2] https://github.com/tesseract-ocr/tesseract/blob/5a36943de4a39d236a9762f6971823c5b7c20404/src/dict/dict.cpp#L336-L352

Zdenko

On Mon, 15 Aug 2022 at 6:17, Benjamin Hall <codenamejupit...@gmail.com> wrote:

> *I also encountered the same issue with release 5.2*
> Did you ever find the reason why?
>
> *The pattern worked two or three times with similar images but now it doesn't work anymore for some reason.*
> *Does anyone know why it broke ?*
> Just learning Tesseract now and have not experimented with regex parameters yet....But if I find a solution I will get it over to you.
>
> On Sun, Aug 14, 2022 at 8:14 PM 'Yunlong Liu' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote:
>
>> I also encountered the same issue with release 5.2
>>
>> On Monday, August 15, 2022 at 1:55:49 AM UTC+8 louisd...@gmail.com wrote:
>>
>>> I'm using pytesseract with tesseract 5.2.0.20220712 to try to read a float number from an image.
>>> Here is the image I'm trying to read:
>>> [image: help.png]
>>> When using tesseract with the config "'--psm 7 --user-patterns "D:\PyCharmProjects\SpiralBattle\patterns.txt"'", it returns "4.43003" instead of the expected "4.43e03".
>>> My patterns.txt file is the following:
>>> \d.\d\de\d\d
>>>
>>> The pattern worked two or three times with similar images but now it doesn't work anymore for some reason.
>>>
>>> Does anyone know why it broke?
>>>
>>> Thanks in advance.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ec009a55-810d-4de0-886d-a6d50fa6c22en%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8xze0_gGcLbrVGVHojnHK2iq1c7z-KwNeL%3D1a5oZREaYQ%40mail.gmail.com.
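
Since user-patterns only extend the dictionary and do not force the output into a given shape, a format that must hold has to be checked after recognition. The following is a minimal sketch of such a check, not something tesseract does itself. The names `EXPECTED` and `matches_expected` are hypothetical, and the Python regex is an assumed translation of the reading expected in this thread ("4.43e03"), not the tesseract pattern syntax.

```python
import re

# Assumption: a valid reading is one digit, a literal dot, two digits,
# the letter "e", and two more digits (for example "4.43e03").
EXPECTED = re.compile(r"\d\.\d\de\d\d")


def matches_expected(ocr_text: str) -> bool:
    """Return True if the OCR output, ignoring surrounding whitespace,
    has exactly the expected shape."""
    return EXPECTED.fullmatch(ocr_text.strip()) is not None


if __name__ == "__main__":
    # The two strings quoted in this thread.
    for text in ("4.43e03", "4.43003"):
        print(text, matches_expected(text))
```

With pytesseract, the string returned by `pytesseract.image_to_string(image, config=...)` could be passed to `matches_expected` to flag readings that need a retry or manual review, rather than relying on the pattern file to guarantee the format.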