Let me give you some general pointers:

   - If something does not work the way you expect it to, that does not
   mean it's broken ;-). Maybe you just misunderstood something. Or you expect
   something that was never promised...
   - Before you mark something as broken, you should first prove that it
   worked before and does not work anymore. Or you should understand the
   feature in detail and explain why it does not work ;-)

About the user-patterns:

   1. user-patterns "just" extend the DAWG dictionary [1], [2]. So using
   user-patterns does not mean "interpret the OCR string as this pattern" or
   "find this pattern in the image".
   2. Some years ago (in the age of tesseract 3, a.k.a. the legacy engine)
   someone measured the influence of dictionaries on OCR results at 10-15%. It
   would be great if somebody ran such a test for the LSTM engine ;-).
   But I would not expect a big change, and definitely not a 100% result, when
   adding a word/pattern to the dictionary.
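
Since user-patterns only act as a dictionary hint, one way to get a fixed
format is to check the OCR string yourself after recognition. The snippet
below is only a rough sketch of that idea, not something tesseract does: the
function name repair_sci_notation, the expected shape (d.dd, one character,
dd) and the set of characters that might be misread in place of "e" are all
assumptions made for illustration.

```python
import re

# Expected shape of the reading: d.dd, one character, dd (e.g. "4.43e03").
SHAPE = re.compile(r"^(\d\.\d{2})(.)(\d{2})$")

# Hypothetical set of characters the engine might return in place of "e".
# This is an assumption for illustration, not a measured confusion table.
E_LOOKALIKES = {"e", "E", "0", "o", "c", "6"}


def repair_sci_notation(text):
    """Return text with the exponent marker restored, or text unchanged."""
    candidate = text.strip()
    match = SHAPE.match(candidate)
    if match is None or match.group(2) not in E_LOOKALIKES:
        # The string does not have the expected shape: leave it alone.
        return text
    mantissa, _, exponent = match.groups()
    return f"{mantissa}e{exponent}"


if __name__ == "__main__":
    print(repair_sci_notation("4.43003"))
```

Note that this only helps when you already know the format of the value; a
string that does not fit the expected shape is returned untouched.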

[1]
https://github.com/tesseract-ocr/tesseract/blob/5a36943de4a39d236a9762f6971823c5b7c20404/src/dict/dict.cpp#L263-L279
[2]
https://github.com/tesseract-ocr/tesseract/blob/5a36943de4a39d236a9762f6971823c5b7c20404/src/dict/dict.cpp#L336-L352

Zdenko


On Mon, Aug 15, 2022 at 6:17 Benjamin Hall <codenamejupit...@gmail.com> wrote:

> *I  also encountered the same issue with release 5.2*
> Did you ever find the reason why?
>
> *The pattern worked two or three times with similar images but now it
> doesn't work anymore for some reason.*
> *Does anyone know why it broke ?*
> Just learning Tesseract now and have not experimented with regex
> parameters yet....But if I find a solution I will get it over to you.
>
> On Sun, Aug 14, 2022 at 8:14 PM 'Yunlong Liu' via tesseract-ocr <
> tesseract-ocr@googlegroups.com> wrote:
>
>> I  also encountered the same issue with release 5.2
>>
>> On Monday, August 15, 2022 at 1:55:49 AM UTC+8 louisd...@gmail.com wrote:
>>
>>> I'm using pytesseract with tesseract 5.2.0.20220712 to try to read a
>>> float number from an image.
>>> Here is the image I'm trying to read :
>>> [image: help.png]
>>> When using tesseract with the config "'--psm 7 --user-patterns
>>> "D:\PyCharmProjects\SpiralBattle\patterns.txt"'", it returns "4.43003"
>>> instead of the expected "4.43e03"
>>> My patterns.txt file is the following :
>>> \d.\d\de\d\d
>>>
>>> The pattern worked two or three times with similar images but now it
>>> doesn't work anymore for some reason.
>>>
>>> Does anyone know why it broke ?
>>>
>>> Thanks in advance.
>>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/ec009a55-810d-4de0-886d-a6d50fa6c22en%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/ec009a55-810d-4de0-886d-a6d50fa6c22en%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
