If I recall correctly, ara_number.traineddata was trained for the legacy
engine. You cannot combine two traineddata files in one run if each was
trained for a different engine.
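
To illustrate the point about engines, here is a rough sketch that only builds
a tesseract command line. The helper name build_tesseract_cmd and the file name
date.jpg are placeholders of mine; -l, --oem and --psm are the usual
command-line options. As far as I know, a single --oem value applies to every
language listed in -l, which is why a legacy-only file and an LSTM-only file do
not mix in one run.

```python
def build_tesseract_cmd(image, langs, oem, psm=7):
    """Return the argv list for one tesseract run (hypothetical helper)."""
    # 0 = legacy engine, 1 = LSTM engine, 2 = legacy + LSTM, 3 = default
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2 or 3")
    # All languages are joined with '+' and share the one engine mode.
    return [
        "tesseract", image, "stdout",
        "-l", "+".join(langs),
        "--oem", str(oem),
        "--psm", str(psm),  # 7 = treat the image as a single text line
    ]


# LSTM-only run with the current ara.traineddata (assumed to be installed).
cmd = build_tesseract_cmd("date.jpg", ["ara"], oem=1)
print(" ".join(cmd))
```

The list can be passed to subprocess.run() once the traineddata files are in
place; the sketch itself does not call tesseract.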

Training for Arabic numbers and punctuation is currently an open issue. If
you use the latest code from the tesstrain repo, it should automatically
apply the bidi algorithm so that Arabic text and numbers are handled
correctly. I am not so sure about punctuation such as ( ) and whether it
needs to be reversed or not.
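
To show why punctuation is the uncertain part, here is a small sketch using
only Python's unicodedata module. It is not taken from tesstrain and says
nothing about what tesstrain actually does; it just prints the Unicode bidi
class of each character and whether the character is flagged as mirrored. The
sample string is made up for illustration.

```python
import unicodedata


def bidi_report(text):
    """Return (char, bidi class, mirrored flag) for each character."""
    return [
        (ch, unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch)))
        for ch in text
    ]


# Arabic-Indic digits, a slash, parentheses and one Arabic letter (alef).
sample = "\u0661\u0662/\u0660\u0667 (\u0627)"
for ch, bidi_class, mirrored in bidi_report(sample):
    print(f"U+{ord(ch):04X} {bidi_class:>3} mirrored={mirrored}")
```

Arabic-Indic digits come out as class AN, Arabic letters as AL, the slash as CS
(a number separator), and parentheses as ON with the mirrored flag set, so
digits, separators and brackets each follow different rules during reordering.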

I suggest that you try the latest code from the tesseract and tesstrain repos
together with the latest traineddata.

On Sun, Jul 12, 2020, 20:52 Eliyaz L <write2eli...@gmail.com> wrote:

> Hi Shree,
>
> I was using the version below. I guess you are right, it is a 2016 file. Let
> me test with the latest traineddata.
> https://tesseract-ocr.github.io/tessdoc/Data-Files
> https://github.com/tesseract-ocr/tessdata/raw/4.00/ara.traineddata
>
>
> Meanwhile, can you please help me with Arabic numbers?
> I tried ara_number.traineddata from here
> <https://github.com/ahmed-tea/tessdata_Arabic_Numbers/blob/master/ara_number.traineddata>.
> It works for numbers, but I am unable to get the date format with slashes.
> I also searched for similar issues here
> <https://github.com/tesseract-ocr/tesseract/issues/1193> and here
> <https://github.com/Shreeshrii/tessdata_arabic>.
>
> The main problem is with dates: I am trying to recognize an Arabic date in
> the format below.
>
> Input image:
>
> [image: date.jpg]
>
>
>
>
> On Sunday, July 12, 2020 at 4:27:07 PM UTC+3, shree wrote:
>>
>> See https://github.com/tesseract-ocr/tesseract/issues/758 and other
>> similar issues
>>
>> On Sun, Jul 12, 2020 at 6:52 PM Shree Devi Kumar <shree...@gmail.com>
>> wrote:
>>
>>> @Eliyaz What version of tesseract are you using? Which traineddata?
>>>
>>> >Always the letter "لا" is predicted as "ال" .
>>>
>>> I think this was fixed by Ray Smith in 2017 and should be OK in the
>>> traineddata files in the tessdata_fast and tessdata_best repos.
>>>
>>> On Sun, Jul 12, 2020 at 6:45 PM Rainer Verteidiger <
>>> materialde...@gmail.com> wrote:
>>>
>>>>
>>>> Always the letter "لا" is predicted as "ال" .
>>>>
>>>> Not sure how much relevancy that bears in the context of training
>>>> models, but لا is no letter! It's a ligature ("Arabic Ligature Lam with
>>>> Alef") formed by combining ل ("Arabic Letter Lam") with ا ("Arabic Letter
>>>> Alef") whereas ال is ا followed by ل (so, the exact opposite way around; no
>>>> ligature). Both are incredibly common in Arabic texts and although I have
>>>> no clue about machine learning, I'm surprised how the training could miss
>>>> the difference between them.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesser...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/de95d94b-9dcd-432c-a06c-3180d6c741afo%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/de95d94b-9dcd-432c-a06c-3180d6c741afo%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>
>>
