Many thanks. I have downloaded it and am using it.
I'll wait for the next version.

On Sunday, July 1, 2018 at 12:21:19 AM UTC+5:30, shree wrote:
>
> I have uploaded a new version of the traineddata file at 
>
> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/iast-layer-18003.traineddata
>
> Attached is the OCRed output for pages 13-24 of the dark PDF, produced with it.
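For readers who want to reproduce this from the command line, here is a minimal sketch in Python that assembles a Tesseract 4 invocation for a downloaded traineddata file. It assumes the file was saved as `iast-layer-18003.traineddata` inside a local folder named `tessdata_iast` and that page images are named like `page-013.png`; both names are hypothetical. Tesseract takes the language name from the traineddata file name, and `--tessdata-dir`, `-l`, `--oem` and `--psm` are standard Tesseract 4 options.

```python
from pathlib import Path


def tesseract_cmd(image: Path, out_base: Path, tessdata_dir: Path,
                  lang: str = "iast-layer-18003", psm: int = 3) -> list[str]:
    """Build a Tesseract 4 command line; the text is written to <out_base>.txt."""
    return [
        "tesseract", str(image), str(out_base),
        "--tessdata-dir", str(tessdata_dir),  # folder holding <lang>.traineddata (hypothetical path)
        "-l", lang,                           # language name = traineddata file name without extension
        "--oem", "1",                         # use the LSTM engine only
        "--psm", str(psm),                    # 3 = fully automatic page segmentation (the default)
    ]
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs the OCR; this requires the `tesseract` executable to be on the PATH. Building the list separately from running it keeps the file names and options easy to inspect before processing a batch of pages.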
>
> I am still training a different variation.
>
>
>
> On Wed, Jun 27, 2018 at 6:46 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>
>> OK, I will take a look.
>>
>> On Wed, Jun 27, 2018 at 5:04 PM yajva <nsvnar...@gmail.com> wrote:
>>
>>> I checked with both the light and dark PDFs. The results are very good. Thanks.
>>>
>>> A few concerns: E is consistently missed in both. J is consistently missed 
>>> in the darker image but recognized as T in the dark image. ṝ is 
>>> consistently recognized as ṛ. Can these be addressed?
>>> I am using the Tesseract 4 alpha Windows build from the command line.
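To check whether errors like these are really consistent, one option is to compare the OCR output with a corrected transcript and tally which characters are substituted and which are dropped. Below is a minimal sketch using Python's standard `difflib`; it is not part of Tesseract, the function name `confusions` is hypothetical, and the alignment is only approximate (replacements of unequal length are counted as dropped characters).

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher


def confusions(truth: str, ocr: str) -> tuple[Counter, Counter]:
    """Tally character substitutions and characters dropped by the OCR."""
    # Compare in NFC so precomposed and combining forms of one letter are equal.
    truth = unicodedata.normalize("NFC", truth)
    ocr = unicodedata.normalize("NFC", ocr)
    subs: Counter = Counter()    # (ground-truth char, OCR char) -> count
    missed: Counter = Counter()  # ground-truth char -> times it was dropped
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and i2 - i1 == j2 - j1:
            # Equal-length replacement: pair characters position by position.
            for t, o in zip(truth[i1:i2], ocr[j1:j2]):
                subs[(t, o)] += 1
        elif tag in ("replace", "delete"):
            # Dropped text, or an unequal replacement counted coarsely as dropped.
            missed.update(truth[i1:i2])
    return subs, missed
```

Summing the two counters over all corrected pages gives a rough picture of which letters (for example E, J or ṝ) need more examples in the training text.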
>>>
>>> Are the dev files in the repos?
>>>
>>>
>>> On Tuesday, June 26, 2018 at 11:06:06 PM UTC+5:30, shree wrote:
>>>>
>>>> I had used Ghostview to convert the PDF to TIFF or PNG.
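Ghostview is a front end to Ghostscript, so the same PDF-to-image conversion can also be scripted. A minimal sketch, assuming Ghostscript is installed (on Windows the console executable is usually named `gswin64c` or `gswin32c` rather than `gs`); the function name and output naming scheme are hypothetical, while `-sDEVICE`, `-r`, `-dFirstPage`, `-dLastPage` and `-sOutputFile` are standard Ghostscript options.

```python
from pathlib import Path


def gs_render_cmd(pdf: Path, first: int, last: int, dpi: int = 300,
                  device: str = "pnggray", gs: str = "gs") -> list[str]:
    """Build a Ghostscript command that renders pages first..last of a PDF to images."""
    suffix = ".tif" if device.startswith("tiff") else ".png"
    return [
        gs, "-dNOPAUSE", "-dBATCH", "-dSAFER",
        f"-sDEVICE={device}",                     # e.g. pnggray, png16m, tiffgray, tiffg4
        f"-r{dpi}",                               # rendering resolution in DPI
        f"-dFirstPage={first}", f"-dLastPage={last}",
        f"-sOutputFile={pdf.stem}-%03d{suffix}",  # Ghostscript fills %03d with a page counter
        str(pdf),
    ]
```

Rendering at a fixed, fairly high resolution such as 300 DPI is a common starting point for OCR; grayscale output keeps the files small while preserving thin diacritics better than a hard black-and-white threshold might.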
>>>>
>>>> You can OCR the PDF directly with gImageReader using the traineddata file I 
>>>> sent.
>>>>
>>>> See the links to the new Windows binaries in the message below.
>>>>
>>>>
>>>> At last, here are some fresh builds:
>>>>
>>>>
>>>> https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_i686_tesseract4.git87635c1.exe
>>>>
>>>> https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_x86_64_tesseract4.git87635c1.exe
>>>>
>>>> I'd also be interested in testing of the tessdata manager, which should 
>>>> now also properly handle script tessdata files.
>>>>
>>>> On Tue 26 Jun, 2018, 10:59 PM yajva, <nsvnar...@gmail.com> wrote:
>>>>
>>>>> This doc is a different version of the same text. Here's the doc used for 
>>>>> the first PNG. It is slightly darker, but the one sent earlier is cleaner. 
>>>>> Let me know which is more amenable to OCR. I use PDF Shaper to 
>>>>> extract the images and XnView to convert them to PNG.
>>>>>
>>>>> On Tuesday, June 26, 2018 at 7:48:28 PM UTC+5:30, shree wrote:
>>>>>>
>>>>>> The traineddata file is attached, for use with tesseract 4.0.0-beta.
>>>>>>
>>>>>> How did you create the test PNG from the PDF? I am not getting as 
>>>>>> good quality, even after trying various settings with IrfanView.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 26, 2018 at 4:58 PM yajva <nsvnar...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry for the delay, my system was down.
>>>>>>>
>>>>>>> I am getting "Page not Found" for the link given. Can you please 
>>>>>>> re-check?
>>>>>>>
>>>>>>> Here's the doc I am trying to OCR.
>>>>>>>
>>>>>>>
>>>>>>> On Saturday, June 23, 2018 at 9:46:08 PM UTC+5:30, shree wrote:
>>>>>>>>
>>>>>>>> Please test with the traineddata file from 
>>>>>>>> https://github.com/Shreeshrii/tessdata_sanskrit/tree/master/iast-plus1
>>>>>>>>
>>>>>>>> I need to check that it is not overfitted.
>>>>>>>>
>>>>>>>> Please share a couple more images which I can use for testing.
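One common way to check for overfitting is to compare the character error rate (CER) on the pages used for training with the CER on pages the model has never seen; an error rate that is much lower on the training pages than on new ones suggests overfitting. A minimal sketch of CER as edit distance divided by the length of the ground truth, assuming both texts are compared after NFC normalization (the function names are hypothetical, not a Tesseract API):

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(truth: str, ocr: str) -> float:
    """Character error rate of OCR output against ground truth, compared in NFC."""
    truth = unicodedata.normalize("NFC", truth)
    ocr = unicodedata.normalize("NFC", ocr)
    if not truth:
        return 0.0 if not ocr else 1.0
    return levenshtein(truth, ocr) / len(truth)
```

Normalizing first means a letter such as ṝ counts as one character whether it was typed precomposed or as r plus combining marks.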
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 21, 2018 at 11:38 PM yajva <nsvnar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> One more correction.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thursday, June 21, 2018 at 11:34:00 PM UTC+5:30, yajva wrote:
>>>>>>>>>>
>>>>>>>>>> Done.
>>>>>>>>>>
>>>>>>>>>> On Wednesday, June 20, 2018 at 9:05:01 PM UTC+5:30, shree wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am attaching the OCRed text. Please correct it so that I can 
>>>>>>>>>>> use it as ground truth for further training and testing.
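Hand-corrected text can mix precomposed IAST letters (such as U+1E5D, ṝ) with the same letters typed as a base letter plus combining marks, depending on the keyboard or editor used. A minimal sketch that brings corrected ground truth into one consistent form before it is reused; choosing NFC and stripping trailing spaces is an assumption made here for consistency, not a documented requirement of Tesseract's training tools, and the function name is hypothetical.

```python
import unicodedata


def normalize_ground_truth(text: str) -> str:
    """Return corrected OCR text in NFC, with trailing spaces removed from each line."""
    # NFC composes e.g. r + U+0323 (dot below) + U+0304 (macron) into U+1E5D.
    text = unicodedata.normalize("NFC", text)
    return "\n".join(line.rstrip() for line in text.splitlines())
```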
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 20, 2018 at 3:15 PM Shree Devi Kumar <
>>>>>>>>>>> shree...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I had done training for Sanskrit in both Devanagari and IAST, 
>>>>>>>>>>>> but it does not include the cedilla for Sh.
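A gap like the missing cedilla can be found before training by checking a sample of the target text against the characters a model supports. A minimal sketch, assuming the supported characters are available as a plain Python set (in practice they would come from the model's unicharset; the set and the function name here are hypothetical):

```python
import unicodedata


def missing_chars(text: str, charset: set[str]) -> list[str]:
    """List the characters (after NFC) in text that charset does not contain."""
    text = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in text if not ch.isspace() and ch not in charset})
```

For example, with a character set containing `s`, `a`, `ś` and `ṣ` but not `ş` (s with cedilla, U+015F), the function reports `ş` as missing, including when it was typed as `s` plus a combining cedilla.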
>>>>>>>>>>>>
>>>>>>>>>>>> I will add it and let you know.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed 20 Jun, 2018, 1:17 AM yajva, <nsvnar...@gmail.com> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I have tried Google OCR for recognizing Sanskrit text in Roman 
>>>>>>>>>>>>> script with diacritics (IAST). It recognizes the macron above but not 
>>>>>>>>>>>>> the dots below, nor the attached grave and acute accents. Is there 
>>>>>>>>>>>>> any traineddata available for Tesseract that can do this with good 
>>>>>>>>>>>>> accuracy? Attached is a sample page that I am interested in.
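IAST letters are a base Latin letter plus one or more diacritics (macron above, dot below), and accented text, such as Vedic transcriptions, can add grave or acute accents on top. A minimal sketch using Python's `unicodedata` that lists the marks each letter carries; it helps explain why a recognizer can get the base letter right and still drop a single mark, for example reading ṝ (dot below plus macron) as ṛ (dot below only). The function name is hypothetical.

```python
import unicodedata


def diacritics(text: str) -> list[tuple[str, list[str]]]:
    """Split text into (base character, names of the combining marks attached to it)."""
    result: list[tuple[str, list[str]]] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and result:
            result[-1][1].append(unicodedata.name(ch))  # attach mark to the preceding base
        else:
            result.append((ch, []))
    return result
```

Decomposing ṝ this way yields `r` with `COMBINING DOT BELOW` and `COMBINING MACRON`, whereas ṛ yields `r` with `COMBINING DOT BELOW` alone.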
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from 
>>>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>>>> To post to this group, send email to 
>>>>>>>>>>>>> tesser...@googlegroups.com.
>>>>>>>>>>>>> Visit this group at 
>>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/aef0797b-8df3-4db7-9a3b-02f62d2e5a28%40googlegroups.com
>>>>>>>>>>>>>  
>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/aef0797b-8df3-4db7-9a3b-02f62d2e5a28%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>> .
>>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>>
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>>
>
>
>
