Many thanks. Downloaded and using. Will wait for next ver.
On Sunday, July 1, 2018 at 12:21:19 AM UTC+5:30, shree wrote:
>
> I have uploaded a new version of traineddata file at
>
> https://github.com/Shreeshrii/tessdata_shreetest/blob/master/iast-layer-18003.traineddata
>
> Attached is the OCRed output for pages 13-24 of dark pdf with it.
>
> I am still training a different variation.
>
> On Wed, Jun 27, 2018 at 6:46 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>
>> ok. I will take a look.
>>
>> On Wed, Jun 27, 2018 at 5:04 PM yajva <nsvnar...@gmail.com> wrote:
>>
>>> Checked with both light & dark pdfs. The results are very good. Thanks.
>>>
>>> A few concerns. E is consistently missed in both. J is missed
>>> consistently in darker image but recognized as T in dark image. ṝ is
>>> recognized as ṛ consistently. Can these be addressed?
>>> I am using tesseract 4 alpha windows build from command line.
>>>
>>> Are the dev files in repos?
>>>
>>> On Tuesday, June 26, 2018 at 11:06:06 PM UTC+5:30, shree wrote:
>>>>
>>>> I had used ghostview to convert PDF to tif or png.
>>>>
>>>> You can ocr PDF directly with gimagereader using the traineddata file I
>>>> sent.
>>>>
>>>> See links for new windows binaries in msg below.
>>>>
>>>> At last, here are some fresh builds:
>>>>
>>>> https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_i686_tesseract4.git87635c1.exe
>>>>
>>>> https://smani.fedorapeople.org/tmp/gImageReader_3.2.99_qt5_x86_64_tesseract4.git87635c1.exe
>>>>
>>>> I'd be also interested in testing of the tessdata manager, which should
>>>> now also properly handle script tessdatas
>>>>
>>>> On Tue 26 Jun, 2018, 10:59 PM yajva, <nsvnar...@gmail.com> wrote:
>>>>
>>>>> The doc is diff ver of the same text. Here's the doc used for the
>>>>> first. png. This is slightly darker, but the one sent earlier is cleaner.
>>>>> Let me know which is more amenable for OCRing. I use PDF Shaper to
>>>>> extract images and convert to png using xnview.
>>>>>
>>>>> On Tuesday, June 26, 2018 at 7:48:28 PM UTC+5:30, shree wrote:
>>>>>>
>>>>>> Traineddata file is attached for use with tesseract4.0.0-beta.
>>>>>>
>>>>>> How did you create the test png from the pdf? I am not getting as
>>>>>> good quality, tried various settings with irfanview.
>>>>>>
>>>>>> On Tue, Jun 26, 2018 at 4:58 PM yajva <nsvnar...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry for the delay, my system was down.
>>>>>>>
>>>>>>> I am getting "Page not Found" for the link given. Can you please
>>>>>>> re-check?
>>>>>>>
>>>>>>> Here's the doc I am trying to OCR
>>>>>>>
>>>>>>> On Saturday, June 23, 2018 at 9:46:08 PM UTC+5:30, shree wrote:
>>>>>>>>
>>>>>>>> Please test with traineddata file from
>>>>>>>> https://github.com/Shreeshrii/tessdata_sanskrit/tree/master/iast-plus1
>>>>>>>>
>>>>>>>> Need to check that it is not overfitted.
>>>>>>>>
>>>>>>>> Please share a couple more images which I can use for testing.
>>>>>>>>
>>>>>>>> On Thu, Jun 21, 2018 at 11:38 PM yajva <nsvnar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> one more correction.
>>>>>>>>>
>>>>>>>>> On Thursday, June 21, 2018 at 11:34:00 PM UTC+5:30, yajva wrote:
>>>>>>>>>>
>>>>>>>>>> done
>>>>>>>>>>
>>>>>>>>>> On Wednesday, June 20, 2018 at 9:05:01 PM UTC+5:30, shree wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am attaching the OCRed text. Please correct it so that I can
>>>>>>>>>>> use it as groundtruth for further training and testing.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 20, 2018 at 3:15 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I had done a training for sanskrit for both devanagari and IAST
>>>>>>>>>>>> but it does not include cedilla for Sh
>>>>>>>>>>>>
>>>>>>>>>>>> I will add it and let you know.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed 20 Jun, 2018, 1:17 AM yajva, <nsvnar...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I have tried Google OCR for recognizing Sanskrit text in Roman
>>>>>>>>>>>>> with diacritics (IAST). It recognizes above macron but not dots
>>>>>>>>>>>>> below also joining grave and accent. Is there any traineddata
>>>>>>>>>>>>> available for tesseract that can do this with good accuracy ?
>>>>>>>>>>>>> Attached a sample page that I am interested in.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>>> it, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>>>> To post to this group, send email to
>>>>>>>>>>>>> tesser...@googlegroups.com.
>>>>>>>>>>>>> Visit this group at
>>>>>>>>>>>>> https://groups.google.com/group/tesseract-ocr.
>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/aef0797b-8df3-4db7-9a3b-02f62d2e5a28%40googlegroups.com
>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/aef0797b-8df3-4db7-9a3b-02f62d2e5a28%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>> .
>>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f1ea6ff9-ee4f-44b1-aa37-0433989a2adb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.