I updated Tesseract to the latest version in SVN, and now I am getting errors while running training:
    D:\BuildFolder\testing\TRAINdata\v6-TransliterationOnly>echo off
    tesseract 3.02.03
     leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
      libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
    **** extracting unicharset *****
    Extracting unicharset from ipa.sanskrit2003.exp994.box
    Wrote unicharset file ./unicharset.
    **** done extracting unicharset from *****
    **** ipa.sanskrit2003.exp994.box ****
    **** Training using following .tr files *****
    **** ipa.sanskrit2003.exp994.tr ****
    **** NO Shapeclustering - Non Indic Language*****
    **** Started MFTraining *****
    Read shape table shapetable of 733 shapes
    Reading ipa.sanskrit2003.exp994.tr ...
    id < this->size():Error:Assert failed:in file ..\..\ccutil\unicharset.cpp, line 237

Has anyone else had this problem?

Additionally, with the Sanskrit language data I am getting errors while running OCR on .png images. It worked fine earlier:

            1 file(s) copied.
    tesseract 3.02.03
     leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
      libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
    processing san.0s2003.exp0.tif
    processing san.0s2003.exp8.tif
    processing san.0sanskrit2003.exp0.tif
    processing san.0sanskrit2003.exp8.tif
    processing san.mnt.exp013.png
    TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
    processing san.mnt.exp014.png
    TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
    processing san.mnt.exp031.png
    TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
    processing san.mnt.exp032.png
    TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
    processing san.mnt.exp038.png
    TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
    processing san.mnt.exp424.png
    TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
    Press any key to continue . . .

Should I open issues for the above?
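A note on the PNG errors in the log: 20617 (0x5089) is exactly the value you get when the first two bytes of any PNG file (0x89, 0x50, i.e. "\x89P") are read as a little-endian 16-bit number, whereas a classic TIFF file starts with "II*\0" or "MM\0*". So the message suggests the .png files were being handed to the TIFF reader instead of the PNG reader; I have not confirmed why that happens in this build. The minimal Python sketch below only checks file signatures to illustrate this. The helper names `sniff_image_type` and `tiff_byte_order_word` are hypothetical and are not part of Tesseract or Leptonica.

```python
import struct

# 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image_type(header: bytes) -> str:
    """Classify an image by its leading bytes (hypothetical helper)."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    # Classic TIFF header: byte-order mark ("II" or "MM") followed by 42.
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"


def tiff_byte_order_word(header: bytes) -> int:
    """Read the first two bytes as a little-endian 16-bit value,
    which is how a TIFF byte-order marker would look on x86."""
    (word,) = struct.unpack("<H", header[:2])
    return word


if __name__ == "__main__":
    png_header = PNG_SIGNATURE + b"\x00" * 8
    print(sniff_image_type(png_header))           # png
    print(tiff_byte_order_word(png_header))       # 20617
    print(hex(tiff_byte_order_word(png_header)))  # 0x5089
```

To check a real file, pass the helpers `open(path, "rb").read(8)`; a file reported as "png" that still triggers the TIFFstream message points at the reader selection rather than at a corrupt image.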
Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Aug 22, 2013 at 3:10 PM, Shree <shreesh...@gmail.com> wrote:

> I had started training Tesseract to recognize texts that contain Indic
> transliteration; please see
> http://www.unicode.org/cldr/charts/transforms/Latin-Indic.html for the
> diacritics used.
>
> After Ray's post regarding the upcoming merge and next release, I am
> holding off on further training.
>
> However, I wanted to check whether this is already available as part of
> another language's data. I am attaching a sample image, a text file, and
> the unicharset for reference.
>
> Thanks,
> Shree