Re: Traineddata for Latin-Indic

Shree Devi Kumar Mon, 02 Sep 2013 23:56:14 -0700

I updated Tesseract to the latest version in svn, and now I am getting
errors while running training.



D:\BuildFolder\testing\TRAINdata\v6-TransliterationOnly>echo off
tesseract 3.02.03
 leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
  libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5

**** extracting unicharset *****
Extracting unicharset from ipa.sanskrit2003.exp994.box
Wrote unicharset file ./unicharset.
**** done extracting unicharset from *****
****   ipa.sanskrit2003.exp994.box ****
**** Training using following  .tr files *****
****   ipa.sanskrit2003.exp994.tr ****
****  NO Shapeclustering - Non Indic Language*****
**** Started MFTraining *****
Read shape table shapetable of 733 shapes
Reading ipa.sanskrit2003.exp994.tr ...

id < this->size():Error:Assert failed:in file ..\..\ccutil\unicharset.cpp, line 237

Has anyone else had this problem?


Additionally, for the Sanskrit language data, I am getting errors while
running OCR on .png images. It worked fine earlier.

        1 file(s) copied.
tesseract 3.02.03
 leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
  libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5

processing san.0s2003.exp0.tif
processing san.0s2003.exp8.tif
processing san.0sanskrit2003.exp0.tif
processing san.0sanskrit2003.exp8.tif
processing san.mnt.exp013.png
TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
processing san.mnt.exp014.png
TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
processing san.mnt.exp031.png
TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
processing san.mnt.exp032.png
TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
processing san.mnt.exp038.png
TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
processing san.mnt.exp424.png
TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
Press any key to continue . . .
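
For reference, 20617 (0x5089) is what the first two bytes of the PNG
signature (0x89 0x50) look like when read as a little-endian 16-bit
value, so the message suggests the .png files are reaching the TIFF reader
rather than being corrupt. The following is only a rough sketch in Python
for checking that from outside Tesseract; the function names are my own
and are not part of Tesseract, leptonica or libtiff.

```python
import struct
import sys

# Well-known file signatures (first bytes of the file).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")  # little-endian / big-endian TIFF


def sniff_image_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header[:4] in TIFF_SIGNATURES:
        return "tiff"
    return "unknown"


def first_two_bytes_as_le_uint16(header: bytes) -> int:
    """Interpret the first two bytes as a little-endian 16-bit integer.

    This mirrors how the number in the 'bad magic number' message can be
    compared against a file; it returns -1 if fewer than two bytes exist.
    """
    if len(header) < 2:
        return -1
    return struct.unpack("<H", header[:2])[0]


if __name__ == "__main__":
    # Hypothetical usage: python sniff.py san.mnt.exp013.png ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            header = f.read(8)
        magic = first_two_bytes_as_le_uint16(header)
        print(path, sniff_image_format(header), magic, hex(magic))
```

If the .png files report "png" with magic 20617 here, the files themselves
are probably fine and the question is why this build tries the TIFF path
for them.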


Should I open issues for the above?







Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Thu, Aug 22, 2013 at 3:10 PM, Shree <shreesh...@gmail.com> wrote:

> I had started training Tesseract to recognize texts which have Indic
> transliteration - please see
> http://www.unicode.org/cldr/charts/transforms/Latin-Indic.html for the
> diacritics used for the same.
>
> After Ray's post regarding upcoming merge and next release, I am holding
> off on further training.
>
> However, I wanted to check whether this is already available as part of
> another language data. I am attaching a sample image, text file as well as
> the unicharset for reference.
>
> Thanks,
> Shree
>
>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-dev" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-dev/bRD21wf3GxQ/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> tesseract-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

