As far as I understand, the font limitation applies up to Tesseract 3.02. Major changes to training are currently in the works in SVN for 3.03, which is not fully released yet. That is why you see a large number of fonts in eng.traineddata but not in the traineddata for other languages. Traineddata for the other languages may be forthcoming in future.
Ray/Zdenko/Nick may be able to give an idea of the expected release timeline.

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Tue, Jul 8, 2014 at 5:04 PM, Paul <p...@vorb.de> wrote:

> If you have a look at intproto.h, you'll see there is a similar
> limitation, but it's much more complicated. Unfortunately I don't have an
> overview of what is possible yet, but I'm working on it. :) Just use
> normproto.h as a reference.
>
> On Tuesday, 8 July 2014 02:55:37 UTC+2, Albrecht Hilker wrote:
>
>> The manual "Training Tesseract 3" says:
>>
>> > Tesseract needs to know about different shapes of the same character by
>> > having different fonts separated explicitly.
>> > This used to be limited to 32 fonts, but the limit has been raised to 64.
>> > It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
>> > Note that runtime is heavily dependent on the number of fonts provided,
>> > and training more than 32 will result in a significant slow-down.
>>
>> I analyzed the number of fonts in eng.traineddata and was very surprised
>> to find that 358 fonts have been trained!
>> get_fontinfo_table().size() returns 358!
>>
>> Can anybody explain this contradiction to me?
>>
>> Fonts in eng.traineddata:
>>
>> AR_PL_UKai_CN, AR_PL_UKai_Patched, AR_PL_UKai_TW, AR_PL_UMing_CN_Light,
>> AR_PL_UMing_Patched_Light, AR_PL_UMing_TW_MBE_Light, Aboriginal_Sans,
>> Aboriginal_Sans_Bold_Italic, Aboriginal_Sans_Italic, Aboriginal_Serif,
>> Aboriginal_Serif_Bold, Aboriginal_Serif_Bold_Italic, Aboriginal_Serif_Italic,
>> Abyssinica_SIL, AlArabiya, AlBattar, AlHor, AlManzomah, AlMohanad,
>> Andale_Mono, Ani, AnjaliOldLipi, Arab, Arial, Arial_Black, Arial_Bold,
>> Arial_Bold_Italic, Arial_Italic, BPG_Chveulebrivi, BPG_Chveulebrivi_Bold,
>> BPG_Courier, BPG_Courier_Bold, BPG_Elite, BPG_Elite_Bold, BPG_Glaho,
>> BPG_Glaho_Bold, BPG_Rioni, BPG_Rioni_Bold, BPG_Unicode_Standard,
>> Baekmuk_Batang, Baekmuk_Batang_Patched, Baekmuk_Dotum, Baekmuk_Gulim,
>> Baekmuk_Headline, Bangla, Bitstream_Vera_Sans, Bitstream_Vera_Sans_Bold,
>> Bitstream_Vera_Sans_Bold_Oblique, Bitstream_Vera_Sans_Mono,
>> Bitstream_Vera_Sans_Mono_Bold, Bitstream_Vera_Sans_Mono_Bold_Oblique,
>> Bitstream_Vera_Sans_Mono_Oblique, Bitstream_Vera_Sans_Mono_Roman,
>> Bitstream_Vera_Sans_Oblique, Bitstream_Vera_Sans_Roman, Bitstream_Vera_Serif,
>> Bitstream_Vera_Serif_Bold, Bitstream_Vera_Serif_Roman, CaslonishFraxx,
>> Century_Schoolbook_L, Century_Schoolbook_L_Bold,
>> Century_Schoolbook_L_Bold_Italic, Century_Schoolbook_L_Italic,
>> Century_Schoolbook_L_Roman, Chandas, Cloister_Black_Light, Comic_Sans_MS,
>> Comic_Sans_MS_Bold, Cortoba, Courier_New, Courier_New_Bold,
>> Courier_New_Bold_Italic, Courier_New_Italic, DejaVu_Sans, DejaVu_Sans_Bold,
>> DejaVu_Sans_Bold_Oblique, DejaVu_Sans_Condensed, DejaVu_Sans_Condensed_Bold,
>> DejaVu_Sans_Condensed_Bold_Oblique, DejaVu_Sans_Condensed_Oblique,
>> DejaVu_Sans_Mono, DejaVu_Sans_Mono_Bold, DejaVu_Sans_Mono_Bold_Oblique,
>> DejaVu_Sans_Mono_Oblique,
>> DejaVu_Sans_Oblique, DejaVu_Sans_Ultra-Light, DejaVu_Serif, DejaVu_Serif_Bold,
>> DejaVu_Serif_Bold_Italic, DejaVu_Serif_Bold_Oblique,
>> DejaVu_Serif_Bold_Semi-Condensed, DejaVu_Serif_Condensed_Bold,
>> DejaVu_Serif_Condensed_Bold_Italic, DejaVu_Serif_Condensed_Italic,
>> DejaVu_Serif_Italic, DejaVu_Serif_Oblique, DejaVu_Serif_Semi-Condensed,
>> Dimnah, Dustismo, Dustismo_Roman, Dustismo_Roman_Bold, Dustismo_Roman_Italic,
>> Dustismo_Roman_Italic_Bold, Dyuthi, East_Syriac_Adiabene,
>> East_Syriac_Ctesiphon, Electron, Estrangelo_Antioch, Estrangelo_Edessa,
>> Estrangelo_Midyat, Estrangelo_Nisibin, Estrangelo_Quenneshrin,
>> Estrangelo_Talada, Estrangelo_TurAbdin, FreeMono, FreeMono_Bold,
>> FreeMono_Bold_Italic, FreeMono_Bold_Oblique, FreeMono_Italic,
>> FreeMono_Oblique, FreeSans, FreeSans_Bold, FreeSans_Bold_Oblique,
>> FreeSans_Oblique, FreeSerif, FreeSerif_Bold, FreeSerif_Bold_Italic,
>> FreeSerif_Italic, Furat, Garuda, Garuda_Bold, Garuda_Bold_Oblique,
>> Garuda_Oblique, GentiumAlt, GentiumAlt_Italic, Georgia, Georgia_Bold,
>> Georgia_Bold_Italic, Georgia_Italic, Granada, Graph, Hani, Haramain, Hor,
>> IPAGothic, IPAMincho, IPAPGothic, IPAPMincho, IPAUIGothic, Impact,
>> Impact_Condensed, Jamrul, Jamrul_Semi-Expanded, Japan, Jet, Kalimati,
>> Kalyani, Kayrawan, Kedage, Kedage_Bold, Kedage_Bold_Italic, Kedage_Italic,
>> Khalid, Khmer_OS, Khmer_OS_Battambang, Khmer_OS_Bokor, Khmer_OS_Content,
>> Khmer_OS_Fasthand, Khmer_OS_Freehand, Khmer_OS_Metal_Chrieng, Khmer_OS_Muol,
>> Khmer_OS_Muol_Light, Khmer_OS_Muol_Pali, Khmer_OS_Siemreap, Khmer_OS_System,
>> Kochi_Gothic, Kochi_Mincho, LKLUG, Lateef, Likhan, Linux_Biolinum_O,
>> Linux_Biolinum_O_Bold, Linux_Libertine_O, Linux_Libertine_O_Bold,
>> Linux_Libertine_O_Bold_Italic,
>> Linux_Libertine_O_C, Linux_Libertine_O_Italic, Lohit_Assamese, Lohit_Bengali,
>> Lohit_Gujarati, Lohit_Hindi, Lohit_Malayalam, Lohit_Oriya, Lohit_Punjabi,
>> Lohit_Tamil, Lohit_Telugu, Loma, Loma_Bold, Loma_Bold_Oblique, Loma_Oblique,
>> Lucida_Bright, Lucida_Bright_Italic, Lucida_Bright_Semi-Bold,
>> Lucida_Bright_Semi-Bold_Italic, Lucida_Sans, Lucida_Sans_Oblique,
>> Lucida_Sans_Semi-Bold, Lucida_Sans_Semi-Bold_Oblique, Lucida_Sans_Typewriter,
>> Lucida_Sans_Typewriter_Bold, Lucida_Sans_Typewriter_Bold_Oblique, Mallige,
>> Mallige_Bold, Mallige_Bold_Italic, Mallige_Italic, Mashq, Meera, Metal,
>> Mitra_Mono, Monapo, Mukti_Narrow, Mukti_Narrow_Bold, Nada, Nagham, Nice,
>> Norasi, Norasi_Bold, Norasi_Bold_Italic, Norasi_Bold_Oblique, Norasi_Italic,
>> Norasi_Oblique, OpenSymbol, Ostorah, Padauk, Padauk_Bold, Petra,
>> Phetsarath_OT, Pothana2000, Proclamate_Light, Purisa_Light, Rachana,
>> Rachana_w01, RaghuMalayalam, Rehan, Rekha, Saab, Salem, Samanata,
>> Samyak_Gujarati, Samyak_Oriya, Sazanami_Gothic, Sazanami_Mincho,
>> Scheherazade, Serto_Batnan, Serto_Batnan_Bold, Serto_Jerusalem,
>> Serto_Jerusalem_Bold, Serto_Jerusalem_Italic, Serto_Kharput, Serto_Malankara,
>> Serto_Mardin, Serto_Mardin_Bold, Serto_Urhoy, Serto_Urhoy_Bold, Shado,
>> Sharjah, TAMu_Kadambri, TAMu_Kalyani, TAMu_Maduram, TSCu_Comic, TSCu_Paranar,
>> TSCu_Paranar_Bold, TSCu_Paranar_Italic, TSCu_Times, TakaoExGothic,
>> TakaoExMincho, TakaoGothic, TakaoMincho, TakaoPGothic, TakaoPMincho,
>> Tarablus, Tholoth, Tibetan_Machine_Uni, Times_New_Roman, Times_New_Roman_Bold,
>> Times_New_Roman_Bold_Italic, Times_New_Roman_Italic, TlwgMono, TlwgMono_Bold,
>> TlwgMono_Bold_Oblique, TlwgMono_Oblique, TlwgTypewriter, TlwgTypewriter_Bold,
>> TlwgTypewriter_Bold_Oblique, TlwgTypewriter_Oblique, Trebuchet_MS,
>> Trebuchet_MS_Bold, Trebuchet_MS_Bold_Italic, Trebuchet_MS_Italic,
>> URW_Bookman_L, URW_Bookman_L_Bold, URW_Bookman_L_Bold_Italic,
>> URW_Bookman_L_Italic, URW_Bookman_L_Light_Italic, UmePlus_Gothic,
>> UmePlus_P_Gothic, UnBatang, UnBatang_Bold, UnDotum, UnDotum_Bold,
>> UnifrakturMaguntia, Unikurd_Web, Uttara, VL_Gothic, VL_PGothic, Vemana2000,
>> Verdana, Verdana_Bold, Verdana_Bold_Italic, Verdana_Italic, Walbaum-Fraktur,
>> Webdings, WenQuanYi_Zen_Hei, Wyld, Wyld_Italic, aakar, batang, chandas1-1,
>> chandas1-2, cheluvi, dotum, gargi, gulim, hline, ipag, ipagp, ipagui, ipam,
>> ipamp, kalimati, kochi-gothic, kochi-gothic-subst, kochi-mincho,
>> kochi-mincho-subst, lklug, lohit_bn, lohit_gu, lohit_hi, lohit_ml, lohit_or,
>> lohit_pa, lohit_ta, lohit_te, monapo, ori1Uni, padmaa, padmaa_Bold, suruma
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bee86d37-9e63-4d76-be78-345b8ed7f931%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVU_uo%3DJq3OMcmW13ewOYc7AtfT6jMtBV7EKhn299wAwQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
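The thread leaves open how 358 fonts in get_fontinfo_table() can coexist with a 64-font MAX_NUM_CONFIGS limit. One possible reading, which is an assumption and is not confirmed anywhere in this thread, is that MAX_NUM_CONFIGS caps the number of configs (font variants) stored per character class, while the font info table is a single global list shared by all classes. The font list above mixes Latin, Arabic, Syriac, Khmer, Indic and CJK fonts, so under that reading no single character would need configs for all of them. The Python sketch below is a hypothetical model of that idea, not Tesseract code: `CharClass`, `build()`, the placeholder font names and the 32-bit word size are all illustrative assumptions.

```python
# Hypothetical model, NOT Tesseract source: one global font table shared by
# all character classes, with each class capped at MAX_NUM_CONFIGS configs.
MAX_NUM_CONFIGS = 64  # value quoted from the "Training Tesseract 3" manual
BITS_PER_WORD = 32    # assumption: per-class config masks packed in 32-bit words

# Words needed to hold a MAX_NUM_CONFIGS-bit mask (2 words for 64 bits).
WORDS_PER_CONFIG_VEC = (MAX_NUM_CONFIGS + BITS_PER_WORD - 1) // BITS_PER_WORD


class CharClass:
    """A character class whose configs each point at one global font id."""

    def __init__(self, unichar):
        self.unichar = unichar
        self.font_ids = []  # config index -> global font id

    def add_config(self, font_id):
        # The cap applies per class, not to the size of the global font table.
        if len(self.font_ids) >= MAX_NUM_CONFIGS:
            raise ValueError(f"class {self.unichar!r} already has "
                             f"{MAX_NUM_CONFIGS} configs")
        self.font_ids.append(font_id)
        return len(self.font_ids) - 1  # config index local to this class


def build(font_table, fonts_per_class):
    """Create one CharClass per entry, resolving font names to global ids."""
    font_id = {name: i for i, name in enumerate(font_table)}
    classes = {}
    for unichar, names in fonts_per_class.items():
        cls = CharClass(unichar)
        for name in names:
            cls.add_config(font_id[name])
        classes[unichar] = cls
    return classes


# 358 placeholder fonts, split so that no class sees more than 60 of them.
fonts = [f"hypothetical_font_{i}" for i in range(358)]
assignment = {f"class_{k}": fonts[k * 60:(k + 1) * 60] for k in range(6)}
classes = build(fonts, assignment)

print(len(fonts))                                      # 358
print(max(len(c.font_ids) for c in classes.values()))  # 60
print(WORDS_PER_CONFIG_VEC)                            # 2
```

Under this model the global table holds all 358 entries while every class stays below the cap, and a call such as `build(fonts, {"x": fonts[:65]})` raises `ValueError` on the 65th config. Whether Tesseract 3.02 actually stores fonts this way would need to be checked against intproto.h itself.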