As far as I understand, the font limitation applies up to Tesseract 3.02.

Major changes to training are currently in the works in SVN for 3.03 (not
fully released yet, which is why you see a large number of fonts in the
English traineddata but not in the others). Traineddata for the other
languages may be forthcoming in the future.
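
To make the mismatch concrete, here is a minimal sketch (not Tesseract code)
that counts the names in a pasted font list and compares the count with the
limit of 64 quoted from the manual. The helper names parse_font_list and
compare_to_limit are hypothetical, made up for this illustration, and the
script says nothing about how 3.03 actually stores fonts internally.

```python
# Limit quoted from the "Training Tesseract 3" manual; per the quoted text it
# is the value of MAX_NUM_CONFIGS in intproto.h. Treated here as an assumption.
DOCUMENTED_FONT_LIMIT = 64


def parse_font_list(text):
    """Split a pasted font list (one name per line, trailing commas) into names."""
    names = []
    for line in text.splitlines():
        # Drop mail quote markers, surrounding whitespace and the trailing comma.
        name = line.lstrip("> ").strip().rstrip(",").strip()
        if name:
            names.append(name)
    return names


def compare_to_limit(names, limit=DOCUMENTED_FONT_LIMIT):
    """Return (count, excess), where excess is how far count is above limit."""
    count = len(names)
    return count, max(0, count - limit)


# Short sample in the same layout as the list quoted below.
sample = """
>>  Arial,
>>  Arial_Black,
>>  Arial_Bold,
>>  suruma
"""

names = parse_font_list(sample)
count, excess = compare_to_limit(names)
print(f"{count} fonts, {excess} over the documented limit")
```

Fed the full list from the quoted message, the count should match what
get_fontinfo_table().size() reported.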

Ray/Zdenko/Nick may be able to give an idea of expected timeline for

Shree Devi Kumar
भजन - कीर्तन - आरती @

On Tue, Jul 8, 2014 at 5:04 PM, Paul wrote:

> If you have a look at intproto.h, you'll see there is a similar
> limitation, but it's much more complicated. Unfortunately I don't have an
> overview of what is possible yet, but I'm working on it. :) Just use
> normproto.h as a reference.
> On Tuesday, 8 July 2014 at 02:55:37 UTC+2, Albrecht Hilker wrote:
>> The manual "Training Tesseract 3" says:
>> > Tesseract needs to know about different shapes of the same character by
>> > having different fonts separated explicitly.
>> > This used to be limited to 32 fonts, but the limit has been raised to 64.
>> > It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
>> > Note that runtime is heavily dependent on the number of fonts provided,
>> > and training more than 32 will result in a significant slow-down.
>> I analyzed the number of fonts in eng.traineddata and was very
>> surprised to find that 358 fonts have been trained!
>> get_fontinfo_table().size() returns 358!
>> Can anybody explain this contradiction to me?
>> Fonts in eng.traineddata:
>>  AR_PL_UKai_CN,
>>  AR_PL_UKai_Patched,
>>  AR_PL_UKai_TW,
>>  AR_PL_UMing_CN_Light,
>>  AR_PL_UMing_Patched_Light,
>>  AR_PL_UMing_TW_MBE_Light,
>>  Aboriginal_Sans,
>>  Aboriginal_Sans_Bold_Italic,
>>  Aboriginal_Sans_Italic,
>>  Aboriginal_Serif,
>>  Aboriginal_Serif_Bold,
>>  Aboriginal_Serif_Bold_Italic,
>>  Aboriginal_Serif_Italic,
>>  Abyssinica_SIL,
>>  AlArabiya,
>>  AlBattar,
>>  AlHor,
>>  AlManzomah,
>>  AlMohanad,
>>  Andale_Mono,
>>  Ani,
>>  AnjaliOldLipi,
>>  Arab,
>>  Arial,
>>  Arial_Black,
>>  Arial_Bold,
>>  Arial_Bold_Italic,
>>  Arial_Italic,
>>  BPG_Chveulebrivi,
>>  BPG_Chveulebrivi_Bold,
>>  BPG_Courier,
>>  BPG_Courier_Bold,
>>  BPG_Elite,
>>  BPG_Elite_Bold,
>>  BPG_Glaho,
>>  BPG_Glaho_Bold,
>>  BPG_Rioni,
>>  BPG_Rioni_Bold,
>>  BPG_Unicode_Standard,
>>  Baekmuk_Batang,
>>  Baekmuk_Batang_Patched,
>>  Baekmuk_Dotum,
>>  Baekmuk_Gulim,
>>  Baekmuk_Headline,
>>  Bangla,
>>  Bitstream_Vera_Sans,
>>  Bitstream_Vera_Sans_Bold,
>>  Bitstream_Vera_Sans_Bold_Oblique,
>>  Bitstream_Vera_Sans_Mono,
>>  Bitstream_Vera_Sans_Mono_Bold,
>>  Bitstream_Vera_Sans_Mono_Bold_Oblique,
>>  Bitstream_Vera_Sans_Mono_Oblique,
>>  Bitstream_Vera_Sans_Mono_Roman,
>>  Bitstream_Vera_Sans_Oblique,
>>  Bitstream_Vera_Sans_Roman,
>>  Bitstream_Vera_Serif,
>>  Bitstream_Vera_Serif_Bold,
>>  Bitstream_Vera_Serif_Roman,
>>  CaslonishFraxx,
>>  Century_Schoolbook_L,
>>  Century_Schoolbook_L_Bold,
>>  Century_Schoolbook_L_Bold_Italic,
>>  Century_Schoolbook_L_Italic,
>>  Century_Schoolbook_L_Roman,
>>  Chandas,
>>  Cloister_Black_Light,
>>  Comic_Sans_MS,
>>  Comic_Sans_MS_Bold,
>>  Cortoba,
>>  Courier_New,
>>  Courier_New_Bold,
>>  Courier_New_Bold_Italic,
>>  Courier_New_Italic,
>>  DejaVu_Sans,
>>  DejaVu_Sans_Bold,
>>  DejaVu_Sans_Bold_Oblique,
>>  DejaVu_Sans_Condensed,
>>  DejaVu_Sans_Condensed_Bold,
>>  DejaVu_Sans_Condensed_Bold_Oblique,
>>  DejaVu_Sans_Condensed_Oblique,
>>  DejaVu_Sans_Mono,
>>  DejaVu_Sans_Mono_Bold,
>>  DejaVu_Sans_Mono_Bold_Oblique,
>>  DejaVu_Sans_Mono_Oblique,
>>  DejaVu_Sans_Oblique,
>>  DejaVu_Sans_Ultra-Light,
>>  DejaVu_Serif,
>>  DejaVu_Serif_Bold,
>>  DejaVu_Serif_Bold_Italic,
>>  DejaVu_Serif_Bold_Oblique,
>>  DejaVu_Serif_Bold_Semi-Condensed,
>>  DejaVu_Serif_Condensed_Bold,
>>  DejaVu_Serif_Condensed_Bold_Italic,
>>  DejaVu_Serif_Condensed_Italic,
>>  DejaVu_Serif_Italic,
>>  DejaVu_Serif_Oblique,
>>  DejaVu_Serif_Semi-Condensed,
>>  Dimnah,
>>  Dustismo,
>>  Dustismo_Roman,
>>  Dustismo_Roman_Bold,
>>  Dustismo_Roman_Italic,
>>  Dustismo_Roman_Italic_Bold,
>>  Dyuthi,
>>  East_Syriac_Adiabene,
>>  East_Syriac_Ctesiphon,
>>  Electron,
>>  Estrangelo_Antioch,
>>  Estrangelo_Edessa,
>>  Estrangelo_Midyat,
>>  Estrangelo_Nisibin,
>>  Estrangelo_Quenneshrin,
>>  Estrangelo_Talada,
>>  Estrangelo_TurAbdin,
>>  FreeMono,
>>  FreeMono_Bold,
>>  FreeMono_Bold_Italic,
>>  FreeMono_Bold_Oblique,
>>  FreeMono_Italic,
>>  FreeMono_Oblique,
>>  FreeSans,
>>  FreeSans_Bold,
>>  FreeSans_Bold_Oblique,
>>  FreeSans_Oblique,
>>  FreeSerif,
>>  FreeSerif_Bold,
>>  FreeSerif_Bold_Italic,
>>  FreeSerif_Italic,
>>  Furat,
>>  Garuda,
>>  Garuda_Bold,
>>  Garuda_Bold_Oblique,
>>  Garuda_Oblique,
>>  GentiumAlt,
>>  GentiumAlt_Italic,
>>  Georgia,
>>  Georgia_Bold,
>>  Georgia_Bold_Italic,
>>  Georgia_Italic,
>>  Granada,
>>  Graph,
>>  Hani,
>>  Haramain,
>>  Hor,
>>  IPAGothic,
>>  IPAMincho,
>>  IPAPGothic,
>>  IPAPMincho,
>>  IPAUIGothic,
>>  Impact,
>>  Impact_Condensed,
>>  Jamrul,
>>  Jamrul_Semi-Expanded,
>>  Japan,
>>  Jet,
>>  Kalimati,
>>  Kalyani,
>>  Kayrawan,
>>  Kedage,
>>  Kedage_Bold,
>>  Kedage_Bold_Italic,
>>  Kedage_Italic,
>>  Khalid,
>>  Khmer_OS,
>>  Khmer_OS_Battambang,
>>  Khmer_OS_Bokor,
>>  Khmer_OS_Content,
>>  Khmer_OS_Fasthand,
>>  Khmer_OS_Freehand,
>>  Khmer_OS_Metal_Chrieng,
>>  Khmer_OS_Muol,
>>  Khmer_OS_Muol_Light,
>>  Khmer_OS_Muol_Pali,
>>  Khmer_OS_Siemreap,
>>  Khmer_OS_System,
>>  Kochi_Gothic,
>>  Kochi_Mincho,
>>  LKLUG,
>>  Lateef,
>>  Likhan,
>>  Linux_Biolinum_O,
>>  Linux_Biolinum_O_Bold,
>>  Linux_Libertine_O,
>>  Linux_Libertine_O_Bold,
>>  Linux_Libertine_O_Bold_Italic,
>>  Linux_Libertine_O_C,
>>  Linux_Libertine_O_Italic,
>>  Lohit_Assamese,
>>  Lohit_Bengali,
>>  Lohit_Gujarati,
>>  Lohit_Hindi,
>>  Lohit_Malayalam,
>>  Lohit_Oriya,
>>  Lohit_Punjabi,
>>  Lohit_Tamil,
>>  Lohit_Telugu,
>>  Loma,
>>  Loma_Bold,
>>  Loma_Bold_Oblique,
>>  Loma_Oblique,
>>  Lucida_Bright,
>>  Lucida_Bright_Italic,
>>  Lucida_Bright_Semi-Bold,
>>  Lucida_Bright_Semi-Bold_Italic,
>>  Lucida_Sans,
>>  Lucida_Sans_Oblique,
>>  Lucida_Sans_Semi-Bold,
>>  Lucida_Sans_Semi-Bold_Oblique,
>>  Lucida_Sans_Typewriter,
>>  Lucida_Sans_Typewriter_Bold,
>>  Lucida_Sans_Typewriter_Bold_Oblique,
>>  Mallige,
>>  Mallige_Bold,
>>  Mallige_Bold_Italic,
>>  Mallige_Italic,
>>  Mashq,
>>  Meera,
>>  Metal,
>>  Mitra_Mono,
>>  Monapo,
>>  Mukti_Narrow,
>>  Mukti_Narrow_Bold,
>>  Nada,
>>  Nagham,
>>  Nice,
>>  Norasi,
>>  Norasi_Bold,
>>  Norasi_Bold_Italic,
>>  Norasi_Bold_Oblique,
>>  Norasi_Italic,
>>  Norasi_Oblique,
>>  OpenSymbol,
>>  Ostorah,
>>  Padauk,
>>  Padauk_Bold,
>>  Petra,
>>  Phetsarath_OT,
>>  Pothana2000,
>>  Proclamate_Light,
>>  Purisa_Light,
>>  Rachana,
>>  Rachana_w01,
>>  RaghuMalayalam,
>>  Rehan,
>>  Rekha,
>>  Saab,
>>  Salem,
>>  Samanata,
>>  Samyak_Gujarati,
>>  Samyak_Oriya,
>>  Sazanami_Gothic,
>>  Sazanami_Mincho,
>>  Scheherazade,
>>  Serto_Batnan,
>>  Serto_Batnan_Bold,
>>  Serto_Jerusalem,
>>  Serto_Jerusalem_Bold,
>>  Serto_Jerusalem_Italic,
>>  Serto_Kharput,
>>  Serto_Malankara,
>>  Serto_Mardin,
>>  Serto_Mardin_Bold,
>>  Serto_Urhoy,
>>  Serto_Urhoy_Bold,
>>  Shado,
>>  Sharjah,
>>  TAMu_Kadambri,
>>  TAMu_Kalyani,
>>  TAMu_Maduram,
>>  TSCu_Comic,
>>  TSCu_Paranar,
>>  TSCu_Paranar_Bold,
>>  TSCu_Paranar_Italic,
>>  TSCu_Times,
>>  TakaoExGothic,
>>  TakaoExMincho,
>>  TakaoGothic,
>>  TakaoMincho,
>>  TakaoPGothic,
>>  TakaoPMincho,
>>  Tarablus,
>>  Tholoth,
>>  Tibetan_Machine_Uni,
>>  Times_New_Roman,
>>  Times_New_Roman_Bold,
>>  Times_New_Roman_Bold_Italic,
>>  Times_New_Roman_Italic,
>>  TlwgMono,
>>  TlwgMono_Bold,
>>  TlwgMono_Bold_Oblique,
>>  TlwgMono_Oblique,
>>  TlwgTypewriter,
>>  TlwgTypewriter_Bold,
>>  TlwgTypewriter_Bold_Oblique,
>>  TlwgTypewriter_Oblique,
>>  Trebuchet_MS,
>>  Trebuchet_MS_Bold,
>>  Trebuchet_MS_Bold_Italic,
>>  Trebuchet_MS_Italic,
>>  URW_Bookman_L,
>>  URW_Bookman_L_Bold,
>>  URW_Bookman_L_Bold_Italic,
>>  URW_Bookman_L_Italic,
>>  URW_Bookman_L_Light_Italic,
>>  UmePlus_Gothic,
>>  UmePlus_P_Gothic,
>>  UnBatang,
>>  UnBatang_Bold,
>>  UnDotum,
>>  UnDotum_Bold,
>>  UnifrakturMaguntia,
>>  Unikurd_Web,
>>  Uttara,
>>  VL_Gothic,
>>  VL_PGothic,
>>  Vemana2000,
>>  Verdana,
>>  Verdana_Bold,
>>  Verdana_Bold_Italic,
>>  Verdana_Italic,
>>  Walbaum-Fraktur,
>>  Webdings,
>>  WenQuanYi_Zen_Hei,
>>  Wyld,
>>  Wyld_Italic,
>>  aakar,
>>  batang,
>>  chandas1-1,
>>  chandas1-2,
>>  cheluvi,
>>  dotum,
>>  gargi,
>>  gulim,
>>  hline,
>>  ipag,
>>  ipagp,
>>  ipagui,
>>  ipam,
>>  ipamp,
>>  kalimati,
>>  kochi-gothic,
>>  kochi-gothic-subst,
>>  kochi-mincho,
>>  kochi-mincho-subst,
>>  lklug,
>>  lohit_bn,
>>  lohit_gu,
>>  lohit_hi,
>>  lohit_ml,
>>  lohit_or,
>>  lohit_pa,
>>  lohit_ta,
>>  lohit_te,
>>  monapo,
>>  ori1Uni,
>>  padmaa,
>>  padmaa_Bold,
>>  suruma
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> To post to this group, send email to
> Visit this group at
> To view this discussion on the web visit
> <>
> .
> For more options, visit
