As far as I understand, the font limitation applies up to tesseract 3.02.

Major changes to training are currently in the works in SVN for 3.03 (not
fully released yet, which is why you see a large number of fonts in the
English traineddata but not in the others). Traineddata for the other
languages may be forthcoming in the future.

Ray/Zdenko/Nick may be able to give an idea of the expected timeline for
the release.
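
For anyone who wants to re-check the count from the quoted font list below
without building against the library, here is a minimal sketch. The helper
name count_fonts and the short sample listing are hypothetical and only for
illustration; the value 64 is the limit quoted from the training manual, not
read from intproto.h.

```python
# Limit quoted from the "Training Tesseract 3" manual (assumed, not read
# from intproto.h).
DOCUMENTED_FONT_LIMIT = 64


def count_fonts(listing: str) -> int:
    """Count font names in a comma-separated, possibly mail-quoted listing."""
    names = []
    for line in listing.splitlines():
        # Drop mail quote markers (">", ">>"), whitespace and the trailing comma.
        cleaned = line.lstrip("> \t").strip().rstrip(",").strip()
        if cleaned:
            names.append(cleaned)
    return len(names)


# Illustrative excerpt only; paste the full quoted list to reproduce the count.
sample = """
>>  Arial,
>>  Arial_Black,
>>  Arial_Bold,
>>  Verdana
>>
"""

n = count_fonts(sample)
print(n, "fonts; over documented limit:", n > DOCUMENTED_FONT_LIMIT)
```

Run on the complete quoted list, this should agree with the number reported
by get_fontinfo_table().size() if every font appears exactly once in the list.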

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Tue, Jul 8, 2014 at 5:04 PM, Paul <p...@vorb.de> wrote:

> If you have a look at intproto.h, you'll see there is a similar
> limitation, but it's much more complicated. Unfortunately I don't have an
> overview of what is possible yet, but I'm working on it. :) Just use
> normproto.h as a reference.
>
> On Tuesday, July 8, 2014 at 02:55:37 UTC+2, Albrecht Hilker wrote:
>
>> The manual "Training Tesseract 3" says:
>>
>> > Tesseract needs to know about different shapes of the same character by
>> > having different fonts separated explicitly.
>> > This used to be limited to 32 fonts, but the limit has been raised to
>> > 64.
>> > It is set by the constant MAX_NUM_CONFIGS defined in intproto.h.
>> > Note that runtime is heavily dependent on the number of fonts provided,
>> > and training more than 32 will result in a significant slow-down.
>>
>>
>>
>> I counted the fonts in eng.traineddata and was very surprised to find
>> that 358 fonts have been trained:
>> get_fontinfo_table().size() returns 358!
>>
>>
>> Can anybody explain this contradiction to me?
>>
>>
>>
>>
>> Fonts in eng.traineddata:
>>
>>  AR_PL_UKai_CN,
>>  AR_PL_UKai_Patched,
>>  AR_PL_UKai_TW,
>>  AR_PL_UMing_CN_Light,
>>  AR_PL_UMing_Patched_Light,
>>  AR_PL_UMing_TW_MBE_Light,
>>  Aboriginal_Sans,
>>  Aboriginal_Sans_Bold_Italic,
>>  Aboriginal_Sans_Italic,
>>  Aboriginal_Serif,
>>  Aboriginal_Serif_Bold,
>>  Aboriginal_Serif_Bold_Italic,
>>  Aboriginal_Serif_Italic,
>>  Abyssinica_SIL,
>>  AlArabiya,
>>  AlBattar,
>>  AlHor,
>>  AlManzomah,
>>  AlMohanad,
>>  Andale_Mono,
>>  Ani,
>>  AnjaliOldLipi,
>>  Arab,
>>  Arial,
>>  Arial_Black,
>>  Arial_Bold,
>>  Arial_Bold_Italic,
>>  Arial_Italic,
>>  BPG_Chveulebrivi,
>>  BPG_Chveulebrivi_Bold,
>>  BPG_Courier,
>>  BPG_Courier_Bold,
>>  BPG_Elite,
>>  BPG_Elite_Bold,
>>  BPG_Glaho,
>>  BPG_Glaho_Bold,
>>  BPG_Rioni,
>>  BPG_Rioni_Bold,
>>  BPG_Unicode_Standard,
>>  Baekmuk_Batang,
>>  Baekmuk_Batang_Patched,
>>  Baekmuk_Dotum,
>>  Baekmuk_Gulim,
>>  Baekmuk_Headline,
>>  Bangla,
>>  Bitstream_Vera_Sans,
>>  Bitstream_Vera_Sans_Bold,
>>  Bitstream_Vera_Sans_Bold_Oblique,
>>  Bitstream_Vera_Sans_Mono,
>>  Bitstream_Vera_Sans_Mono_Bold,
>>  Bitstream_Vera_Sans_Mono_Bold_Oblique,
>>  Bitstream_Vera_Sans_Mono_Oblique,
>>  Bitstream_Vera_Sans_Mono_Roman,
>>  Bitstream_Vera_Sans_Oblique,
>>  Bitstream_Vera_Sans_Roman,
>>  Bitstream_Vera_Serif,
>>  Bitstream_Vera_Serif_Bold,
>>  Bitstream_Vera_Serif_Roman,
>>  CaslonishFraxx,
>>  Century_Schoolbook_L,
>>  Century_Schoolbook_L_Bold,
>>  Century_Schoolbook_L_Bold_Italic,
>>  Century_Schoolbook_L_Italic,
>>  Century_Schoolbook_L_Roman,
>>  Chandas,
>>  Cloister_Black_Light,
>>  Comic_Sans_MS,
>>  Comic_Sans_MS_Bold,
>>  Cortoba,
>>  Courier_New,
>>  Courier_New_Bold,
>>  Courier_New_Bold_Italic,
>>  Courier_New_Italic,
>>  DejaVu_Sans,
>>  DejaVu_Sans_Bold,
>>  DejaVu_Sans_Bold_Oblique,
>>  DejaVu_Sans_Condensed,
>>  DejaVu_Sans_Condensed_Bold,
>>  DejaVu_Sans_Condensed_Bold_Oblique,
>>  DejaVu_Sans_Condensed_Oblique,
>>  DejaVu_Sans_Mono,
>>  DejaVu_Sans_Mono_Bold,
>>  DejaVu_Sans_Mono_Bold_Oblique,
>>  DejaVu_Sans_Mono_Oblique,
>>  DejaVu_Sans_Oblique,
>>  DejaVu_Sans_Ultra-Light,
>>  DejaVu_Serif,
>>  DejaVu_Serif_Bold,
>>  DejaVu_Serif_Bold_Italic,
>>  DejaVu_Serif_Bold_Oblique,
>>  DejaVu_Serif_Bold_Semi-Condensed,
>>  DejaVu_Serif_Condensed_Bold,
>>  DejaVu_Serif_Condensed_Bold_Italic,
>>  DejaVu_Serif_Condensed_Italic,
>>  DejaVu_Serif_Italic,
>>  DejaVu_Serif_Oblique,
>>  DejaVu_Serif_Semi-Condensed,
>>  Dimnah,
>>  Dustismo,
>>  Dustismo_Roman,
>>  Dustismo_Roman_Bold,
>>  Dustismo_Roman_Italic,
>>  Dustismo_Roman_Italic_Bold,
>>  Dyuthi,
>>  East_Syriac_Adiabene,
>>  East_Syriac_Ctesiphon,
>>  Electron,
>>  Estrangelo_Antioch,
>>  Estrangelo_Edessa,
>>  Estrangelo_Midyat,
>>  Estrangelo_Nisibin,
>>  Estrangelo_Quenneshrin,
>>  Estrangelo_Talada,
>>  Estrangelo_TurAbdin,
>>  FreeMono,
>>  FreeMono_Bold,
>>  FreeMono_Bold_Italic,
>>  FreeMono_Bold_Oblique,
>>  FreeMono_Italic,
>>  FreeMono_Oblique,
>>  FreeSans,
>>  FreeSans_Bold,
>>  FreeSans_Bold_Oblique,
>>  FreeSans_Oblique,
>>  FreeSerif,
>>  FreeSerif_Bold,
>>  FreeSerif_Bold_Italic,
>>  FreeSerif_Italic,
>>  Furat,
>>  Garuda,
>>  Garuda_Bold,
>>  Garuda_Bold_Oblique,
>>  Garuda_Oblique,
>>  GentiumAlt,
>>  GentiumAlt_Italic,
>>  Georgia,
>>  Georgia_Bold,
>>  Georgia_Bold_Italic,
>>  Georgia_Italic,
>>  Granada,
>>  Graph,
>>  Hani,
>>  Haramain,
>>  Hor,
>>  IPAGothic,
>>  IPAMincho,
>>  IPAPGothic,
>>  IPAPMincho,
>>  IPAUIGothic,
>>  Impact,
>>  Impact_Condensed,
>>  Jamrul,
>>  Jamrul_Semi-Expanded,
>>  Japan,
>>  Jet,
>>  Kalimati,
>>  Kalyani,
>>  Kayrawan,
>>  Kedage,
>>  Kedage_Bold,
>>  Kedage_Bold_Italic,
>>  Kedage_Italic,
>>  Khalid,
>>  Khmer_OS,
>>  Khmer_OS_Battambang,
>>  Khmer_OS_Bokor,
>>  Khmer_OS_Content,
>>  Khmer_OS_Fasthand,
>>  Khmer_OS_Freehand,
>>  Khmer_OS_Metal_Chrieng,
>>  Khmer_OS_Muol,
>>  Khmer_OS_Muol_Light,
>>  Khmer_OS_Muol_Pali,
>>  Khmer_OS_Siemreap,
>>  Khmer_OS_System,
>>  Kochi_Gothic,
>>  Kochi_Mincho,
>>  LKLUG,
>>  Lateef,
>>  Likhan,
>>  Linux_Biolinum_O,
>>  Linux_Biolinum_O_Bold,
>>  Linux_Libertine_O,
>>  Linux_Libertine_O_Bold,
>>  Linux_Libertine_O_Bold_Italic,
>>  Linux_Libertine_O_C,
>>  Linux_Libertine_O_Italic,
>>  Lohit_Assamese,
>>  Lohit_Bengali,
>>  Lohit_Gujarati,
>>  Lohit_Hindi,
>>  Lohit_Malayalam,
>>  Lohit_Oriya,
>>  Lohit_Punjabi,
>>  Lohit_Tamil,
>>  Lohit_Telugu,
>>  Loma,
>>  Loma_Bold,
>>  Loma_Bold_Oblique,
>>  Loma_Oblique,
>>  Lucida_Bright,
>>  Lucida_Bright_Italic,
>>  Lucida_Bright_Semi-Bold,
>>  Lucida_Bright_Semi-Bold_Italic,
>>  Lucida_Sans,
>>  Lucida_Sans_Oblique,
>>  Lucida_Sans_Semi-Bold,
>>  Lucida_Sans_Semi-Bold_Oblique,
>>  Lucida_Sans_Typewriter,
>>  Lucida_Sans_Typewriter_Bold,
>>  Lucida_Sans_Typewriter_Bold_Oblique,
>>  Mallige,
>>  Mallige_Bold,
>>  Mallige_Bold_Italic,
>>  Mallige_Italic,
>>  Mashq,
>>  Meera,
>>  Metal,
>>  Mitra_Mono,
>>  Monapo,
>>  Mukti_Narrow,
>>  Mukti_Narrow_Bold,
>>  Nada,
>>  Nagham,
>>  Nice,
>>  Norasi,
>>  Norasi_Bold,
>>  Norasi_Bold_Italic,
>>  Norasi_Bold_Oblique,
>>  Norasi_Italic,
>>  Norasi_Oblique,
>>  OpenSymbol,
>>  Ostorah,
>>  Padauk,
>>  Padauk_Bold,
>>  Petra,
>>  Phetsarath_OT,
>>  Pothana2000,
>>  Proclamate_Light,
>>  Purisa_Light,
>>  Rachana,
>>  Rachana_w01,
>>  RaghuMalayalam,
>>  Rehan,
>>  Rekha,
>>  Saab,
>>  Salem,
>>  Samanata,
>>  Samyak_Gujarati,
>>  Samyak_Oriya,
>>  Sazanami_Gothic,
>>  Sazanami_Mincho,
>>  Scheherazade,
>>  Serto_Batnan,
>>  Serto_Batnan_Bold,
>>  Serto_Jerusalem,
>>  Serto_Jerusalem_Bold,
>>  Serto_Jerusalem_Italic,
>>  Serto_Kharput,
>>  Serto_Malankara,
>>  Serto_Mardin,
>>  Serto_Mardin_Bold,
>>  Serto_Urhoy,
>>  Serto_Urhoy_Bold,
>>  Shado,
>>  Sharjah,
>>  TAMu_Kadambri,
>>  TAMu_Kalyani,
>>  TAMu_Maduram,
>>  TSCu_Comic,
>>  TSCu_Paranar,
>>  TSCu_Paranar_Bold,
>>  TSCu_Paranar_Italic,
>>  TSCu_Times,
>>  TakaoExGothic,
>>  TakaoExMincho,
>>  TakaoGothic,
>>  TakaoMincho,
>>  TakaoPGothic,
>>  TakaoPMincho,
>>  Tarablus,
>>  Tholoth,
>>  Tibetan_Machine_Uni,
>>  Times_New_Roman,
>>  Times_New_Roman_Bold,
>>  Times_New_Roman_Bold_Italic,
>>  Times_New_Roman_Italic,
>>  TlwgMono,
>>  TlwgMono_Bold,
>>  TlwgMono_Bold_Oblique,
>>  TlwgMono_Oblique,
>>  TlwgTypewriter,
>>  TlwgTypewriter_Bold,
>>  TlwgTypewriter_Bold_Oblique,
>>  TlwgTypewriter_Oblique,
>>  Trebuchet_MS,
>>  Trebuchet_MS_Bold,
>>  Trebuchet_MS_Bold_Italic,
>>  Trebuchet_MS_Italic,
>>  URW_Bookman_L,
>>  URW_Bookman_L_Bold,
>>  URW_Bookman_L_Bold_Italic,
>>  URW_Bookman_L_Italic,
>>  URW_Bookman_L_Light_Italic,
>>  UmePlus_Gothic,
>>  UmePlus_P_Gothic,
>>  UnBatang,
>>  UnBatang_Bold,
>>  UnDotum,
>>  UnDotum_Bold,
>>  UnifrakturMaguntia,
>>  Unikurd_Web,
>>  Uttara,
>>  VL_Gothic,
>>  VL_PGothic,
>>  Vemana2000,
>>  Verdana,
>>  Verdana_Bold,
>>  Verdana_Bold_Italic,
>>  Verdana_Italic,
>>  Walbaum-Fraktur,
>>  Webdings,
>>  WenQuanYi_Zen_Hei,
>>  Wyld,
>>  Wyld_Italic,
>>  aakar,
>>  batang,
>>  chandas1-1,
>>  chandas1-2,
>>  cheluvi,
>>  dotum,
>>  gargi,
>>  gulim,
>>  hline,
>>  ipag,
>>  ipagp,
>>  ipagui,
>>  ipam,
>>  ipamp,
>>  kalimati,
>>  kochi-gothic,
>>  kochi-gothic-subst,
>>  kochi-mincho,
>>  kochi-mincho-subst,
>>  lklug,
>>  lohit_bn,
>>  lohit_gu,
>>  lohit_hi,
>>  lohit_ml,
>>  lohit_or,
>>  lohit_pa,
>>  lohit_ta,
>>  lohit_te,
>>  monapo,
>>  ori1Uni,
>>  padmaa,
>>  padmaa_Bold,
>>  suruma
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bee86d37-9e63-4d76-be78-345b8ed7f931%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bee86d37-9e63-4d76-be78-345b8ed7f931%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

