[tesseract-ocr] What does --noextract_font_properties do?

Timothy Snyder Wed, 15 May 2019 07:11:23 -0700

Hey all, quick question:

What does --noextract_font_properties do when using tesstrain.sh?


I've been using the flag for training since it's used in the training guide 
on GitHub. However, I can't seem to find any usage information for it.

tesstrain.sh doesn't seem to include it in its usage info: 

echo -e "USAGE: tesstrain.sh
>      --exposures EXPOSURES      # A list of exposure levels to use (e.g. "-1 0 1").
>      --fontlist FONTS           # A list of fontnames to train on.
>      --fonts_dir FONTS_PATH     # Path to font files.
>      --lang LANG_CODE           # ISO 639 code.
>      --langdata_dir DATADIR     # Path to tesseract/training/langdata directory.
>      --linedata_only            # Only generate training data for lstmtraining.
>      --output_dir OUTPUTDIR     # Location of output traineddata file.
>      --overwrite                # Safe to overwrite files in output_dir.
>      --run_shape_clustering     # Run shape clustering (use for Indic langs).
>      --maxpages                 # Specify maximum pages to output (default:0=all)
>      --save_box_tiff            # Save box/tiff pairs along with lstmf files.
>      --xsize                    # Specify width of output image (default:3600)
>
>   OPTIONAL flag for specifying directory with user specified box/tiff pairs.
>   Files should be named similar to ${LANG_CODE}.${fontname}.exp${EXPOSURE}.box/tif
>      --my_boxtiff_dir MY_BOXTIFF_DIR # Location of user specified box/tiff files.
>
>   OPTIONAL flags for input data. If unspecified we will look for them in
>   the langdata_dir directory.
>      --training_text TEXTFILE   # Text to render and use for training.
>      --wordlist WORDFILE        # Word list for the language ordered by
>                                 # decreasing frequency.
>   OPTIONAL flag to specify location of existing traineddata files, required
>   during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
>   the current environment.
>

Thanks! 

