devTess, be careful with the coffee, don't overdose ))

> Q1
> Init(datapath, language, OcrEngineMode);
> What is the normal setting of OcrEngineMode?
Currently OcrEngineMode = OEM_TESSERACT_ONLY is sufficient for all cases.

> Q2: which of the following is USED in normal running mode of
> tesseract.exe to recognize text
The values of the variables you can see within the code of Recognize()
(e.g. tesseract_->tessedit_resegment_from_boxes) are often loaded from
config files. Usually recognition runs with no config files at all, so
you can assume all these variables to be "false". In that way you can
examine the control paths and figure out what procedures get called at
the recognition stage.
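
As a rough illustration of that reasoning, here is a small stand-alone C++
sketch (not Tesseract code) that mirrors the branches you quoted from
Recognize(). RecognizeFlags and TraceRecognize are made-up names for this
example; only the flag names and the called functions come from your snippet.
It assumes every flag stays false when no config file is loaded.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the boolean variables tested in Recognize().
// Assumption: with no config files loaded, all of them are false.
struct RecognizeFlags {
  bool tessedit_resegment_from_line_boxes = false;
  bool tessedit_resegment_from_boxes = false;
  bool tessedit_make_boxes_from_boxes = false;
  bool interactive_mode = false;
  bool tessedit_train_from_boxes = false;
  bool tessedit_ambigs_training = false;
};

// Returns the names of the calls the quoted code would reach, in order.
std::vector<std::string> TraceRecognize(const RecognizeFlags& f) {
  std::vector<std::string> calls;

  // Part one: decide how page_res_ is built.
  if (f.tessedit_resegment_from_line_boxes) {
    calls.push_back("ApplyBoxes(line boxes)");
  } else if (f.tessedit_resegment_from_boxes) {
    calls.push_back("ApplyBoxes(boxes)");
  } else {
    calls.push_back("new PAGE_RES");
  }
  if (f.tessedit_make_boxes_from_boxes) {
    calls.push_back("CorrectClassifyWords");
    return calls;  // the quoted code returns 0 here
  }

  // Part two: interactive editor, training, or plain recognition.
  if (f.interactive_mode) {
    calls.push_back("pgeditor_main");
  } else if (f.tessedit_train_from_boxes) {
    calls.push_back("ApplyBoxTraining");
  } else if (f.tessedit_ambigs_training) {
    calls.push_back("init_recog_training");
    calls.push_back("recog_training_segmented");
  } else {
    calls.push_back("recog_all_words");
  }
  return calls;
}
```

Calling TraceRecognize(RecognizeFlags()) yields "new PAGE_RES" followed by
"recog_all_words", which agrees with both of your guesses. Setting a single
flag to true shows which branch a given config file would switch on.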

> Q3: which of the following is USED in normal running mode of
> tesseract.exe to recognize text
I think you meant "to train" here (the question looks copy-pasted from
Q2). Training is a 2-stage process:

  1) Making box files. Requires two config files: "batch.nochop" and "makebox"

  2) Generation of .tr files. Needs "nobatch" and "box.train"

You can find the above configs inside the tessdata/configs and
tessdata/tessconfigs directories in Tess's distribution. Check these
files and you'll understand what usually happens while training. Plain
old step-by-step debugging is also of use ))
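
To make the two stages concrete, here is a minimal C++ sketch that only
assembles the two command lines. It assumes the usual
"tesseract <image> <outputbase> <configs...>" layout of the command-line
tool; the function names and the file names in the usage note are
hypothetical, and the config names are the four mentioned above.

```cpp
#include <string>

// Stage 1: making a box file. Uses the "batch.nochop" and "makebox" configs.
// Assumption: configs are passed as trailing arguments after the output base.
std::string MakeBoxCommand(const std::string& image,
                           const std::string& output_base) {
  return "tesseract " + image + " " + output_base + " batch.nochop makebox";
}

// Stage 2: generating the .tr file. Uses the "nobatch" and "box.train" configs.
std::string BoxTrainCommand(const std::string& image,
                            const std::string& output_base) {
  return "tesseract " + image + " " + output_base + " nobatch box.train";
}
```

For an image called page.tif, MakeBoxCommand("page.tif", "page") produces
"tesseract page.tif page batch.nochop makebox". Each config name is just a
small text file that sets some of the variables tested in Recognize().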

Warm regards,
Dmitry Silaev




On Tue, Feb 8, 2011 at 6:44 PM, devTess <jim...@googlemail.com> wrote:
>
> Hi Dmitry, following the guidelines you provided, I prepared a strong
> cup of coffee and started reading the top part of baseapi.h
>
> Q1
> Init(datapath, language, OcrEngineMode);
> What is the normal setting of OcrEngineMode?
>
> I am trying to use the Recognize(ETEXT_DESC* monitor) method.
> There are two PARTS to the Recognize method:
>
> Part ONE:
> Q2: which of the following is USED in normal running mode of
> tesseract.exe to recognize text
>
>  if (tesseract_->tessedit_resegment_from_line_boxes)
>    page_res_ = tesseract_->ApplyBoxes(*input_file_, true, block_list_);
>  else if (tesseract_->tessedit_resegment_from_boxes)
>    page_res_ = tesseract_->ApplyBoxes(*input_file_, false, block_list_);
>  else
>    page_res_ = new PAGE_RES(block_list_,
>                             &tesseract_->prev_word_best_choice_);  <<My guess>
>  if (tesseract_->tessedit_make_boxes_from_boxes) {
>    tesseract_->CorrectClassifyWords(page_res_);
>    return 0;
>  }
>
> Part TWO:
> Q3: which of the following is USED in normal running mode of
> tesseract.exe to recognize text
> if (tesseract_->interactive_mode) {
>    tesseract_->pgeditor_main(rect_width_, rect_height_, page_res_);
>    // The page_res is invalid after an interactive session, so cleanup
>    // in a way that lets us continue to the next page without crashing.
>    delete page_res_;
>    page_res_ = NULL;
>    return -1;
>  } else if (tesseract_->tessedit_train_from_boxes) {
>    tesseract_->ApplyBoxTraining(*output_file_, page_res_);
>  } else if (tesseract_->tessedit_ambigs_training) {
>    FILE *training_output_file =
>        tesseract_->init_recog_training(*input_file_);
>    // OCR the page segmented into words by tesseract.
>    tesseract_->recog_training_segmented(
>        *input_file_, page_res_, monitor, training_output_file);
>    fclose(training_output_file);
>  } else {
>    // Now run the main recognition.
>    tesseract_->recog_all_words(page_res_, monitor, NULL, NULL, 0);  <<My guess>
>  }
>
> --
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> To unsubscribe from this group, send email to 
> tesseract-ocr+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
