Hello,

we are trying to recognize sequences of letters and digits that follow
only a weak syntax. We do know that each sequence starts with one of a
few typical letter pairs, but after that the characters can appear in
essentially any order.

Here are our questions:

1. What does Tesseract do when there is no dictionary at all and the
font contains only letters and digits? Does that behave any differently
from a dictionary that lists each letter and digit on its own line?
2. Would it help to use a dictionary containing the letters and digits
plus the letter pairs that we know can appear at the start?
3. Since we know that the digit sequences can never start with a 0,
would it help to include every such digit sequence (up to five digits)
in the dictionary and remove the single "0"? Or would that add no real
information? Also, does the size of the dictionary have a significant
effect on execution time?
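To give a sense of the dictionary size behind question 3, here is a small sketch (assuming a plain word list with one entry per line) that enumerates every digit sequence of one to five digits with no leading 0. It only builds and counts the list; it says nothing about how Tesseract itself would use it, and the function name is hypothetical.

```python
def nonzero_leading_digit_words(max_len: int = 5) -> list[str]:
    # Digit strings of length 1..max_len with no leading zero are exactly the
    # decimal representations of 1 .. 10**max_len - 1, so "0" is never produced.
    return [str(n) for n in range(1, 10 ** max_len)]


def count_by_length(words: list[str]) -> dict[int, int]:
    # Number of entries for each sequence length (1 digit, 2 digits, ...).
    counts: dict[int, int] = {}
    for w in words:
        counts[len(w)] = counts.get(len(w), 0) + 1
    return counts


if __name__ == "__main__":
    words = nonzero_leading_digit_words()
    print(len(words))             # total number of word-list entries
    print(count_by_length(words))  # per-length breakdown
    # Writing "\n".join(words) to a file would give one sequence per line.
```

So the list would have 9 + 90 + 900 + 9000 + 90000 = 99999 entries, which is the number to keep in mind when weighing the execution-time question.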

Thanks in advance for any help you can provide.

Best regards,
Marcus

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to
tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
