Hello, we are trying to recognize sequences of letters and digits that follow only a weak syntax. We know that each sequence starts with one of a few typical letter pairs, but after that the characters can come in basically any order.
Here are our questions:

1. What does tesseract do when there is no dictionary at all and the font contains only letters and digits? Does that differ at all from a dictionary that contains the letters and digits each on its own line?
2. Would it help to have a dictionary with the letters and the digits plus the letter pairs that we know can appear at the start?
3. If we know that the digit sequences can never start with a 0, would it help to include all digit sequences (up to five digits) in the dictionary (and remove the single 0)? Or does this basically not provide any real information?

Does the size of the dictionary have much influence on the execution time?

Thanks in advance for any help you can provide.

Best regards,
Marcus
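P.S. To make questions 2 and 3 concrete, here is a rough sketch of how we would generate the candidate word list, one entry per line. It only builds the list of strings and does not touch tesseract itself. The start pairs `AB` and `CD` are placeholders, not our real ones, and the use of uppercase letters and the name `build_word_list` are assumptions made for illustration.

```python
from itertools import chain
from string import ascii_uppercase, digits

# Placeholder start pairs; the real ones are not listed in this post.
START_PAIRS = ["AB", "CD"]


def build_word_list(start_pairs, max_digits=5):
    """Return the candidate dictionary entries as a list of strings.

    Entries are: single letters (uppercase is an assumption), the single
    digits 1-9 (the lone "0" is left out, as in question 3), the known
    start pairs, and every digit sequence of up to `max_digits` digits
    that does not begin with 0.
    """
    singles = [c for c in ascii_uppercase + digits if c != "0"]
    # The integers 1 .. 10**max_digits - 1 are exactly the digit sequences
    # of at most max_digits digits without a leading zero.
    numbers = (str(n) for n in range(1, 10 ** max_digits))

    seen = set()
    words = []
    for word in chain(singles, start_pairs, numbers):
        if word not in seen:  # 1-9 appear both as singles and as numbers
            seen.add(word)
            words.append(word)
    return words


if __name__ == "__main__":
    words = build_word_list(START_PAIRS)
    print(len(words), "entries")
```

With two start pairs and up to five digits this comes to roughly a hundred thousand entries, which is why we are asking about the effect of dictionary size on execution time.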