Actually, it doesn't support regexps: the whitelist is supported only as an enumeration of characters.
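Since the filtering has to happen after recognition, here is a rough sketch in Python of what such post-processing could look like. It is only an illustration, not anything Tesseract provides: the lookalike table, the phone-number pattern, and the function names (`digit_runs`, `fix_phone_numbers`) are all my own assumptions, and the input is assumed to be text that Tesseract has already returned. The second function follows the pattern-guided mapping Patrick describes in the quoted message below.

```python
import re

# Hypothetical table of letters that OCR often confuses with digits.
# The choice of letters is an assumption; tune it for your own images.
LOOKALIKES = str.maketrans({
    "B": "8", "G": "6", "O": "0", "I": "1", "l": "1", "S": "5", "Z": "2",
})

# One "digit slot": a real digit or one of the lookalike letters above.
SLOT = r"[0-9BGOIlSZ]"

# Assumed target shape: (ddd)ddd-dddd, where each d is a digit slot.
CANDIDATE = re.compile(rf"\({SLOT}{{3}}\){SLOT}{{3}}-{SLOT}{{4}}")


def digit_runs(text: str) -> list[str]:
    """Return only the runs of digits found in the OCR text."""
    return re.findall(r"[0-9]+", text)


def fix_phone_numbers(text: str) -> str:
    """Map lookalike letters to digits, but only inside a span that
    completes the phone-number pattern; everything else is left alone."""
    return CANDIDATE.sub(lambda m: m.group(0).translate(LOOKALIKES), text)


if __name__ == "__main__":
    print(digit_runs("Tesseract 3.00\n123456789"))
    print(fix_phone_numbers("(88B)1G2-2345"))
    print(fix_phone_numbers("BB (123)456-7861"))
```

Run on the unrestricted ("normal") output rather than on whitelist output, this keeps the characters Tesseract actually recognised and drops or maps the rest afterwards, instead of forcing every glyph into the nearest whitelisted character.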

On Friday, July 1, 2011 at 9:59:26 PM UTC+2, 8flm6 wrote:
> Yes, filtering by regular expressions sounds good, though I had
> hoped Tesseract could do this on its own. I might also try a set of
> traineddata limited to numbers, in addition to white- and blacklists.
> Maybe that works; I will post my results when finished.
>
> thanks for your reply!
>
> 8flm6
>
> On 1 Jul., 15:15, patrickq <patrick.questemb...@gmail.com> wrote:
> > Yes, Tesseract blacklists and whitelists are useful almost
> > exclusively in situations where you really don't have the blacklisted
> > characters anywhere in the image (otherwise Tesseract will return the
> > next best guess, no matter how poor) or, vice versa, where you have
> > only the whitelisted characters in the image.
> >
> > The solution for achieving what you want is to set a variable telling
> > Tesseract to ignore any match it finds below a specified confidence
> > level. I wouldn't be surprised if there is such a variable but I have
> > no idea what it is.
> >
> > We take a different approach to detecting numbers with tolerance for
> > errors: we define in our regular expressions a long list of letters we
> > accept as digits, then convert them, but we do that only when it helps
> > us complete a pattern. For example:
> > - in (88B)1G2-2345 we accept and map to (888)162-2345
> > - but in "BB (123)456-7861" we leave the B's alone
> >
> > Patrick
> >
> > On Jul 1, 8:35 am, 8flm6 <8f...@gmx.de> wrote:
> > > Hello,
> > > I'm trying to apply white- and blacklists to my OCR result. If I call:
> > > SetVariable("tessedit_char_whitelist", "0123456789")
> > >
> > > then all characters in the result are converted to numbers between 0
> > > and 9. Is that the correct behaviour of this option? As I understand
> > > a whitelist, only those characters defined in the list should be
> > > returned; all others should be blocked.
> > > The same goes for the blacklist. I call:
> > > SetVariable("tessedit_char_blacklist", "0123456789")
> > >
> > > This option converts all occurrences of numbers to random characters.
> > >
> > > This is the image I used:
> > > https://docs.google.com/leaf?id=0B2ifXewLRYsdMzY3MzIwMTUtZTkxNS00ZDM1...
> > >
> > > Example results:
> > > normal output:
> > > Tesseract 3.00
> > > 123456789
> > >
> > > whitelist output:
> > > 1185587301 3100
> > > 123456789
> > >
> > > blacklist output:
> > > Tesseract B.OO
> > > QBASGYSQ
> > >
> > > Any help would be appreciated!
> > >
> > > thanks