Actually, it doesn't support regular expressions; the whitelist is 
supported only as an enumeration of characters.
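
Since the whitelist cannot express a pattern, any pattern-based cleanup has to run on the OCR output afterwards. Below is a minimal Python sketch of the approach Patrick describes in the quoted message further down: letters are accepted as digits only where they complete a phone-number pattern. The `CONFUSABLE` table, the `(ddd)ddd-dddd` pattern, and the name `fix_phone_numbers` are my own assumptions for illustration, not Patrick's actual code and not part of Tesseract.

```python
import re

# Hypothetical table of letters that OCR tends to confuse with digits.
# This is an assumption for illustration; tune it to your own data.
CONFUSABLE = {
    "O": "0", "o": "0",
    "I": "1", "l": "1",
    "Z": "2",
    "S": "5",
    "G": "6",
    "B": "8",
}

# One "digit slot": a real digit or any letter from the table above.
_SLOT = "[0-9" + "".join(CONFUSABLE) + "]"

# Assumed target pattern: (ddd)ddd-dddd, where every d is a digit slot.
PHONE = re.compile(r"\(({s}{{3}})\)({s}{{3}})-({s}{{4}})".format(s=_SLOT))


def _to_digits(chunk):
    """Map confusable letters in chunk to digits; keep real digits as-is."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in chunk)


def fix_phone_numbers(text):
    """Convert letters to digits only inside substrings matching PHONE."""
    def repl(match):
        area, prefix, line = (_to_digits(g) for g in match.groups())
        return "({}){}-{}".format(area, prefix, line)

    return PHONE.sub(repl, text)


if __name__ == "__main__":
    print(fix_phone_numbers("(88B)1G2-2345"))
    print(fix_phone_numbers("BB (123)456-7861"))
```

Text outside a match is never touched, so a stray "BB" in front of a valid number stays as it is. A stricter version could also require word boundaries around the match.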

On Friday, July 1, 2011 at 9:59:26 PM UTC+2, 8flm6 wrote:
>
> Yes, filtering by regular expressions sounds good, though I had 
> hoped Tesseract could do this on its own. I might also try a 
> traineddata set limited to numbers, in addition to whitelists and 
> blacklists. Maybe that works; I will post my results when finished. 
>
> thanks for your reply! 
>
> 8flm6 
>
> On 1 Jul., 15:15, patrickq <patrick.questemb...@gmail.com> wrote: 
> > Yes, Tesseract blacklists and whitelists are useful almost 
> > exclusively in situations where you really don't have the blacklisted 
> > characters anywhere in the image (otherwise Tesseract will return the 
> > next best guess, no matter how poor) or, vice versa, where you have only 
> > the whitelisted characters in the image. 
> > 
> > The solution for achieving what you want would be to set a variable 
> > telling Tesseract to ignore any match it finds below a specified 
> > confidence level. I wouldn't be surprised if such a variable exists, 
> > but I don't know what it is. 
> > 
> > We take a different approach to detecting numbers with tolerance for 
> > errors: in our regular expressions we define a long list of letters we 
> > accept as digits and then convert, but only when doing so helps us 
> > complete a pattern. For example: 
> > - in "(88B)1G2-2345" we accept the letters and map to "(888)162-2345" 
> > - but in "BB (123)456-7861" we leave the B's alone 
> > 
> > Patrick 
> > 
> > On Jul 1, 8:35 am, 8flm6 <8f...@gmx.de> wrote: 
> > 
> > > Hello, 
> > > I'm trying to apply White- and Blacklists to my OCR-result. If I call: 
> > > SetVariable("tessedit_char_whitelist", "0123456789") 
> > 
> > > Then all characters in the result are converted to digits between 0 
> > > and 9. Is that the correct behaviour of this option? As I understand 
> > > a whitelist, only the characters defined in the list should be 
> > > returned, and all others should be blocked. 
> > > The same with the blacklist. I call: 
> > > SetVariable("tessedit_char_blacklist", "0123456789") 
> > 
> > > This option converts all occurrences of digits to random characters. 
> > 
> > > This is the image I used:
> > > https://docs.google.com/leaf?id=0B2ifXewLRYsdMzY3MzIwMTUtZTkxNS00ZDM1... 
> > 
> > > Example results: 
> > > normal output: 
> > > Tesseract 3.00 
> > > 123456789 
> > 
> > > whitelist output: 
> > > 1185587301 3100 
> > > 123456789 
> > 
> > > blacklist output: 
> > > Tesseract B.OO 
> > > QBASGYSQ 
> > 
> > > Any help would be appreciated! 
> > 
> > > thanks
