Trouble with White- and Blacklists

2011-07-01 Thread 8flm6
Hello, I'm trying to apply whitelists and blacklists to my OCR result. If I call SetVariable("tessedit_char_whitelist", "0123456789"), then all characters in the result are converted to digits between 0 and 9. Is that the correct behaviour of this option? By my understanding of a whitelist, only t

Re: Trouble with White- and Blacklists

2011-07-01 Thread patrickq
Yes, Tesseract blacklists and whitelists are useful almost exclusively in situations where you really don't have the blacklisted characters anywhere in the image (otherwise Tesseract will return the next best guess, no matter how poor) or, vice versa, where you have only the whitelisted characters i

Re: Trouble with White- and Blacklists

2011-07-01 Thread 8flm6
Yes, filtering by regular expressions sounds good, though I had hoped Tesseract could do this on its own. I might try a set of traineddata limited to numbers as well, in addition to whitelists and blacklists. Maybe that works; I will post my results when finished. Thanks for your reply! 8flm6 On 1 J

Re: Trouble with White- and Blacklists

2011-07-02 Thread John Brohan
Hi, I think the result is perfectly correct. To get just the numbers, surely you must use the whitelist instead of the blacklist, and then go through your output and replace all non-numerics with a space! I expect you will need some punctuation too (+ - , . : etc.). If these occur in the text part then they ne
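
The post-processing step suggested in this reply (replace everything that is not a digit or one of the listed punctuation marks with a space) could be sketched as follows. This is only an illustration in plain Python and does not call Tesseract; the name keep_allowed and the exact ALLOWED set are assumptions, not part of the thread.

```python
import re

# Characters to keep: digits plus the punctuation mentioned in the reply.
# This set is an assumption for illustration; adjust it to your data.
ALLOWED = "0123456789+-,.:"

# Matches one or more consecutive characters that are NOT in ALLOWED.
# re.escape makes "-" and "." safe to use inside the character class.
_NOT_ALLOWED = re.compile("[^" + re.escape(ALLOWED) + "]+")


def keep_allowed(ocr_text: str) -> str:
    """Replace every run of disallowed characters with a single space."""
    return _NOT_ALLOWED.sub(" ", ocr_text).strip()


if __name__ == "__main__":
    print(keep_allowed("Tel 555-1234"))
```

Unlike a whitelist, this filter runs on unrestricted OCR output, so letters are dropped rather than forced into their nearest digit.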

[tesseract-ocr] Re: Trouble with White- and Blacklists

2017-05-21 Thread Haah H
Actually, it doesn't support regexps; the whitelist is supported only as an enumeration. On Friday, July 1, 2011 at 9:59:26 PM UTC+2, 8flm6 wrote: > > Yes a filtering by regular expressions sounds good, though I had > hoped tesseract could do this on its own. I might try a set of > trainedata >
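
Since the whitelist is an enumeration rather than a pattern, a range has to be spelled out character by character before it is passed as the value of tessedit_char_whitelist. A minimal sketch, assuming the value is read as a literal list of characters as this reply describes; the helper chars_between is hypothetical and nothing here calls Tesseract:

```python
import string


def chars_between(first: str, last: str) -> str:
    """Spell out every character from first to last, inclusive."""
    return "".join(chr(code) for code in range(ord(first), ord(last) + 1))


# Digits written out explicitly, plus the punctuation mentioned earlier
# in the thread. The punctuation set is an assumption for illustration.
WHITELIST = chars_between("0", "9") + "+-,.:"

# string.digits from the standard library gives the same ten digits.
assert chars_between("0", "9") == string.digits

if __name__ == "__main__":
    print(WHITELIST)
```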