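Hi, I think the result is perfectly correct. The whitelist does not drop the other characters; it restricts Tesseract to choosing only among the listed ones, so every glyph is read as the closest digit (and the blacklist does the reverse, so every digit is read as the closest non-digit). To get just the numbers, surely you must use the whitelist instead of the blacklist, and then go through your output and replace all non-numeric characters with a space! I expect you will need some punctuation too (+ - , . : etc.). If these occur in the text part, they need to be thrown away as well; e.g. perhaps a punctuation mark is OK if it is followed by a digit?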
Good luck,
John

On Fri, Jul 1, 2011 at 8:35 AM, 8flm6 <8f...@gmx.de> wrote:
> Hello,
> I'm trying to apply whitelists and blacklists to my OCR result. If I call:
> SetVariable("tessedit_char_whitelist", "0123456789")
>
> then all characters in the result are converted to numbers between 0 and 9.
> Is that the correct behaviour of this option? As I understand a whitelist,
> only the characters defined in the list should be returned, and all others
> should be blocked.
> The same happens with the blacklist. I call:
> SetVariable("tessedit_char_blacklist", "0123456789")
>
> This option converts all occurrences of numbers to random characters.
>
> This is the image I used:
>
> https://docs.google.com/leaf?id=0B2ifXewLRYsdMzY3MzIwMTUtZTkxNS00ZDM1LTllYjgtN2NhMjU0MzRkNWQ4&hl=de
>
> Example results:
> normal output:
> Tesseract 3.00
> 123456789
>
> whitelist output:
> 1185587301 3100
> 123456789
>
> blacklist output:
> Tesseract B.OO
> QBASGYSQ
>
> Any help would be appreciated!
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
John Brohan
http://www.woundfollowup.com
tel 514 995 3749
5 minute movie http://tinyurl.com/22kfdv8