Hi
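I think the result is perfectly correct. As far as I know, the whitelist
restricts which characters Tesseract is allowed to output, so every glyph is
forced onto the closest permitted character rather than being dropped.
To get just the numbers, surely you must use the whitelist instead of the
blacklist, and then go through your output and replace all non-numerics with
a space!
I expect you will need some punctuation too (+ - , . : etc.). If these occur in
the text part then they need to be thrown away as well, e.g. a punctuation
character is OK only if it is followed by a numeric?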
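
A minimal sketch of that clean-up pass, in Python, assuming the OCR output is
already available as a string. The function name keep_numeric and the exact
punctuation set are my own choices for illustration and are not part of
Tesseract:

```python
import re

# A digit, or one of the punctuation characters + - , . : when it is
# immediately followed by a digit. The punctuation set is an assumption
# taken from the suggestion above; adjust it to your data.
_KEEP = re.compile(r"\d|[+\-,.:](?=\d)")


def keep_numeric(text: str) -> str:
    """Replace every character that is neither a digit nor punctuation
    directly followed by a digit with a space, line by line."""
    cleaned_lines = []
    for line in text.splitlines():
        # pattern.match(line, i) tests for a match starting exactly at index i.
        chars = [ch if _KEEP.match(line, i) else " " for i, ch in enumerate(line)]
        cleaned_lines.append("".join(chars))
    return "\n".join(cleaned_lines)
```

Because rejected characters become spaces rather than being deleted, the
column positions of the surviving digits are preserved. If only the numeric
tokens are wanted, calling keep_numeric(text).split() afterwards collapses the
padding.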

Good Luck
John

On Fri, Jul 1, 2011 at 8:35 AM, 8flm6 <8f...@gmx.de> wrote:

> Hello,
> I'm trying to apply whitelists and blacklists to my OCR result. If I call:
> SetVariable("tessedit_char_whitelist", "0123456789")
>
> Then all characters in the result are converted to digits between 0
> and 9. Is that the correct behaviour
> of this option? As I understand a whitelist, only those
> characters which are defined in the list should be returned;
> all others should be blocked.
> The same with the blacklist. I call:
> SetVariable("tessedit_char_blacklist", "0123456789")
>
> This option converts all occurrences of digits to random characters.
>
> This is the image I used:
>
> https://docs.google.com/leaf?id=0B2ifXewLRYsdMzY3MzIwMTUtZTkxNS00ZDM1LTllYjgtN2NhMjU0MzRkNWQ4&hl=de
>
> Example results:
> normal output:
> Tesseract 3.00
> 123456789
>
> whitelist output:
> 1185587301 3100
> 123456789
>
> blacklist output:
> Tesseract B.OO
> QBASGYSQ
>
> Any help would be appreciated!
>
> thanks
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>



-- 
John Brohan http://www.woundfollowup.com   tel 514 995 3749.
5 minute movie http://tinyurl.com/22kfdv8
