[tesseract-ocr] Re: Advice needed on effective hexadecimal recognition

Tom Morris Sat, 28 Jun 2014 09:42:27 -0700


On Saturday, June 28, 2014 12:39:21 AM UTC-4, scott...@gmail.com wrote:
>
> I have an application (http://hackaday.io/project/1569-NSA-Away) that 
> involves OCR of hexadecimal information from a computer screen using a hand 
> held Android device. I've been able to use the tess-two API wrapper to 
> successfully run Tesseract OCR in an Android emulator and am developing 
> various unit tests to better tune my Tesseract configuration.  The data I 
> am OCR'ing will look something like:
>
> 2C B7 CF 07 1F C6 62 1C 8E 53 10 B1 75 06 06 C9 01 6A 08 DA
> D4 B5 F9 CF 71 0E 7A DB 04 F3 8B 2A 0D 8E EC 41 50 83 CB E4
>
> Where each pair of hex digits represents one byte of information.  I can 
> include error correction if that will be needed.
> ...
>
> I'm also wondering if people have advice about this use case in 
> particular.  Would you recommend upper or lower case hex digits (lower 
> seemed worse in my unit testing), two spaces between words, etc.
>
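A minimal Python sketch of the receiving side may help here. It assumes each screen row is space-separated hex pairs, as in the sample above, and it uses a hypothetical convention: one XOR check byte is appended per row. An XOR check byte catches any single misread byte in a row. It can miss some multi-byte errors, and it corrects nothing; it only tells you which row to re-scan. The function names are illustrative, not part of tess-two or Tesseract.

```python
from functools import reduce

HEX_DIGITS = frozenset("0123456789ABCDEF")


def parse_row(line: str) -> bytes:
    """Turn one OCR'd row like '2C B7 CF' into bytes, rejecting bad tokens."""
    out = bytearray()
    for token in line.split():
        # Each token must be exactly two hex digits (either case is accepted).
        if len(token) != 2 or not set(token.upper()) <= HEX_DIGITS:
            raise ValueError(f"not a hex byte: {token!r}")
        out.append(int(token, 16))
    return bytes(out)


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes in data; 0 for empty input."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def add_check(payload: bytes) -> bytes:
    """Sender side (hypothetical convention): append one XOR check byte."""
    return payload + bytes([xor_checksum(payload)])


def verify_row(row: bytes) -> bytes:
    """Receiver side: verify and strip the trailing check byte."""
    if not row:
        raise ValueError("empty row")
    payload, check = row[:-1], row[-1]
    if xor_checksum(payload) != check:
        raise ValueError("checksum mismatch: re-scan this row")
    return payload
```

On the phone you would call `verify_row(parse_row(ocr_text_for_one_row))` and prompt for a re-scan on `ValueError`. If you later need true correction rather than detection, a proper error-correcting code would replace the single XOR byte.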


I'm wondering if you wouldn't be better off with a different symbol set / 
code alphabet.  If it's just 'C' that's giving you fits, replace it with 'Y' 
or 'K' or some other letter that Tess can easily distinguish. 
Alternatively, pick 256 (or more) short dictionary words to represent your 
code points.  Or encode the ciphertext in a PNG of a 1-D or 2-D bar code. 
Or use something like OpenCV, where you can more tightly control how the 
symbol recognition is done.
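The custom-alphabet idea can be sketched in a few lines of Python. The sketch assumes a hypothetical 16-symbol alphabet: ordinary uppercase hex with 'C' swapped for 'K'. Whether 'K' is really easier for Tess than 'C' is something to confirm with your own unit tests. Each byte still becomes two symbols, so the on-screen layout stays the same and only the lookup table changes.

```python
# Hypothetical alphabet: standard hex digits, but 'C' (value 12) replaced by 'K'.
ALPHABET = "0123456789ABKDEF"


def encode(data: bytes, alphabet: str = ALPHABET) -> str:
    """Encode each byte as two symbols (high nibble, low nibble), space-separated."""
    if len(alphabet) != 16 or len(set(alphabet)) != 16:
        raise ValueError("alphabet must contain 16 distinct symbols")
    return " ".join(alphabet[b >> 4] + alphabet[b & 0x0F] for b in data)


def decode(text: str, alphabet: str = ALPHABET) -> bytes:
    """Invert encode(); any symbol outside the alphabet is treated as a misread."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    out = bytearray()
    for token in text.split():
        if len(token) != 2 or any(ch not in index for ch in token):
            raise ValueError(f"unrecognized token {token!r}")
        out.append(index[token[0]] << 4 | index[token[1]])
    return bytes(out)
```

For example, the bytes 2C B7 CF would be displayed as `2K B7 KF`. The same table-lookup approach would extend to the word-list suggestion: replace the 16-symbol string with a list of 256 distinct words and map each byte to one word.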

Tom