[tesseract-ocr] Any suggestions on pre-processing to improve accuracy?

I'm wondering how I can get better results with Tesseract.  

Here are a few images I've been testing with, along with the results:

*Image <http://tleyden-misc.s3.amazonaws.com/ocr-test-data/10.jpg>*:  
<https://lh5.googleusercontent.com/-MwWj-AZZOEo/U6RX1yWJZAI/AAAAAAAAAaU/8N-lWBwBvCc/s1600/10.jpg>
*Actual OCR text*: VCZZSWE
*Expected OCR text*: VC22500E

*Image <http://tleyden-misc.s3.amazonaws.com/ocr-test-data/9.jpg>*:  
<https://lh4.googleusercontent.com/-GdcSO9f7gZI/U6RXzF77PuI/AAAAAAAAAaM/TnKetG_P9ps/s1600/9.jpg>
*Actual OCR text*: ViZZSWE DRIVEWAY
*Expected OCR text*: VC22500E DRIVEWAY

Any tips on pre-processing the images to improve recognition?

The code I'm using to call Tesseract (via go-tesseract) is here:

https://github.com/tleyden/open-ocr/blob/master/tesseract_engine.go#L49-L53
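One alternative to the go-tesseract binding is to shell out to the command-line tool. This sketch assumes the `tesseract` 3.02 binary is on the `PATH`. Its 3.02 usage is `tesseract imagename outputbase [-l lang] [-psm pagesegmode]`, and it writes its result to `outputbase.txt`. Page segmentation mode 7 treats the image as a single line of text, which may suit one-line images like the ones above. The helper names `tesseractArgs` and `runTesseract` are hypothetical, not part of open-ocr or go-tesseract:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
	"strings"
)

// tesseractArgs builds the argument list for the tesseract 3.02 CLI:
// tesseract imagename outputbase [-l lang] [-psm pagesegmode]
func tesseractArgs(image, outBase, lang string, psm int) []string {
	return []string{image, outBase, "-l", lang, "-psm", strconv.Itoa(psm)}
}

// runTesseract runs the CLI in a temporary directory and returns the
// recognized text that tesseract writes to outBase + ".txt".
func runTesseract(image, lang string, psm int) (string, error) {
	dir, err := os.MkdirTemp("", "ocr")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	outBase := filepath.Join(dir, "out")
	cmd := exec.Command("tesseract", tesseractArgs(image, outBase, lang, psm)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("tesseract failed: %v: %s", err, out)
	}
	txt, err := os.ReadFile(outBase + ".txt")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(txt)), nil
}
```

For example, `runTesseract("10.png", "eng", 7)` would OCR a pre-processed image as a single text line.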

Version: I'm using the tesseract-ocr-eng package from Debian Jessie, which looks to be version 3.02-2:
<http://ftp.de.debian.org/debian/pool/main/t/tesseract-eng/tesseract-eng_3.02-2.dsc>

(The full build script is available in this Dockerfile:
<https://registry.hub.docker.com/u/tleyden5iwx/go-tesseract-trusted/dockerfile>)
-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/f38687f5-84cc-4c4b-a7c9-e5e6db6f327b%40googlegroups.com.