Do you have higher-resolution images to work with? That's one issue here: 
the edges of your text are very fuzzy, and at that resolution it's pretty 
hard for Tesseract. You can also play with thresholding and morphological 
operations (erosion/dilation, e.g. opening) to thicken some of your lines 
(using e.g. ImageMagick or OpenCV) before running Tesseract.
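
To make the idea concrete, here is a rough pure-Python sketch of what 
thresholding, dilation and erosion do to a tiny grayscale image. It is only 
an illustration: the function names, the 3x3 neighbourhood and the cutoff of 
128 are my own assumptions, and on real label photos you would use the 
equivalent OpenCV calls (cv2.threshold, cv2.dilate, cv2.erode, 
cv2.morphologyEx) or ImageMagick instead of hand-rolled loops.

```python
def threshold(gray, cutoff=128):
    """Binarize a grayscale image: 1 = ink (dark pixel), 0 = background."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def _morph(binary, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighbourhood.

    Neighbours that fall outside the image are simply ignored.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [binary[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = pick(window)
    return out


def dilate(binary):
    """Grow ink regions by one pixel in every direction (thickens strokes)."""
    return _morph(binary, max)


def erode(binary):
    """Shrink ink regions by one pixel (removes specks, thins strokes)."""
    return _morph(binary, min)


def opening(binary):
    """Erosion followed by dilation: drops small specks of noise."""
    return dilate(erode(binary))


# Hypothetical 5x5 grayscale patch: a faint one-pixel-wide vertical stroke.
gray = [[220, 220, 40, 220, 220] for _ in range(5)]
ink = threshold(gray)   # stroke becomes a single column of 1s
thick = dilate(ink)     # stroke is now three pixels wide
```

Note that dilation is the step that actually thickens strokes; opening is 
mostly useful for removing isolated noise before you thicken anything.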

On Wednesday, 12 November 2014 22:00:03 UTC, Bill Garrison wrote:
>
> So if someone sends in labels like the attached ones, I need to grab the 
> model number. So far, results from straight Tesseract usage are dismal. I 
> used an ImageMagick library to clean up the image a bit before sending it 
> in, but if it's rotated at ALL the results are still dismal. Overall, I am 
> just looking to increase accuracy. 
>
> Steps I have taken:
>
> 1) Used a pre-processing library to clean up the image
> 2) Added a new config that turns off the dictionary and loads a words file 
> that has all the different Samsung model numbers in it
> 3) Tried to take my most promising pre-processed image and create a box 
> file, and then used "tesseract <image_name> <box_file_name> nobatch 
> box.train" to train Tesseract to not miss the two characters it missed 
> ....this caused a segmentation fault. 
>
> Any hints or advice about how I can use tesseract to grab this information 
> with at least 50% accuracy would be GREATLY appreciated. 
>
> Thanks!!
>
