Re: [tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Allistair
Have you tried Google Cloud Vision at all? Its OCR seems superior to Tesseract in the tests I have done to date. I just spun up a project for you and it did pretty well (one error on a single-digit zero, which matched \n2 instead of 0), but maybe with some of the preprocessing you are doing it will work be

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Tom Morris
Tensorflow? No, no, no. You've switched from a chainsaw to a sledgehammer, your flowers are going to be exceedingly unhappy! I just had a Googler present Tensorflow to the GDG I run a couple of weeks ago and if we were working with handwritten digits, we could get you up to 99+% accuracy in a m

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Ben Weinstein
Hey Tom, These are video screenshots, so no luck on the EXIF info, that was the guiding idea. Your feedback was exactly what I was looking for. Sounds like tesseract isn't the right tool, on to tensorflow and building my own deep learning structure. Thanks! Ben On Saturday, December 31, 201

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Tom Morris
p.s. Taking an even bigger step back - are you sure there isn't digital metadata available either in a separate file or embedded in the EXIF data for the image? The whole image processing schtick, while fun, may be unnecessary. Tom On Saturday, December 31, 2016 at 12:29:02 PM UTC-5, Tom Morri
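
For anyone wanting to check the EXIF question quickly before investing in image processing, here is a minimal sketch using only the Python standard library. It walks the header segments of a JPEG and reports whether an APP1 segment carrying the "Exif" identifier is present. The function name `has_exif_segment` is made up for illustration, and the parser is deliberately simplistic (it assumes well-formed segments with no fill bytes between markers), so treat it as a rough probe rather than a full EXIF reader. Frames grabbed from video will typically come back False.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an APP1 segment that starts with 'Exif\\0\\0'.

    Hypothetical helper for illustration; assumes a well-formed JPEG header.
    """
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # not positioned on a marker, give up
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more header segments
            return False
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        if length < 2:
            return False  # malformed length field
        payload = data[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


# Two tiny synthetic headers: one JFIF-only, one with an (empty) Exif APP1 segment.
jfif_only = (
    b"\xff\xd8" + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9) + b"\xff\xd9"
)
with_exif = b"\xff\xd8" + b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00" + b"\xff\xd9"

print(has_exif_segment(jfif_only))
print(has_exif_segment(with_exif))
```

On a real capture you would pass in the file contents, for example `has_exif_segment(open("frame.jpg", "rb").read())`, where `frame.jpg` is a placeholder name. If it returns True, a proper EXIF library can then pull out the actual fields.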

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Tom Morris
I'm not sure Tesseract is the tool for the job here. It strikes me as the CV equivalent of taking a chainsaw to prune your flower garden. Are the captures all from the same model camera? A few different models? The one example image that you showed uses a low resolution, probably fixed pitch, f