[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2017-01-05 Thread Helmut Wollmersdorfer
For computer-generated, fixed-width characters at their original resolution, an exact comparison may also work. I did this 10 years ago for automatically testing the Debian installer in a virtual machine: - run the VM in X11-window - take a screenshot of the window - cut out the characte
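
A minimal sketch of the exact-comparison idea described above, assuming the text is already binarized and sits on a fixed-width grid. The 3x5 glyph bitmaps, the `CELL_W` constant, and the `render` and `read_line` helpers are hypothetical and for illustration only; they are not taken from the original Debian installer test setup. In practice the reference glyphs would be cut from known screenshots.

```python
# Hypothetical 3x5 reference glyphs ('#' = ink, ' ' = background).
# Real reference bitmaps would be cut out of known-good screenshots.
GLYPHS = {
    "0": ("###", "# #", "# #", "# #", "###"),
    "1": (" # ", "## ", " # ", " # ", "###"),
    "2": ("###", "  #", "###", "#  ", "###"),
    "7": ("###", "  #", "  #", "  #", "  #"),
}
CELL_W = 3  # assumed fixed character width in pixels
CELL_H = 5  # assumed fixed character height in pixels

# Reverse lookup: exact bitmap -> character.
LOOKUP = {bitmap: ch for ch, bitmap in GLYPHS.items()}


def render(text):
    """Build a fake 'screenshot' line by placing glyphs side by side."""
    return ["".join(GLYPHS[ch][r] for ch in text) for r in range(CELL_H)]


def read_line(rows, cell_w=CELL_W):
    """Cut the line into fixed-width cells and match each cell exactly."""
    out = []
    for x in range(0, len(rows[0]), cell_w):
        cell = tuple(row[x:x + cell_w] for row in rows)
        # Any pixel difference means no match; report it as '?'.
        out.append(LOOKUP.get(cell, "?"))
    return "".join(out)


if __name__ == "__main__":
    print(read_line(render("2017")))
```

Because the match is exact, a single differing pixel yields `?`, so this only suits unscaled, uncompressed renderings of a known font.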

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2017-01-02 Thread Tom Morris
p.s. If you post some example images, I'm happy to knock together a quick example for you. It looks like the native file format is AVI and AVI files have the ability to incorporate streams of not only video and audio, but also closed captioning info and other metadata. Is it safe to assume that

Re: [tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Allistair
Have you tried Google Cloud Vision at all - its OCR seems superior to Tesseract from tests I have done to date. I just spun up a project for you and it did pretty well (error on single digit zero which matched \n2 instead of 0) but maybe with some of the preprocessing you are doing it will work be

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Tom Morris
TensorFlow? No, no, no. You've switched from a chainsaw to a sledgehammer, your flowers are going to be exceedingly unhappy! A couple of weeks ago I had a Googler present TensorFlow to the GDG I run, and if we were working with handwritten digits, we could get you up to 99+% accuracy in a m

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Ben Weinstein
Hey Tom, These are video screenshots, so no luck on the EXIF info, that was the guiding idea. Your feedback was exactly what I was looking for. Sounds like tesseract isn't the right tool, on to tensorflow and building my own deep learning structure. Thanks! Ben On Saturday, December 31, 201

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Tom Morris
p.s. Taking an even bigger step back - are you sure there isn't digital metadata available either in a separate file or embedded in the EXIF data for the image? The whole image processing schtick, while fun, may be unnecessary. Tom On Saturday, December 31, 2016 at 12:29:02 PM UTC-5, Tom Morri

[tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Tom Morris
I'm not sure Tesseract is the tool for the job here. It strikes me as the CV equivalent of taking a chainsaw to prune your flower garden. Are the captures all from the same model camera? A few different models? The one example image that you showed uses a low resolution, probably fixed pitch, f