I just do some preprocessing: I auto-rotate and clean up the images before passing them on to Tesseract.
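To make the auto-rotate step concrete, here is a minimal sketch of one common way to estimate skew, the projection-profile method. It is not necessarily the approach I use in production. It assumes the page has already been binarized into a list of (x, y) ink-pixel coordinates; the names `row_profile_score` and `estimate_skew` are hypothetical helpers written for this illustration, and actually rotating the image and calling Tesseract is left to whatever image library you already use.

```python
import math


def row_profile_score(pixels, angle_deg):
    """Score how sharply ink pixels fall into rows after rotating by angle_deg.

    pixels is an iterable of (x, y) ink coordinates (an assumed input format).
    A higher score means the ink is concentrated in fewer rows, which is what
    well-aligned horizontal text lines look like.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    counts = {}
    for x, y in pixels:
        # y-coordinate of the pixel after rotating about the origin
        row = round(x * sin_t + y * cos_t)
        counts[row] = counts.get(row, 0) + 1
    # Sum of squared row counts: largest when ink piles up in few rows
    return sum(c * c for c in counts.values())


def estimate_skew(pixels, max_angle=10.0, step=0.5):
    """Return the correction angle (degrees) that best aligns ink into rows."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: row_profile_score(pixels, a))


# Synthetic "page": three text lines, each tilted by 3 degrees
tilt = math.tan(math.radians(3.0))
page = [(x, round(y0 + x * tilt)) for y0 in (0, 20, 40) for x in range(200)]
print("rotate by", estimate_skew(page), "degrees before OCR")
```

For the synthetic page above, the estimate should come out at a correction of about -3 degrees, i.e. the opposite of the tilt. The search range and step size are arbitrary choices for the sketch; a real pipeline would tune them and then rotate the image by the returned angle before handing it to Tesseract.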
On Friday, September 22, 2017 at 5:11:39 AM UTC-4, Breno Faria wrote:
> Hi everyone,
>
> let me begin with a big fat compliment on the Tesseract project on the
> quality of the OCR since the new LSTM models have been adopted! You are
> consistently better than ABBYY now.
>
> I have just made a few experiments with slightly rotated documents. There
> ABBYY is still better than Tesseract 4.
>
> I was wondering if it wouldn't be very easy to just generate rotated
> training data by just duplicating and rotating the existing training
> documents. Wouldn't the model then automatically learn to handle this?
>
> Has anyone tried this yet?
>
> Cheers!
>
> Breno

