> As an added step, you might consider rendering to grayscale, slightly blurring (optional), adding a bit of noise, and then re-converting to b&w to simulate what physical scanners do. Maybe do this at 1200 dpi and also downsample to 300 dpi.
I wouldn't have thought adding random noise would be helpful. It would just distort the shapes Tesseract uses for matching, and since the noise in a real scan will always differ from the noise I generated, I assumed it would only hinder recognition further. Am I wrong about this? Has anybody tested whether adding random noise to otherwise clean training images improves things?
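In case anyone wants to try the experiment, here is a rough sketch of the degradation steps from the quoted suggestion: blur, add noise, downsample, then threshold back to b&w. It is only an illustration under my own assumptions, not anything Tesseract ships. The function names, the 3x3 box blur, the Gaussian noise model, the sigma of 20, the threshold of 128, and the choice to downsample before thresholding are all guesses on my part. It works on plain lists of 0-255 pixel values to stay dependency-free; a real pipeline would more likely use an imaging library. The factor of 4 corresponds to going from 1200 dpi to 300 dpi.

```python
import random


def box_blur(img):
    """3x3 box blur with edge clamping; img is a list of rows of 0-255 values."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def add_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel, clipped to 0-255."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]


def downsample(img, factor):
    """Average each factor x factor block (e.g. factor=4 for 1200 -> 300 dpi)."""
    h, w = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [img[y * factor + j][x * factor + i]
                     for j in range(factor) for i in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def binarize(img, threshold=128):
    """Convert grayscale back to pure black (0) and white (255)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def simulate_scan(img, sigma=20.0, factor=4, blur=True, seed=0):
    """Hypothetical helper: blur, add noise, downsample, then re-binarize."""
    rng = random.Random(seed)  # seeded so a given run can be reproduced
    work = box_blur(img) if blur else [[float(p) for p in row] for row in img]
    work = add_noise(work, sigma, rng)
    work = downsample(work, factor)
    return binarize(work)


if __name__ == "__main__":
    # Toy "glyph": a black 8x8 square on a white 16x16 page.
    page = [[0 if 4 <= x < 12 and 4 <= y < 12 else 255 for x in range(16)]
            for y in range(16)]
    for row in simulate_scan(page):
        print("".join("#" if p == 0 else "." for p in row))
```

Training once on the clean renders and once on the output of something like this, then comparing accuracy on real scans, would answer the question directly. Varying the seed per image would avoid baking one fixed noise pattern into the training data.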

