On 06-Feb-16 16:05, Paul Koning wrote:
>> On Feb 6, 2016, at 2:28 PM, Tom Morris <tfmor...@gmail.com> wrote:
>>
>> ...
>> I think Tesseract is pretty close to the quality of ABBYY. Google has
>> trained it on a very large corpus and it's used for Google Books, Google
>> Drive OCR, etc, so it gets a fair amount of attention. Of course, a lot of
>> the training effort has gone into training it for over 100 languages, which
>> isn't really relevant to old computer documentation, but even for plain
>> English, it's received lots of training attention.
> Is Tesseract open source?

Yes, it's open sourced. https://github.com/tesseract-ocr

> It sounds vaguely like the one I tried, but I'm not sure; I remember
> something that felt more like a toolkit than like an application.

Yes, it's the engine. There are various wrappers that provide more
polished interfaces.

> Google's OCR is pretty lousy in many cases. Perhaps that's because they just
> feed it stuff without ever looking at the result. There are plenty of Google
> books that have errors in the majority of the words.

The amazing thing about a talking dog is not how well it talks, but that
it talks at all. For the volume of stuff they've scanned, it's pretty
impressive. If a book is that bad, no one looked at it & retrained.

What Tom sent around earlier is fairly typical (in my limited
experience). It would take someone a good hour or two to clean it up.

> paul