I'll write up how to use the tools sometime soon, but in the meantime, here's a bit of information to start you off. Let me know if you need any more:
The tool you need is called 'lazytrain', and is available from
https://www.dur.ac.uk/nick.white/tools/ (linked to from the page Sven
recommended). It's C, 'cos C is cool ;)

To compile it, make sure the imagemagick library is installed, download
a copy of libutf from http://lubutu.com/code/utf8-library into a
directory called 'libutf', and use this command:

    cc `pkg-config --cflags --libs MagickWand` libutf.c lazytrain.c -o lazytrain

Then you can run it like this:

    ./lazytrain textfile.txt DejaVu-Serif-Book 1.png 1.box

Let me know how it goes,

Nick

P.S. Don't run regular programs with 'sudo'; if they happen to
misbehave, it can cause many more Bad Things to happen (and remember,
all programs have bugs [except mine].)

On Thu, Sep 13, 2012 at 09:34:09AM -0500, Sven Pedersen wrote:
> Try this more updated training system from Nick:
> https://www.dur.ac.uk/nick.white/grctraining/
> --Sven
>
> On Wed, Sep 12, 2012 at 11:36 PM, Donaldo <[email protected]> wrote:
> >
> > I have installed Tesseract Trainer from
> > https://github.com/BaltoRouberol/TesseractTrainer
> > I ran it on a text file that I have generated:
> > sudo python __main__.py -l epo -t ../epo.calib.txt -F
> > ../font/DejaVuSerif.ttf -n dejavuserif -v
> >
> > It gave lots of errors like this:
> >
> > FAIL!
> >
> > APPLY_BOXES: boxfile line 832/@ ((265,87),(290,58)): FAILURE! Couldn't find
> > a matching blob
> >
> > APPLY_BOXES: Unlabelled word at :Bounding box=(151,34)->(258,53)
> >
> > It generated a tif file which looks good, and a box file which starts thus:
> >
> > 20 580 35 551 0
> > T 35 580 52 551 0
> > e 52 580 67 551 0
> > k 67 580 83 551 0
> > s 83 580 96 551 0
> > t 96 580 106 551 0
> > o 106 580 121 551 0
> > e 129 580 144 551 0
> > n 144 580 160 551 0
> >
> > What should I do next? What is upsetting the program?
> >
> > Donald
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en
> >
>
> --
> ``All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king.”

