I'm using a loop around "tesseract $X $X batch.nochop makebox" to produce box files to be corrected and re-used for training, and have two questions.
Is there a way to make it produce the line-by-line format (rather than character-by-character) that newer versions of tesseract support as training data? (I'm using tesseract 4.0.0 in a Docker container.)

I have a TSV file (which I could transform into some other format) with the correct string for the text in each image file, but it does not have the pixel locations. Is there any way to tell tesseract makebox to use those strings and "make them fit" the image?

Thanks,
Adam