Re: [tesseract-ocr] Tesseract Performance

2021-01-01 Thread Shree Devi Kumar
Soumik, I have uploaded the bash scripts and the generated reports and graphs to the `ben` branch in my fork of the tesstrain repo. See https://github.com/Shreeshrii/tesstrain/tree/ben and https://github.com/Shreeshrii/tesstrain/commit/a6474ef2dbbac47803d13b6f92fdcf8c9dc3107b Results for the validation

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Thread Keith M
Alex, Thanks for replying, appreciate the time. Especially the command line with various options specified! I've spent hours and hours googling both before posting here, and afterwards. There's SOME information out there, but no real smoking gun. Most of the ideas in the first 10 pages of g

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Thread Keith M
Ger, Thanks for taking the time to reply. On 1/1/2021 4:00 PM, Ger Hobbelt wrote: Another technique specifically for dot-matrix might be to blend multiple copies of the scan at small offsets. The idea here is that back in the old days of dot matrix, a few DTP applications had printing modes

[tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Thread shree
Please see the old thread at https://groups.google.com/g/tesseract-ocr/c/ApM_TqwV7aE/m/z5jZV0I0AgAJ for a link to a completed project for dot matrix. On Monday, December 14, 2020 at 12:11:00 PM UTC+5:30 Keith M wrote: > Hi there, > > I've been circling a problem with OCR'ing 90-pages of 30 year old BA

Re: [tesseract-ocr] Re: advice for OCR'ing 9-pin dot matrix BASIC code

2021-01-01 Thread Ger Hobbelt
Another technique specifically for dot-matrix might be to blend multiple copies of the scan at small offsets. The idea here is that back in the old days of dot matrix, a few DTP applications had printing modes which would print dot patterns several times on the same line, but ever so slightly offse
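To make the blending idea concrete, here is a minimal sketch in plain Python. It is not taken from the message: the function name `blend_offsets`, the list-of-rows image layout, and the choice of a "darkest pixel wins" blend are all assumptions made for illustration. The message only says to blend copies at small offsets, so other blend modes (for example averaging) may be what was intended.

```python
def blend_offsets(image, offsets):
    """Blend shifted copies of a grayscale image, keeping the darkest pixel.

    image:   list of rows, each a list of ints (0 = black ink, 255 = white paper).
    offsets: iterable of (dy, dx) shifts in pixels; include (0, 0) to keep
             the unshifted original in the blend.

    Assumption: a "darken" (minimum) blend is used so that neighbouring
    dots smear into connected strokes. This is an illustrative choice.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Start from blank white paper and darken it with every shifted copy.
    result = [[255] * width for _ in range(height)]
    for dy, dx in offsets:
        for y in range(height):
            src_y = y - dy
            if not 0 <= src_y < height:
                continue  # the shifted copy has no data for this row
            for x in range(width):
                src_x = x - dx
                if 0 <= src_x < width:
                    result[y][x] = min(result[y][x], image[src_y][src_x])
    return result


# Hypothetical one-row "scan": two dots separated by a one-pixel gap.
scan = [[255, 0, 255, 0, 255]]
# Blend the original with a copy shifted one pixel to the right.
blended = blend_offsets(scan, [(0, 0), (0, 1)])
```

In this toy row the gap between the two dots is filled by the shifted copy, which is the effect the technique aims for on real scans. On an actual page the offsets would need tuning to the dot pitch of the printout, and an image library would be a more practical way to do the shifting and blending.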

Re: [tesseract-ocr] Tesseract Performance

2021-01-01 Thread Shree Devi Kumar
nohup make MODEL_NAME=ben START_MODEL=ben LANG_TYPE=Indic GROUND_TRUTH_DIR=data/ben-ground-truth TESSDATA=$HOME/tessdata_best DEBUG_INTERVAL=-1 training MAX_ITERATIONS=5 >> data/ben.log &
Graphs are created using the training log file as well as validation log files. Some of these require usi