https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00#training-just-a-few-layers
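
For orientation, the top-layer replacement described on that page boils down to one `lstmtraining` call. The sketch below only assembles that command line in Python and prints it; it does not run anything. The helper name `build_replace_top_layer_cmd` and all file paths are hypothetical. The values 5, 256 and 111 are placeholders in the style of the tessdoc example and need to be adapted to the starting model's network spec and to your unicharset size. The starting `.lstm` file is assumed to have been extracted beforehand with `combine_tessdata -e`.

```python
import shlex


def build_replace_top_layer_cmd(continue_from, traineddata, train_listfile,
                                model_output, num_classes,
                                append_index=5, lstm_width=256,
                                max_iterations=3000):
    """Assemble (but do not run) an lstmtraining command that cuts the
    network at append_index and appends a new top layer.

    All defaults are illustrative assumptions, not recommended settings.
    """
    # New layers appended on top: one forward LSTM plus a softmax output
    # sized to the number of classes in the unicharset.
    net_spec = f"[Lfx{lstm_width} O1c{num_classes}]"
    return [
        "lstmtraining",
        "--continue_from", continue_from,      # .lstm extracted from a traineddata
        "--traineddata", traineddata,          # starter traineddata for the new language
        "--append_index", str(append_index),   # keep layers up to this index
        "--net_spec", net_spec,
        "--model_output", model_output,        # checkpoint prefix
        "--train_listfile", train_listfile,
        "--max_iterations", str(max_iterations),
    ]


# Hypothetical paths, for illustration only.
cmd = build_replace_top_layer_cmd(
    continue_from="data/ara.lstm",
    traineddata="data/ckb/ckb.traineddata",
    train_listfile="data/ckb.training_files.txt",
    model_output="data/checkpoints/ckbLayer",
    num_classes=111,
)
print(shlex.join(cmd))
```

Building the command as a list keeps the `--net_spec` value, which contains spaces and brackets, as a single argument; `shlex.join` then quotes it for pasting into a shell.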

On Sat, Feb 1, 2020 at 11:33 AM manu pranay <pranaymanu3...@gmail.com> wrote:

> Thank you so much for your help, Shree.
> The links you provided were very helpful.
>
> Now I am trying LSTM training by retraining the top layer.
> Can you please provide the commands for retraining the top layer?
>
> Thank you very much.
>
> On Tue, Jan 28, 2020 at 12:36 PM Shree Devi Kumar <shreesh...@gmail.com>
> wrote:
>
>> Please see https://github.com/Shreeshrii/tesstrain-ckb
>> It uses a modified training text based on what you sent and earlier
>> text that I had from Pewan and other corpora.
>>
>> Currently the training data includes
>> * AWN 0-9
>> * AEN - Arabic numbers
>> * No Persian numbers, since some shapes are similar to Arabic numbers
>>
>> The fonts do not include those which convert 0-9 to either Arabic or
>> Persian numbers.
>>
>> The replace-layer training is still ongoing. The eval results look much
>> better than the official ara or script/Arabic; however, I do not have
>> any real-world images for testing.
>>
>> Model                        Metric           Arial  Arial Bold  Tahoma  Tahoma Bold
>> tessdata_fast/ara            Accuracy         62.74  63.49       61.56   61.71
>> tessdata_fast/ara            Basic Arabic     95.68  95.22       95.76   94.10
>> tessdata_fast/ara            Arabic Extended  0.31   1.13        0.41    1.32
>> tessdata_fast/script/Arabic  Accuracy         80.99  80.83       83.02   77.17
>> tessdata_fast/script/Arabic  Basic Arabic     96.68  96.34       96.05   93.87
>> tessdata_fast/script/Arabic  Arabic Extended  57.20  58.23       63.76   54.72
>> ckbLayer_1.661_152089_296500
>> ckbLayer_fast                Accuracy         98.20  97.78       98.06   96.13
>> ckbLayer_fast                Basic Arabic     99.10  99.15       98.54   98.44
>> ckbLayer_fast                Arabic Extended  98.30  98.70       99.10   96.27
>>
>> On Mon, Jan 13, 2020 at 7:17 PM Ayub Rauf wrote:
>>
>>> Hi,
>>> I have attached the full training text with forbidden_characters in it.
>>> In practice both number types are in use, and I see both written in
>>> books, but the Kurdish institute has confirmed that Arabic numbers will
>>> be used from now on. Persian numbers are written by Iranian Kurds and
>>> Arabic numbers by Iraqi Kurds. As I said, numbers in ckb should be
>>> written in the Arabic form, but the OCR has to recognize both types,
>>> just like the two forms "ك" and "ک" that appear in books although we
>>> now use only "ک".
>>> I think these similarities won't be a problem, since we can correct the
>>> letters afterwards with a spell checker.
>>> As I said before, Arial and Tahoma are the fonts most commonly used in
>>> books.

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVsTZAQfpjDJhWtyQsYDuchcaQ9tyk0TfTqFejQEc2vXA%40mail.gmail.com.