I couldn't find anything there about improving word detection. Am I missing something?

On Wednesday, July 1, 2020 at 12:45:36 PM UTC+3, zdenop wrote:
> Try this:
> https://github.com/Sintun/PersonalHelperPrograms/blob/master/Tesseract/tess.cpp
>
> Longer story:
> https://github.com/tesseract-ocr/tesseract/issues/1714
>
> Zdenko
>
> On Wed, Jul 1, 2020 at 10:29, amit...@gmail.com <ami...@gmail.com> wrote:
>
>> I want to optimise tesseract 4 (lstm) for a set of documents I have.
>> I managed to improve its character recognition using the documentation in
>> https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.
>>
>> However, some words are just not detected, usually words inside tables.
>> Even using --psm 6, some are missed.
>>
>> Is there a way to train the layout/segmentation/word detection engine and
>> not just the character recognition?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/e4b9d33c-efc9-4a36-8c85-d2a14e4c1692o%40googlegroups.com.