If the original model lacks the ∠ symbol, fine-tuning is not going to add it for you. We have all gone through that process. To introduce a new character, the most effective approach is to remove the top layer and train from there.
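Below is a minimal Python sketch of the "remove the top layer and train from there" approach. It assumes Tesseract 4/5's `lstmtraining` and a base LSTM extracted with `combine_tessdata -e eng.traineddata eng.lstm`. It also assumes a starter traineddata built (e.g. with `combine_lang_model`) from a unicharset that already contains ∠. The sketch only assembles the command line. All file paths are hypothetical. The cut point `--append_index 5` is borrowed from the example in the Tesseract training docs, and the replacement layers `[Lfx192 O1c<N>]` are an illustrative choice. Check the index against your own model's network spec before training.

```python
"""Sketch: build an lstmtraining command that cuts an existing LSTM model at
a given layer and trains new top layers, so a new character (e.g. the
angularity symbol U+2220) gets its own output class.

Hypothetical placeholders: all paths, append_index=5 and the "Lfx192" layer.
"""
import shlex
import tempfile
from pathlib import Path


def unicharset_size(unicharset_path: Path) -> int:
    # The first line of a Tesseract unicharset file holds the number of entries.
    with open(unicharset_path, encoding="utf-8") as f:
        return int(f.readline().strip())


def top_layer_command(
    continue_from: Path,    # .lstm extracted from the base model
    traineddata: Path,      # starter traineddata whose unicharset includes the new symbol
    unicharset: Path,       # the same unicharset (must contain ∠)
    train_list: Path,       # text file listing .lstmf files, one per line
    model_output: Path,     # prefix for checkpoint files
    append_index: int = 5,  # layer to cut at; value from the docs' example, verify for your model
    max_iterations: int = 3000,
) -> list[str]:
    n_classes = unicharset_size(unicharset)
    # Everything above append_index is replaced by a fresh LSTM layer and an
    # output layer sized to the new character set.
    net_spec = f"[Lfx192 O1c{n_classes}]"
    return [
        "lstmtraining",
        "--continue_from", str(continue_from),
        "--traineddata", str(traineddata),
        "--append_index", str(append_index),
        "--net_spec", net_spec,
        "--model_output", str(model_output),
        "--train_listfile", str(train_list),
        "--max_iterations", str(max_iterations),
    ]


if __name__ == "__main__":
    # Demo with a throwaway unicharset that declares 112 entries.
    with tempfile.TemporaryDirectory() as tmp:
        uni = Path(tmp) / "my.unicharset"
        uni.write_text("112\n", encoding="utf-8")
        cmd = top_layer_command(
            continue_from=Path("eng.lstm"),
            traineddata=Path("my/my.traineddata"),
            unicharset=uni,
            train_list=Path("my/list.train"),
            model_output=Path("output/my"),
        )
        print(shlex.join(cmd))
```

Running it as a script prints the assembled command; passing the list to `subprocess.run(cmd, check=True)` would start the actual training run.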
On Thursday, November 23, 2023 at 12:15:56 PM UTC+3 smon...@gmail.com wrote:

> If I need to train new characters that are not recognized by a default
> model, is fine-tuning the right approach in this case?
> One of these characters is the one for angularity: ∠
>
> These symbols appear in technical drawings and should be recognized in
> them. E.g. in the scenario in the following picture, Tesseract should
> recognize this symbol.
>
> [image: angularity.png]
>
> Also, here is one of the PNGs I tried to train with:
> [image: angularity_0_r0.jpg]
> They all look pretty similar to this one. The things that change are the
> angle, the proportions and the thickness of the lines. All examples have
> this 64x64 pixel box around them.
>
> Is fine-tuning the right approach for this scenario? I only find
> information on fine-tuning for specific fonts. Also, for fine-tuning the
> "tesstrain" repository would not be needed, as it is used for training
> from scratch, correct?
>
> desal...@gmail.com wrote on Wednesday, November 22, 2023 at 15:27:02
> UTC+1:
>
>> From my limited experience, you need a lot more data than that to train
>> from scratch. If you can't produce more data than that, you might first
>> try to fine-tune, and then train by removing the top layer of the best model.
>>
>> On Wednesday, November 22, 2023 at 4:46:53 PM UTC+3 smon...@gmail.com
>> wrote:
>>
>>> As it is not properly possible to combine my traineddata from scratch
>>> with an existing one, I have decided to also train my traineddata model
>>> on numbers. Therefore I wrote a script which synthetically generates
>>> ground truth data with text2image.
>>> This script uses dozens of different fonts and creates numbers in the
>>> following formats:
>>> X.XXX
>>> X.XX
>>> X,XX
>>> X,XXX
>>> I generated 10,000 files to train the numbers. But unfortunately, numbers
>>> get recognized pretty poorly with the best model (most of the time only
>>> "0.", "0" or "0," gets recognized).
>>> So I wanted to ask whether this is simply not enough training (ground
>>> truth) data for proper recognition when I train several fonts.
>>> Thanks in advance for your help.
>>

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fb4a1b27-db44-49a6-adfa-ada9e13030aan%40googlegroups.com.