Tesseract supports uzn files [1] with psm 4. Search the forum for more details.

[1] https://github.com/OpenGreekAndLatin/greek-dev/wiki/uzn-format
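For the question quoted below (reusing Gallica's ALTO segmentation), a minimal sketch of the ALTO-to-UZN conversion is shown here. It assumes Python 3 and that the ALTO coordinates refer to the same image you give to Tesseract. It reads every ALTO TextBlock (HPOS, VPOS, WIDTH, HEIGHT) and writes one UZN line per block in the "left top width height label" order described on the wiki page above. The function name alto_to_uzn, the dpi parameter and the default "Text" label are hypothetical choices for this example, not part of Tesseract or of the ALTO standard.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def _local(tag: str) -> str:
    """Strip the XML namespace so ALTO v2, v3 and v4 files are all handled."""
    return tag.rsplit("}", 1)[-1]


def alto_to_uzn(alto_xml: Union[str, bytes], dpi: Optional[float] = None,
                label: str = "Text") -> str:
    """Convert ALTO TextBlock boxes to UZN lines 'left top width height label'.

    Hypothetical helper: assumes the ALTO file describes the same image
    (same resolution) that will be passed to tesseract.
    """
    root = ET.fromstring(alto_xml)

    # ALTO declares its coordinate unit in Description/MeasurementUnit
    # (pixel, mm10 or inch1200); UZN coordinates must be image pixels.
    unit = "pixel"
    for el in root.iter():
        if _local(el.tag) == "MeasurementUnit" and el.text:
            unit = el.text.strip()
            break
    if unit == "pixel":
        factor = 1.0
    elif dpi is None:
        raise ValueError(f"ALTO unit is {unit!r}; pass the image dpi to convert it")
    elif unit == "mm10":
        factor = dpi / 254.0  # 1/10 mm -> inches (x 1/254) -> pixels (x dpi)
    elif unit == "inch1200":
        factor = dpi / 1200.0  # 1/1200 inch -> pixels
    else:
        raise ValueError(f"unknown ALTO MeasurementUnit {unit!r}")

    lines = []
    for el in root.iter():  # document order, so zones keep the ALTO order
        if _local(el.tag) != "TextBlock":
            continue
        x, y, w, h = (float(el.get(a, 0)) * factor
                      for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        # Tesseract reads the four numbers as integers, so round to whole pixels.
        lines.append(f"{round(x)} {round(y)} {round(w)} {round(h)} {label}")
    return "\n".join(lines) + "\n"
```

Usage, assuming the page image is page.png and the Gallica ALTO is page.xml: write the result to page.uzn next to the image (same base name), then run something like `tesseract page.png page --psm 4`. As far as I know, Tesseract only looks for the .uzn file when automatic column finding is off (psm 4 and above), and each zone is then recognized as a separate block. If you download a downscaled image (e.g. a smaller IIIF size), the ALTO pixel boxes will not line up and you would need to scale them first.

```python
with open("page.xml", "rb") as f:
    uzn = alto_to_uzn(f.read())
with open("page.uzn", "w", encoding="utf-8") as f:
    f.write(uzn)
```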
Zdenko

On Fri, 23 Sep 2022 at 17:20, Vincent Sarbach-Pulicani <[email protected]> wrote:
> Hello,
>
> I'm working on historical newspapers from the interwar period written in 3
> different languages: Corsican, French and Italian.
> After many tries, Tesseract seems to be the best OCR engine for me, but the
> layout analysis of a newspaper is complex.
> However, using the API of Gallica (the French national library), I can access
> an existing OCR (poor quality) and usable ALTO files.
> My question is: can I use those ALTO files to make Tesseract follow the
> same segmentation as the existing OCR?
> I don't know if my question makes sense.
> Thanks a lot,
> Vincent Sarbach-Pulicani
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/334be2c9-a194-46ee-adcb-ab48b712e3b8n%40googlegroups.com.

