On May 24, 1:46 pm, Lars Aronsson <[email protected]> wrote:
> Peter Alberti wrote:
> > I've trained tesseract r319 (3.0) to support Danish texts written in
> > fraktur. It is not perfect but good enough that I hope it may be
> > useful to others.
>
> This is great! The file dan-frak.traineddata is a binary file.
> Tesseract is an open source software. Is there some
> documentation for this file format, so I can read and
> understand what's in there? I want to keep the part
> that is about fraktur/blackletter and substitute the
> part that is about Danish pre 1870 spelling for
> something based on my Swedish dictionaries.

If Jimmy's method for extracting the components doesn't work for you, let me know and I'll try to put my input files together and post them.
I should add, though, that making a Swedish version will probably be more cumbersome than it might seem at first, because I haven't included enough fraktur letters for your purposes (there's no ä, ö or å).

Best regards,
Peter

