On May 24, 1:46 pm, Lars Aronsson <[email protected]> wrote:
> Peter Alberti wrote:
> > I've trained tesseract r319 (3.0) to support Danish texts written in
> > fraktur. It is not perfect but good enough that I hope it may be
> > useful to others.
>
> This is great! The file dan-frak.traineddata is a binary file.
> Tesseract is an open source software. Is there some
> documentation for this file format, so I can read and
> understand what's in there? I want to keep the part
> that is about fraktur/blackletter and substitute the
> part that is about Danish pre 1870 spelling for
> something based on my Swedish dictionaries.
>
If Jimmy's method for extracting the components doesn't work for you,
let me know and I'll try to put my input files together and post them.

I should add that making a Swedish version will probably be more
cumbersome than it seems at first, because I haven't trained enough
fraktur letters for your purposes (there's no ä, ö or å).
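
To see which letters are missing, a rough sketch like the following
could be run against the unicharset component once it has been
extracted as a text file. It assumes the usual layout (first line is the
entry count, then one entry per line with the character as the first
space-separated field). The function name missing_chars and the sample
data are made up for illustration and are not the real contents of
dan-frak.traineddata.

```python
def missing_chars(unicharset_text, required):
    """Return the required characters that have no entry in the unicharset."""
    lines = unicharset_text.splitlines()
    # Assumed layout: line 1 is the entry count; every following line
    # starts with the character itself, followed by its properties.
    present = {line.split(" ", 1)[0] for line in lines[1:] if line.strip()}
    return [ch for ch in required if ch not in present]


# Made-up sample for illustration only, not the real dan-frak data.
sample = "4\nNULL 0 NULL 0\na 3 0\næ 3 0\nø 3 0\n"

# Letters a Swedish fraktur model would need, plus two Danish ones.
print(missing_chars(sample, "äöåæø"))
```

Any letter this reports would need its own fraktur training samples
before a Swedish dictionary could be of much use.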

Best regards,
Peter

