On Mon, Apr 16, 2012 at 06:38:01PM +0200, zdenko podobny wrote:
> I think in 3.02 will provide solution this cases: you can use more than one
> language for OCR. e.g. you can run something like this:
>
> tesseract image output -l grc+ell

Ah, that's a very good idea, and will indeed be useful. However, for my use case (a script which is mostly the same, but with additions, and an older version of the language), it would be useful to use only one set of dictionary files, rather than what is presumably the union of the grc and ell dictionaries in the above example.

I wonder if there's any good way of integrating this functionality into tesseract. I imagine that changing the dictionary files wouldn't be a particularly unusual thing to want to do, as the mapping between dictionaries and scripts is not going to be 1:1.

As a workaround one could probably unpack the traineddata, remove the dictionary files (and add different ones if appropriate), then repack it. But ideally I think it would be good to be able to specify different dictionary files on the command line, preferably as UTF-8 files with one word per line, which would be converted into a DAWG in memory if needed.
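The unpack/remove/repack workaround could look roughly like the sketch below. It assumes the `combine_tessdata` and `wordlist2dawg` tools from the Tesseract training tools are on the PATH, and that the word dictionaries are the `word-dawg` and `freq-dawg` components. The function names, paths and the `grc` example are illustrative only, and the argument order of `wordlist2dawg` is worth checking against your version's usage message before relying on this.

```shell
#!/bin/sh
# Sketch only: swap the word dictionary inside a traineddata file.
# Assumption: combine_tessdata and wordlist2dawg are installed and on PATH.

# Normalise a UTF-8 word list (one word per line): strip carriage
# returns, drop blank lines, and remove duplicate entries.
clean_wordlist() {
    tr -d '\r' < "$1" | sed '/^[[:space:]]*$/d' | LC_ALL=C sort -u
}

# Usage: swap_dictionary LANG TRAINEDDATA WORDLIST WORKDIR
# Pass an empty string as WORDLIST to drop the dictionaries entirely.
swap_dictionary() {
    lang=$1
    traineddata=$2
    wordlist=$3
    workdir=$4

    mkdir -p "$workdir" || return 1

    # Unpack every component to $workdir/$lang.<component>
    combine_tessdata -u "$traineddata" "$workdir/$lang." || return 1

    # Remove the existing word dictionaries. The punctuation and number
    # DAWGs are left alone here; whether to keep them is a judgment call.
    rm -f "$workdir/$lang.word-dawg" "$workdir/$lang.freq-dawg"

    # Optionally build a replacement dictionary from a plain word list,
    # using the unicharset that was just unpacked.
    if [ -n "$wordlist" ]; then
        clean_wordlist "$wordlist" > "$workdir/wordlist.clean" || return 1
        wordlist2dawg "$workdir/wordlist.clean" \
            "$workdir/$lang.word-dawg" "$workdir/$lang.unicharset" || return 1
    fi

    # Repack the remaining components into $workdir/$lang.traineddata
    combine_tessdata "$workdir/$lang."
}
```

For example, `swap_dictionary grc tessdata/grc.traineddata my-words.txt /tmp/grc-custom` should leave a repacked `/tmp/grc-custom/grc.traineddata`, which could then be copied into the tessdata directory, probably under a different language code so the original stays available.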