On 25 May 2010 04:04, Arno Teigseth <[email protected]> wrote:
> On Tue, 2010-05-25 at 03:52 +0100, Jimmy O'Regan wrote:
>> Y'know, I don't recall seeing in this exchange anything along the
>> lines of 'I asked for commit access but was refused/got no
>> answer/etc.'
>
> Hehe well no I didn't ask. I don't know exactly how additions and
> commits are related - do you suggest code additions to the mailing list,
> and if you're lucky they're implemented?
>
Well, at the moment, if you submitted a patch with an issue, most of
those have been committed now.

>> You could always use the tools that come with hunspell to export the
>> dictionary as a full form list of words, and run that through
>> wordlist2dawg, which would give you the benefit of hunspell's larger
>> wordlist without the slowdown.
>
> Excellent. But am I correct that a large wordlist slows down the
> tesseract? My little Kichwa dictionary now contains about 15000 words.

Not that I've noticed. I have heard mention that there was a problem
with wordlists containing more than a million word forms, so I have
always stuck below that limit, but it happily flies through dawgs with
800,000 words.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
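
For anyone wanting to script the hunspell-to-dawg route described above, here is a rough Python sketch rather than a tested recipe. It assumes `unmunch` (from the hunspell tools) and `wordlist2dawg` are on your PATH, that the dictionary is UTF-8, and that the one-million figure is only the rumoured ceiling mentioned in this thread. All file names and function names are made up for illustration, and the arguments `wordlist2dawg` expects differ between Tesseract versions (some also want a unicharset file), so check your own build before running it.

```python
import shutil
import subprocess

# Rumoured ceiling from this thread, not a documented Tesseract constant.
WORD_LIMIT = 1_000_000


def clean_wordlist(lines):
    """Strip whitespace, drop blank lines and duplicates, keep first-seen order."""
    seen = set()
    words = []
    for line in lines:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def within_limit(words, limit=WORD_LIMIT):
    """True if the list has no more than `limit` word forms."""
    return len(words) <= limit


def unmunch_command(dic_path, aff_path):
    """Command that expands a hunspell .dic/.aff pair into full word forms."""
    return ["unmunch", dic_path, aff_path]


def wordlist2dawg_command(wordlist_path, dawg_path, unicharset_path=None):
    """Command for wordlist2dawg; the unicharset argument is version-dependent."""
    cmd = ["wordlist2dawg", wordlist_path, dawg_path]
    if unicharset_path is not None:
        cmd.append(unicharset_path)
    return cmd


def build_dawg(dic_path, aff_path, wordlist_path, dawg_path,
               unicharset_path=None, encoding="utf-8"):
    """Expand the dictionary, write a cleaned word list, then build the dawg."""
    for tool in ("unmunch", "wordlist2dawg"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    expanded = subprocess.run(unmunch_command(dic_path, aff_path),
                              capture_output=True, check=True)
    text = expanded.stdout.decode(encoding, errors="replace")
    words = clean_wordlist(text.splitlines())
    if not within_limit(words):
        raise ValueError(f"{len(words)} word forms exceeds {WORD_LIMIT}")
    with open(wordlist_path, "w", encoding=encoding) as handle:
        handle.write("\n".join(words) + "\n")
    subprocess.run(
        wordlist2dawg_command(wordlist_path, dawg_path, unicharset_path),
        check=True,
    )
    return len(words)
```

A call might look like `build_dawg("qu.dic", "qu.aff", "qu_words.txt", "qu.word-dawg")`, with those file names being placeholders. The size check runs before anything is written, so an oversized list fails early instead of producing a dawg that may or may not load.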

