On 25 May 2010 04:04, Arno Teigseth <[email protected]> wrote:
> On Tue, 2010-05-25 at 03:52 +0100, Jimmy O'Regan wrote:
>> Y'know, I don't recall seeing in this exchange anything along the
>> lines of 'I asked for commit access but was refused/got no
>> answer/etc.'
>
> Hehe well no I didn't ask. I don't know exactly how additions and
> commits are related - do you suggest code additions to the mailing list,
> and if you're lucky they're implemented?
>

Well, at the moment, you submit a patch attached to an issue; most of
the patches submitted that way have been committed by now.

>> You could always use the tools that come with hunspell to export the
>> dictionary as a full form list of words, and run that through
>> wordlist2dawg, which would give you the benefit of hunspell's larger
>> wordlist without the slowdown.
>
> Excellent. But am I correct that a large wordlist slows down the
> tesseract? My little Kichwa dictionary now contains about 15000 words.

Not that I've noticed. I have heard mention of a problem with
wordlists containing more than a million word forms, so I have always
stayed below that limit, but it happily flies through dawgs with
800,000 words.
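
If you want to sanity-check an exported full-form list before handing
it to wordlist2dawg, something like the sketch below might do. It is
only an illustration: the function name prepare_wordlist and the
default limit are my own assumptions, and the one-million figure is
just the hearsay mentioned above, not a documented Tesseract limit.

```python
def prepare_wordlist(lines, limit=1000000):
    """Return the sorted, de-duplicated word forms from `lines`.

    `lines` is any iterable of strings, one word form per line, e.g.
    an open file holding the exported full-form list.
    `limit` is an assumed ceiling taken from hearsay, not from any
    Tesseract documentation; a ValueError is raised above it.
    """
    # Strip whitespace, drop empty lines, and remove duplicates.
    forms = sorted({line.strip() for line in lines if line.strip()})
    if len(forms) > limit:
        raise ValueError(
            "%d word forms exceeds the assumed limit of %d"
            % (len(forms), limit)
        )
    return forms


if __name__ == "__main__":
    # Tiny made-up sample; a real run would read the exported file.
    sample = ["wasi\n", "allku\n", "wasi\n", "\n"]
    for form in prepare_wordlist(sample):
        print(form)
```

The cleaned list can then be written back out, one form per line, and
passed to wordlist2dawg as usual.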


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
