2010/8/1 Zdenko Podobný:
>
> On 28.07.2010 17:02, Jimmy O'Regan wrote:
>>
>> I grepped the code and it seems to be looking for something called
>> LANG.user-words, but that didn't seem to do anything -- I got the same
>> garbled text when I ran Tesseract 3 the second time.
>
On 28.07.2010 17:02, Jimmy O'Regan wrote:
>
> I grepped the code and it seems to be looking for something called
> LANG.user-words, but that didn't seem to do anything -- I got the same
> garbled text when I ran Tesseract 3 the second time.
>> Turns out T3 does
On 27 July 2010 20:49, Philip Pemberton wrote:
> On 27/07/10 17:30, Jimmy O'Regan wrote:
>>>
>>> The Ubuntu wordlist is pretty big... 921 user-added words...
>>
>> As wordlists go, that's tiny :)
>
> Aye, but it's an exceptions list :)
> Seems to contain a lot of fairly technical words and abbrevi
On 27/07/10 17:30, Jimmy O'Regan wrote:
>> The Ubuntu wordlist is pretty big... 921 user-added words...
> As wordlists go, that's tiny :)
Aye, but it's an exceptions list :)
Seems to contain a lot of fairly technical words and abbreviations which
I assume aren't in the Tesseract base wordlist.
On 27 July 2010 13:35, Philip Pemberton wrote:
> On 27/07/10 12:38, Jimmy O'Regan wrote:
>>> At the risk of sounding like an idiot... how do you do that?
>>> I didn't see anything about a user dictionary in the documentation...
>>>
>> It's a plain text file, one word per line, eng.user-words
>
> A
On 27/07/10 12:38, Jimmy O'Regan wrote:
>> At the risk of sounding like an idiot... how do you do that?
>> I didn't see anything about a user dictionary in the documentation...
>>
> It's a plain text file, one word per line, eng.user-words
Ah, there it is. I can see it in the Ubuntu 10.04 package
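
For anyone following along, here is a minimal sketch of producing the file described above: plain text, one word per line. The helper name write_user_words and the output path are assumptions made for illustration. The thread only establishes the file format and the name eng.user-words, not where Tesseract expects to find the file.

```python
from pathlib import Path


def write_user_words(path, words):
    """Write words to a plain text file, one word per line.

    Hypothetical helper, not part of Tesseract. Returns the list of
    words that were actually written.
    """
    # Strip surrounding whitespace, drop empty entries, and remove
    # duplicates while preserving the original order.
    cleaned = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    # One word per line, with a trailing newline after the last word.
    Path(path).write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return cleaned
```

Calling write_user_words("eng.user-words", ["MHz"]) would give a one-line file containing the word suggested earlier in the thread. Whether Tesseract 3 actually picks the file up is a separate question, as the later messages show.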
On 27 July 2010 11:28, Philip Pemberton wrote:
> On 27/07/10 09:57, Jimmy O'Regan wrote:
>> Have you tried adding 'MHz' to the user dictionary?
>
> At the risk of sounding like an idiot... how do you do that?
> I didn't see anything about a user dictionary in the documentation...
>
It's a plain t
On 27/07/10 09:57, Jimmy O'Regan wrote:
> Have you tried adding 'MHz' to the user dictionary?
At the risk of sounding like an idiot... how do you do that?
I didn't see anything about a user dictionary in the documentation...
>> - The top line of text sometimes gets garbled (as in, read as rand
On 26 July 2010 19:21, Philip Pemberton wrote:
> Problem is, Tesseract 2.04 doesn't like quoted text:
>
> phil...@cheetah:~/elektor$ tesseract elek0002.tif elek0002_tess2
> Tesseract Open Source OCR Engine
> tesseract: unicharset.cpp:76: const UNICHAR_ID
> UNICHARSET::unichar_to_id(const char*, in
Hi,
I'm currently working on cataloguing about 20 years worth of electronics
magazines, books and journals, down to article level. Obviously, typing
in the article names, page numbers and synopses isn't an option -- for a
start it'd make my hands hurt (a lot!) and take a very long time...
we'