subject:"Improving accuracy on Tesseract 3.0 (also Issue 265)"

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-08-01 Thread Jimmy O'Regan

2010/8/1 Zdenko Podobný: > > On 28.07.2010 17:02, Jimmy O'Regan wrote: >> > I grepped the code and it seems to be looking for something called > LANG.user-words, but that didn't seem to do anything -- I got the same > garbled text when I ran Tesseract 3 the second time. >

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-08-01 Thread Zdenko Podobný

On 28.07.2010 17:02, Jimmy O'Regan wrote: > I grepped the code and it seems to be looking for something called LANG.user-words, but that didn't seem to do anything -- I got the same garbled text when I ran Tesseract 3 the second time. >> Turns out T3 does

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-28 Thread Jimmy O'Regan

On 27 July 2010 20:49, Philip Pemberton wrote: > On 27/07/10 17:30, Jimmy O'Regan wrote: >>> >>> The Ubuntu wordlist is pretty big... 921 user-added words... >> >> As wordlists go, that's tiny :) > > Aye, but it's an exceptions list :) > Seems to contain a lot of fairly technical words and abbrevi

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-28 Thread Philip Pemberton

On 27/07/10 17:30, Jimmy O'Regan wrote: The Ubuntu wordlist is pretty big... 921 user-added words... As wordlists go, that's tiny :) Aye, but it's an exceptions list :) Seems to contain a lot of fairly technical words and abbreviations which I assume aren't in the Tesseract base wordlist.

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Jimmy O'Regan

On 27 July 2010 13:35, Philip Pemberton wrote: > On 27/07/10 12:38, Jimmy O'Regan wrote: >>> At the risk of sounding like an idiot... how do you do that? >>> I didn't see anything about a user dictionary in the documentation... >>> >> It's a plain text file, one word per line, eng.user-words > > A

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Philip Pemberton

On 27/07/10 12:38, Jimmy O'Regan wrote: >> At the risk of sounding like an idiot... how do you do that? >> I didn't see anything about a user dictionary in the documentation... >> > It's a plain text file, one word per line, eng.user-words Ah, there it is. I can see it in the Ubuntu 10.04 package
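A minimal sketch of the "plain text file, one word per line" format described above, in Python. The helper name `write_user_words` and the target directory are hypothetical; where Tesseract expects the file to live, and whether Tesseract 3 actually reads it, is not settled in this thread (one poster reports it "didn't seem to do anything"), so treat this only as a way to produce a file in the stated format.

```python
from pathlib import Path
import tempfile


def write_user_words(words, target_dir, lang="eng"):
    """Write <lang>.user-words as plain text, one word per line.

    target_dir is an assumption: the thread does not say which
    directory Tesseract reads the file from.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        # Skip blank entries and duplicates, keeping first-seen order.
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    path = Path(target_dir) / f"{lang}.user-words"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so no real Tesseract data is touched.
    with tempfile.TemporaryDirectory() as tmp:
        out = write_user_words(["MHz", "kHz", "MHz"], tmp)
        print(out.name)
        print(out.read_text(encoding="utf-8"), end="")
```

The file name follows the LANG.user-words pattern mentioned in the thread, so a different language code would go in the `lang` argument.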

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Jimmy O'Regan

On 27 July 2010 11:28, Philip Pemberton wrote: > On 27/07/10 09:57, Jimmy O'Regan wrote: >> Have you tried adding 'MHz' to the user dictionary? > > At the risk of sounding like an idiot... how do you do that? > I didn't see anything about a user dictionary in the documentation... > It's a plain t

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Philip Pemberton

On 27/07/10 09:57, Jimmy O'Regan wrote: > Have you tried adding 'MHz' to the user dictionary? At the risk of sounding like an idiot... how do you do that? I didn't see anything about a user dictionary in the documentation... >> - The top line of text sometimes gets garbled (as in, read as rand

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Jimmy O'Regan

On 26 July 2010 19:21, Philip Pemberton wrote: > Problem is, Tesseract 2.04 doesn't like quoted text: > > phil...@cheetah:~/elektor$ tesseract elek0002.tif elek0002_tess2 > Tesseract Open Source OCR Engine > tesseract: unicharset.cpp:76: const UNICHAR_ID > UNICHARSET::unichar_to_id(const char*, in

Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-26 Thread Philip Pemberton

Hi, I'm currently working on cataloguing about 20 years worth of electronics magazines, books and journals, down to article level. Obviously, typing in the article names, page numbers and synopses isn't an option -- for a start it'd make my hands hurt (a lot!) and take a very long time... we'

10 matches