Re: Accuracy worse on 3.0-svn than 2.04?

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 21:04, patrickq wrote: > Keep in mind that accuracy depends heavily on the right fonts being > included in the training set. I have no reason to believe that the > 2.04 and 3.0 training sets are identical - perhaps someone could > enlighten us. There is mention in one of the Tesse

Re: mis-decoding a single line of text

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 21:55, patrickq wrote: > I assume you are referring to > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract > ? > It's helpful, thanks, and I should have checked what's there first. > > My understanding is that: > - one dictionary file (eng.word-dawg) is included as par

Re: mis-decoding a single line of text

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 20:59, patrickq wrote: > I get HAX 6 5-5,- with Tesseract 3.0 > > What I find remarkable is that half the folks on this forum would love > to disable the word recognition (i.e. dictionary), the other half > would like to enable it - and absolutely no one knows how to enable/ > disa

Re: mis-decoding a single line of text

2010-07-27 Thread patrickq
I assume you are referring to http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract ? It's helpful, thanks, and I should have checked what's there first. My understanding is that: - one dictionary file (eng.word-dawg) is included as part of building the training data, and includes a separ

Re: mis-decoding a single line of text

2010-07-27 Thread Eugene Reimer
A quick glance at the documentation will tell you that "the dictionary" lives in several DAWG files, as well as in that user-words file. patrickq wrote, On 2010-07-27 14:59: I get HAX 6 5-5,- with Tesseract 3.0 What I find remarkable is that half the folks on this forum would love to disable the

Re: Accuracy worse on 3.0-svn than 2.04?

2010-07-27 Thread patrickq
Keep in mind that accuracy depends heavily on the right fonts being included in the training set. I have no reason to believe that the 2.04 and 3.0 training sets are identical - perhaps someone could enlighten us. In any case, I routinely come across certain pages where recognition is terrible and

Re: mis-decoding a single line of text

2010-07-27 Thread patrickq
I get HAX 6 5-5,- with Tesseract 3.0 What I find remarkable is that half the folks on this forum would love to disable the word recognition (i.e. dictionary), the other half would like to enable it - and absolutely no one knows how to enable/disable the dictionary nor can say for sure if it's act

Re: mis-decoding a single line of text

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 19:45, khoshteep wrote: > It seems like Tesseract is designed for word recognition and not > character recognition. Correct.

mis-decoding a single line of text

2010-07-27 Thread khoshteep
hi everyone, I am trying to decode a single line of text that is a bit noisy. Link to uploaded image is attached. The text is "MAX665," but what I'm getting back is "THAI 6 8-51-". http://tesseract-ocr.googlegroups.com/web/row1.bmp?gda=iO7ypToAAABJaCRJGWfX_qPCIQ7C4NkPrfoTOVd7wlGlVfd1g07AArmU4fy-m

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 13:35, Philip Pemberton wrote: > On 27/07/10 12:38, Jimmy O'Regan wrote: >>> At the risk of sounding like an idiot... how do you do that? >>> I didn't see anything about a user dictionary in the documentation... >>> >> It's a plain text file, one word per line, eng.user-words > > A

Re: Meter Reading with tesseract

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 15:00, erwin.cloosterm...@gmail.com wrote: > Is Tesseract suited to extract a meter reading from a picture of a > gas, electricity or water meter? > No more than it is to reading licence plates, but people have tried anyway - whether or not they're successful, I can't say (the li

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Philip Pemberton
On 27/07/10 12:38, Jimmy O'Regan wrote: >> At the risk of sounding like an idiot... how do you do that? >> I didn't see anything about a user dictionary in the documentation... >> > It's a plain text file, one word per line, eng.user-words Ah, there it is. I can see it in the Ubuntu 10.04 package
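Since the thread establishes that the user dictionary is just a plain text file with one word per line (eng.user-words), adding a term such as 'MHz' amounts to appending a line. The sketch below shows one way to do that without creating duplicates. It assumes only the file format described in the thread; the function name is hypothetical, the file's location depends on the install (the Ubuntu 10.04 package and a source build may differ), and whether a given Tesseract version picks up the change without further configuration is not covered by these messages.

```python
from pathlib import Path
from typing import Iterable, List


def add_user_words(path: Path, words: Iterable[str]) -> List[str]:
    """Append words to a plain-text word list, one word per line.

    Hypothetical helper: words already in the file (or repeated in
    `words`) are skipped. Returns the words that were actually appended.
    """
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    seen = {line.strip() for line in text.splitlines() if line.strip()}
    new_words: List[str] = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            new_words.append(word)
    if new_words:
        with path.open("a", encoding="utf-8") as f:
            # Start on a fresh line if the existing file lacks a trailing newline.
            if text and not text.endswith("\n"):
                f.write("\n")
            f.write("\n".join(new_words) + "\n")
    return new_words
```

For example, `add_user_words(Path("/path/to/tessdata/eng.user-words"), ["MHz"])` would append `MHz` once; the path shown is a placeholder, not a known default.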

meter reading with tesseract... and Google PowerMeter

2010-07-27 Thread erwin.cloosterm...@gmail.com
Additional idea: it might be useful to integrate Tesseract with Google PowerMeter, or to have an app that extracts the necessary data from a picture of a meter to fill in the fields of the PowerMeter API.

Meter Reading with tesseract

2010-07-27 Thread erwin.cloosterm...@gmail.com
Is Tesseract suited to extract a meter reading from a picture of a gas, electricity or water meter? It would be nice to be able to take a picture of your meters with your smartphone and then upload the results to a spreadsheet or to your company's website. Now I have the possibility to manually e

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Jimmy O'Regan
On 27 July 2010 11:28, Philip Pemberton wrote: > On 27/07/10 09:57, Jimmy O'Regan wrote: >> Have you tried adding 'MHz' to the user dictionary? > > At the risk of sounding like an idiot... how do you do that? > I didn't see anything about a user dictionary in the documentation... > It's a plain t

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Philip Pemberton
On 27/07/10 09:57, Jimmy O'Regan wrote: > Have you tried adding 'MHz' to the user dictionary? At the risk of sounding like an idiot... how do you do that? I didn't see anything about a user dictionary in the documentation... >> - The top line of text sometimes gets garbled (as in, read as rand

Re: Improving accuracy on Tesseract 3.0 (also Issue 265)

2010-07-27 Thread Jimmy O'Regan
On 26 July 2010 19:21, Philip Pemberton wrote: > Problem is, Tesseract 2.04 doesn't like quoted text: > > phil...@cheetah:~/elektor$ tesseract elek0002.tif elek0002_tess2 > Tesseract Open Source OCR Engine > tesseract: unicharset.cpp:76: const UNICHAR_ID > UNICHARSET::unichar_to_id(const char*, in

Re: how to disable permuter

2010-07-27 Thread Jimmy O'Regan
On 26 July 2010 22:29, khoshteep wrote: > Hi all, > I'm using version 2.04 and want to get the result right after > classification. Permuter changes the characters to something which is > completely different from what is printed. So does anyone know how to > disable the permuter? This type of q