On 27 July 2010 21:04, patrickq wrote:
> Keep in mind that accuracy depends heavily on the right fonts being
> included in the training set. I have no reason to believe that the
> 2.04 and 3.0 training sets are identical - perhaps someone could
> enlighten us.
There is mention in one of the Tesse
On 27 July 2010 21:55, patrickq wrote:
> I assume you are referring to
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
> ?
> It's helpful, thanks, and I should have checked what's there first.
>
> My understanding is that:
> - one dictionary file (eng.word-dawg) is included as par
On 27 July 2010 20:59, patrickq wrote:
> I get HAX 6 5-5,- with Tesseract 3.0
>
> What I find remarkable is that half the folks on this forum would love
> to disable the word recognition (i.e. dictionary), the other half
> would like to enable it - and absolutely no one knows how to enable/
> disa
I assume you are referring to
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
?
It's helpful, thanks, and I should have checked what's there first.
My understanding is that:
- one dictionary file (eng.word-dawg) is included as part of building
the training data, and includes a separ
A quick glance at the documentation will tell you that "the dictionary"
lives in several DAWG files, as well as in that user-words file.
patrickq wrote, On 2010-07-27 14:59:
I get HAX 6 5-5,- with Tesseract 3.0
What I find remarkable is that half the folks on this forum would love
to disable the
Keep in mind that accuracy depends heavily on the right fonts being
included in the training set. I have no reason to believe that the
2.04 and 3.0 training sets are identical - perhaps someone could
enlighten us. In any case, I routinely come across certain pages
where recognition is terrible and
I get HAX 6 5-5,- with Tesseract 3.0
What I find remarkable is that half the folks on this forum would love
to disable the word recognition (i.e. dictionary), the other half
would like to enable it - and absolutely no one knows how to enable/
disable the dictionary nor can say for sure if it's act
On 27 July 2010 19:45, khoshteep wrote:
> It seems like Tesseract is designed for word recognition and not
> character recognition.
Correct.
--
jimregan, that's because deep inside you, you are evil.
Also not-so-deep inside you.
--
You received this message because you are subscribed to the
hi everyone,
I am trying to decode a single line of text that is a bit noisy. Link
to uploaded image is attached. The text is "MAX665," but what I'm
getting back is "THAI 6 8-51-".
http://tesseract-ocr.googlegroups.com/web/row1.bmp?gda=iO7ypToAAABJaCRJGWfX_qPCIQ7C4NkPrfoTOVd7wlGlVfd1g07AArmU4fy-m
On 27 July 2010 13:35, Philip Pemberton wrote:
> On 27/07/10 12:38, Jimmy O'Regan wrote:
>>> At the risk of sounding like an idiot... how do you do that?
>>> I didn't see anything about a user dictionary in the documentation...
>>>
>> It's a plain text file, one word per line, eng.user-words
>
> A
On 27 July 2010 15:00, erwin.cloosterm...@gmail.com
wrote:
> Is Tesseract suited to extract a meter reading from a picture of a
> gas, electricity or water meter?
>
No more than it is suited to reading licence plates, but people have tried
anyway - whether or not they're successful, I can't say (the li
On 27/07/10 12:38, Jimmy O'Regan wrote:
>> At the risk of sounding like an idiot... how do you do that?
>> I didn't see anything about a user dictionary in the documentation...
>>
> It's a plain text file, one word per line, eng.user-words
Ah, there it is. I can see it in the Ubuntu 10.04 package
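
For anyone else wondering what such a file looks like, here is a minimal
sketch in Python, assuming only what is said above: the user dictionary is
a plain text file, one word per line, named eng.user-words. The helper name
write_user_words is hypothetical, and writing to the current directory is
an assumption; where Tesseract expects to find the file depends on your
installation.

```python
from pathlib import Path


def write_user_words(words, path):
    """Write words to `path`, one per line, skipping blanks and duplicates.

    Hypothetical helper: it only produces the plain-text format described
    in the thread and does not call Tesseract itself.
    """
    seen = set()
    lines = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            lines.append(word)
    # One word per line, with a trailing newline after the last word.
    Path(path).write_text("".join(w + "\n" for w in lines), encoding="utf-8")
    return lines


if __name__ == "__main__":
    # "MHz" is the word suggested in the thread; "kHz" is only an illustration.
    write_user_words(["MHz", "kHz", "MHz"], "eng.user-words")
```

The resulting file would then need to be copied next to the other eng.*
language files of your installation, which may require root on a packaged
install.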
Additional idea: it might be useful to integrate Tesseract with
Google PowerMeter. Or to have an app that extracts the necessary data
from a picture of a meter to fill the fields of the PowerMeter API
Is Tesseract suited to extract a meter reading from a picture of a
gas, electricity or water meter?
It would be nice to be able to take a picture of your meters with your
smartphone and then upload the results to a spreadsheet or to your
company's website.
Now I have the possibility to manually e
On 27 July 2010 11:28, Philip Pemberton wrote:
> On 27/07/10 09:57, Jimmy O'Regan wrote:
>> Have you tried adding 'MHz' to the user dictionary?
>
> At the risk of sounding like an idiot... how do you do that?
> I didn't see anything about a user dictionary in the documentation...
>
It's a plain t
On 27/07/10 09:57, Jimmy O'Regan wrote:
> Have you tried adding 'MHz' to the user dictionary?
At the risk of sounding like an idiot... how do you do that?
I didn't see anything about a user dictionary in the documentation...
>> - The top line of text sometimes gets garbled (as in, read as rand
On 26 July 2010 19:21, Philip Pemberton wrote:
> Problem is, Tesseract 2.04 doesn't like quoted text:
>
> phil...@cheetah:~/elektor$ tesseract elek0002.tif elek0002_tess2
> Tesseract Open Source OCR Engine
> tesseract: unicharset.cpp:76: const UNICHAR_ID
> UNICHARSET::unichar_to_id(const char*, in
On 26 July 2010 22:29, khoshteep wrote:
> Hi all,
> I'm using version 2.04 and want to get the result right after
> classification. Permuter changes the characters to something which is
> completely different from what is printed. So does anyone know how to
> disable the Permuter?
This type of q