Re: Ugly behavior when recognizing – advice requirement

2013-05-23 Thread Dmitri Silaev
Andres, Inherently, Tesseract is designed to detect both straight and inverted text, possibly within the same text image. This is often a source of confusion about what is background and what is foreground: sometimes the interior of a closed character is treated as a character and foreground pix

Re: Ugly behavior when recognizing – advice requirement

2013-05-20 Thread Andres
Hi Dmitri, Many thanks for your help. I tried PageSegMode set to PSM_SINGLE_BLOCK_VERT_TEXT and, surprisingly, got very good results. But then I switched from Tesseract 3.01 to 3.02 (revision 724) and Tesseract's behavior changed significantly, and not for the better in my case. It began to detec

Re: Ugly behavior when recognizing – advice requirement

2013-05-07 Thread Dmitri Silaev
Andres, Your code seems to be correct. I personally use a few more lines right after the call to GetIterator(): it->Begin(); if(it->IsAtFinalElement(RIL_BLOCK, RIL_SYMBOL)) return; if(!it->IsAtBeginningOf(RIL_SYMBOL)) return; But this shouldn't bother you if you rely on

Re: Ugly behavior when recognizing – advice requirement

2013-05-05 Thread Andres
Answering part of my last question: I've found a way of getting the alternatives for each char, but it seems not to work in 3.01, according to my tests and http://code.google.com/p/tesseract-ocr/issues/detail?id=714 My snippet: #include ... tess_api.SetVariable("save_blob_choices", "T")

Re: Ugly behavior when recognizing – advice requirement

2013-05-05 Thread Andres
Hi Dmitri, Many thanks for your hints, as always. Regarding the links in my previous message, sorry for that; I'll repost the entire message below, fixed. I like the method you say you use in CustomOCR. Is there a way of getting the character variants without making a h

Re: Ugly behavior when recognizing – advice requirement

2013-05-03 Thread Dmitri Silaev
Andres, First of all, your first link seems to point to a "traineddata" file instead of an image. Second, without actually diving deep into your problem, I can suggest specifying the single-line PSM mode on the command line. And finally you can use the user patterns feature to restrict possible

Ugly behavior when recognizing – advice requirement

2013-05-03 Thread Andres
Dear people, I trained Tesseract for my font (FE-Schrift: http://de.wikipedia.org/wiki/FE-Schrift ) and I’m getting very bad results. I am using Tesseract 3.01 under Windows. In this image: https://docs.google.com/file/d/0BxkuvS_LuBAzeFNZUVA1cThLMG8/edit?usp=sharing where the text is SAA5298, I’m ge