I don't know whether it's feasible for you, but IMHO decent results can
only be achieved if you do the segmentation yourself and then pass image
fragments to Tesseract word by word. Problems may appear when words are
too short, but as far as I can see, that's not your case.
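
To make the word-by-word idea concrete, here is a minimal sketch of one
common way to do it (a vertical projection profile), not the exact code
from my project. It assumes a single text line that is already binarized
into rows of 0/1 pixels. The names column_runs, crop, recognize_words and
the min_gap threshold are all hypothetical, and the recognize callback
stands in for whatever passes a cropped fragment to Tesseract (for
example with the single-word page segmentation mode).

```python
from typing import Callable, List, Tuple

# Assumption: a text line is a list of pixel rows, 0 = background, 1 = ink.
Image = List[List[int]]


def column_runs(line: Image, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split a text line into word column ranges (start, end), end exclusive.

    Columns without ink are gaps; a gap of at least min_gap columns is
    treated as a space between words, shorter gaps as space between letters.
    """
    width = len(line[0]) if line else 0
    # Vertical projection: does column x contain any ink at all?
    ink = [any(row[x] for row in line) for x in range(width)]

    runs: List[Tuple[int, int]] = []
    start = None  # first column of the word being collected
    gap = 0       # length of the current run of empty columns
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ended where the gap began.
                runs.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Last word runs to the end of the line, minus trailing blanks.
        runs.append((start, width - gap))
    return runs


def crop(line: Image, x0: int, x1: int) -> Image:
    """Cut columns x0..x1 (end exclusive) out of every row."""
    return [row[x0:x1] for row in line]


def recognize_words(
    line: Image,
    recognize: Callable[[Image], str],
    min_gap: int = 3,
) -> List[str]:
    """Run the supplied recognizer on each word fragment separately."""
    return [recognize(crop(line, x0, x1)) for x0, x1 in column_runs(line, min_gap)]
```

On real photos the gap threshold would have to be derived from the image
(for instance from the typical letter spacing) rather than fixed, and line
detection has to happen before this step.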

A long time ago I started my project relying on Tess's segmentation
and struggled a lot with it until I moved to a word-by-word approach.
Eventually I even switched to character-wise recognition, which at
last produces decent results. This transition was mostly driven by the
specifics of the input images I work with (photos, usually of low
quality), but I think it is almost required even for ideally scanned
images.

There are some fruitful mathematical ideas behind Tess's segmentation,
but I think the current implementation is not mature enough for
extensive production use.

Warm regards,
Dmitry Silaev





On Thu, Feb 24, 2011 at 1:05 PM, Jose <diox...@gmail.com> wrote:
> Hi, (as you know, Saurabh, because we talked in private the other day) I tried
> PSM_SINGLE_COLUMN and the accuracy drops dramatically... I can't afford
> to lose that accuracy. Is it possible to change the way the output is
> displayed? Looking at the code, it seems rather hard to change... perhaps I
> could print the x,y position of each word found and then work out the
> horizontal/vertical layout myself? What are your thoughts? Regards
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
