On Tue, May 22, 2012 at 05:21:23AM -0700, Galt wrote:
> On May 21, 2:04 am, Nick White <nick.wh...@durham.ac.uk> wrote:
> > I've been suffering a very similar problem with some of the text I'm
> > training, which has several diacritics above and below glyphs. I
> > often get lines of garbage where some of the diacritics have been
> > treated as a line of their own, so the preceding and following
> > lines end up without those diacritics.
> >
> > ...
> >
> > How did you fix the problem in your case?
> 
> My hack for fixing some mis-interpreted high curly quotes
> was to lower the troubled ones with gimp by 10 to 14px
> until Tess started parsing the lines correctly. Luckily I only
> had to do this about 6 times on 4 different pages for my book.

OK, thanks for the explanation. Pity there was no "magic bullet"
solution ;)
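
For anyone who would rather script the workaround above than nudge each
mark by hand in gimp, here is a rough sketch of the same idea: blank a
small rectangle around the stray mark and redraw its ink a few rows
lower. It works on a plain list of grayscale rows so it does not depend
on any image library. The function name, the box convention and the
white background value are assumptions made for illustration; only the
10 to 14px shift comes from the thread.

```python
def shift_region_down(pixels, box, dy, background=255):
    """Return a copy of `pixels` with the ink inside `box` moved down `dy` rows.

    pixels: list of rows of grayscale values (0 = black, 255 = white)
    box:    (left, top, right, bottom), with right and bottom exclusive
    dy:     number of rows to move the mark down (e.g. 10 to 14)
    """
    left, top, right, bottom = box
    height = len(pixels)
    out = [row[:] for row in pixels]  # leave the input untouched

    # Erase the mark from its original position.
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = background

    # Redraw only the ink pixels dy rows lower, so that background
    # pixels from the box do not wipe out text already at the target.
    for y in range(top, bottom):
        ny = y + dy
        if not 0 <= ny < height:
            continue  # this row would fall off the page, so drop it
        for x in range(left, right):
            if pixels[y][x] != background:
                out[ny][x] = pixels[y][x]
    return out
```

The box would have to be found by eye (or from a box file), and whether
a given shift actually helps the line detection still has to be checked
by re-running tesseract on the result.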

I think my best bet is further improving my box/tif stuff, with the
expectation that this will improve tesseract's chance of detecting
lines properly.

Nick
