After some research, let me answer my own questions about the methods:

The paper at
http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf
explains the general flow chart thoroughly, with method names.
However, I can't seem to get hold of the documents it references,
which is a shame.

That said, I'm still unsure about using Tesseract-OCR with my preferred
setup (the first part of my initial post).
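
To make the "provide the boxes myself" idea concrete, here is a minimal
sketch of the preparation step, not anything taken from Tesseract itself.
It assumes the usual box-file line layout (character, left, bottom, right,
top, optional page) with the origin at the bottom-left of the image. The
names `Box`, `parse_box_file` and `to_crop_rect` are hypothetical helpers
made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_file(text: str) -> List[Box]:
    """Parse box-file lines of the form 'char left bottom right top [page]'."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def to_crop_rect(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Convert bottom-left-origin box coordinates to a top-left-origin
    (left, upper, right, lower) rectangle, as most imaging libraries expect."""
    return (box.left, image_height - box.top,
            box.right, image_height - box.bottom)
```

Each rectangle could then be cropped out of the page and recognized on its
own, for example with a single-character page segmentation mode if the
installed Tesseract version offers one. Whether that matches the accuracy
of a full-page run is something I have not verified.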

Thanks,
Cihan

On Sep 29, 10:59 am, Calomer <calo...@gmail.com> wrote:
> I understand that you can edit box files with any editor (even text
> editor) and check it. Been there, done that, awesome feature.
>
> I'm curious whether it is possible to feed Tesseract predefined boxes
> so that it only runs OCR inside them. I'll make sure that every box
> has only one character inside, promise. Or should I just remove
> everything else, move the character regions a little apart, etc. to
> fool the system? That would be a lot of work, so I would rather just
> provide the boxes to the system.
>
> Also, I'm curious about the methods Tesseract uses at each stage. If
> they are not confidential, of course.
>
> 1-) Image Enhancement
> Do you use any image enhancement? Contrast enhancement? Histogram
> equalization? Anything?
>
> 2-) Text Detection (as in finding lines / words / candidates)
> Does it use edge detectors? Which one(s)? Does it use basic dynamic
> thresholding with a mask? How does it determine mask sizes?
>
> 3-) Character Segmentation / Validation
> Does it use connected component analysis? Projection profile?
> Something else?
>
> 4-) Character Recognition.
> k-means to get cluster centroids? k-NN? SVM? MLP?
>
> 5-) Word Validation
> I don't suppose you are using a dictionary yet, so I don't think this
> part is in Tesseract. Could you therefore return the best two matches
> (if they have at least 60% of the distance to their respective
> centroids), so that I can validate them against a dictionary myself?
>
> If you could explain any tiny part, I'd really appreciate it.
>
> Cheers,
> Cihan
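
For point 5, the validation I have in mind could look roughly like the
sketch below. It assumes the recognizer hands back up to two candidate
characters per position, best first. That output format is my assumption,
not a documented Tesseract interface, and `validate_word` is a
hypothetical helper.

```python
from itertools import product
from typing import List, Sequence, Set


def validate_word(candidates: Sequence[Sequence[str]],
                  dictionary: Set[str]) -> List[str]:
    """Return every dictionary word that can be spelled by picking one
    candidate per character position, starting from the all-best choice."""
    matches: List[str] = []
    for combo in product(*candidates):
        word = "".join(combo)
        if word.lower() in dictionary and word not in matches:
            matches.append(word)
    return matches


# Hypothetical usage: two candidates for each of three character positions.
words = validate_word([["c", "e"], ["a", "o"], ["t", "l"]],
                      {"cat", "cot", "eat"})
```

With two candidates per position the number of combinations doubles with
every character, so this brute-force form only suits short words.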
