After some research, let me answer my own questions about the methods: the paper at http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseracticdar2007.pdf explains the general flow chart thoroughly, with method names. Unfortunately, I can't seem to get hold of the documents it references, which is a shame.
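For anyone else following along: as far as I can tell from that paper, the pipeline starts with a connected component analysis of the binarized page. Below is a minimal sketch of what that step means in general. It is my own illustration, not Tesseract's code: the function name `connected_components`, the list-of-lists image format (1 = ink, 0 = background) and the choice of 8-connectivity are all assumptions made for the example.

```python
from collections import deque


def connected_components(grid):
    """Return bounding boxes (left, top, right, bottom) of 8-connected blobs.

    `grid` is a list of rows; a pixel value of 1 is foreground (ink).
    Hypothetical helper for illustration only, not part of Tesseract.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                # Visit all 8 neighbours (the centre pixel is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


# Tiny made-up "page" with two separate blobs.
page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
print(connected_components(page))
```

Each resulting box is a candidate blob; grouping blobs into lines and words is a separate step that this sketch does not attempt.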
However, I'm still concerned about using Tesseract-OCR with my preferred settings (the first part of my initial post).

Thanks,
Cihan

On Sep 29, 10:59 am, Calomer <calo...@gmail.com> wrote:
> I understand that you can edit box files with any editor (even text
> editor) and check it. Been there, done that, awesome feature.
>
> I'm curious if it is possible to feed tesseract predefined boxes for
> it to just use OCR inside ? I'll make sure that all the boxes have
> only one character inside, promise. Or should I just remove everything
> else, move character regions a little apart etc. to fool the system ?
> That would be really heavy load, would rather just provide boxes to
> the system.
>
> Also, I'm curious for the tesseract methods on different parts. If
> they are not confidential, of course.
>
> 1-) Image Enhancement
> Do you use any image enhancement ? Contrast enhancement ? Histogram
> equalization ? Anything ?
>
> 2-) Text Detection (as in finding lines / words / candidates)
> Does it use edge detectors? Which one(s)? Does it use basic dynamic
> thresholding with a mask? How does it determine mask sizes?
>
> 3-) Character Segmentation / Validation
> Does it use connected component analysis? Projection profile?
> Something else?
>
> 4-) Character Recognition.
> k-means to get cluster centroids ? knn ? svm ? mlp?
>
> 5-) Word Validation
> Don't suppose you are using a dictionary, yet. So don't think this
> part is in the tesseract. Therefore, if you could return best two
> matches (if they have at least %60 of the distance to respective
> centroids) for me to use validation on dictionary.
>
> If you could explain any tiny part, I'd really appreciate it.
>
> Cheers,
> Cihan

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.