***** from a post on the mailing list sikuli-dev by macs

Has the latest Sikuli been migrated to tesseract3? I see a branch named tesseract3 on GitHub. I also see many OCR issues being discussed on Launchpad.
In my understanding, OCR results can be improved by pre-processing the image:

1. Convert the image to grayscale.
2. Improve contrast or apply edge-detection filters.
3. Invert the colors (negative).
4. Reduce the color depth.
5. Apply image-smoothing filters.

Not every filter is applicable to every type of image. A user might want to adjust a filter, or a combination of filters, to achieve better results. Can we give this option to the user?

I am not sure whether any pre-processing was done in the RC2 release. I tried to modify the function "doFind(PSC ptn)" in region.java to convert the image to grayscale before OCR processing, but I could not see any improvement in OCR. I did not try further because my Eclipse environment is not set up completely. Does Sikuli do any pre-processing of the image before calling the OCR?

It would be nice to have the following OCR support in Sikuli:

1. An option for the user to select the language (already requested).
2. Tesseract supports training and the creation of box files. We should have an option to select user-trained files.
3. There are many commercial OCR tools that have higher accuracy and better support for other languages. If the Sikuli OCR design is modular (as defined in the blueprint), the user should be able to use another OCR engine.

Other observations on the current OCR:

1. The OCR can recognize the text, but the click fails. If a screen has the text "Search" and I try click("Search"), the click returns a failure. But when I get the text on the screen using the text() API and print it, it prints all the strings, including the string "Search". I think we need some improvement in searching the string of text returned by OCR.
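The pre-processing steps listed in the post can be sketched in plain Python as a chain of filters that the user selects and orders. This is only an illustration under stated assumptions: it is not Sikuli or tesseract code, all function names are hypothetical, and the image is a plain list of pixel rows rather than a real screenshot object. Edge detection is left out to keep the sketch short.

```python
# Hypothetical sketch: none of these names exist in Sikuli or tesseract.
# An RGB image is a list of rows of (r, g, b) tuples; after the first step
# it is a list of rows of gray values in the range 0..255.

def to_grayscale(rgb_image):
    """Convert (r, g, b) pixels to luminance using the BT.601 weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]

def stretch_contrast(gray):
    """Rescale linearly so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image, nothing to stretch
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]

def invert(gray):
    """Produce the negative of the image."""
    return [[255 - p for p in row] for row in gray]

def reduce_depth(gray, levels=2):
    """Quantize to `levels` (>= 2) gray values; levels=2 gives pure black/white."""
    step = 256 // levels  # width of each input bucket
    top = levels - 1
    return [[min(p // step, top) * 255 // top for p in row] for row in gray]

def smooth(gray):
    """3x3 box blur; border pixels average only the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out

def preprocess(rgb_image, filters):
    """Convert to grayscale, then apply the user-chosen filters in order."""
    gray = to_grayscale(rgb_image)
    for apply_filter in filters:
        gray = apply_filter(gray)
    return gray

# Usage: dark text pixels on a light background, inverted and binarized.
image = [[(255, 255, 255), (40, 40, 40)],
         [(200, 200, 200), (0, 0, 0)]]
result = preprocess(image, [stretch_contrast, invert, reduce_depth])
```

Passing the filters as a list reflects the point made in the post: no single combination suits every image, so the choice and order are left to the caller.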
https://bugs.launchpad.net/bugs/710586

Title:
  X 1.0rc3: Region.text() -- known problems and needed improvements

Status in Sikuli:
  In Progress

Bug description:
  ******* this report is a summary of known problems and feature requests

  The text recognition feature (OCR - Region.text()), together with the
  possibility to find text in an image, is still experimental and under
  development.

  These are the currently reported bugs:
  bug 777660: text recognition errors with some fonts
  bug 783082: [request] want font parameters for text recognition
  bug 735434: Text extraction from Images fails in some cases on colored backgrounds
  bug 695616: Inconsistency in text recognition and matching, especially with integers-as-text!
  bug 695650: find(text).text() does not return same text
  bug 701005: text() always returns text with trailing x'200A20'
  bug 701012: text() does not return all intervening blanks, add's others
  bug 795391: [request] OCR/tesseract: allow new training sets for other languages and more tesseract features

  Other observed oddities:
  -- there are problems with text that is not in English
  -- very small and very large fonts may not work
  -- multiline text causes problems
  -- intervening/preceding/trailing graphics and symbols are interpreted as if they were text

  Tip when using Region.text():
  Currently you get the best results when the region represents only one
  line of text and contains only text (no graphics/symbols) in English.
  If you can influence it: make the text as large as possible.

  -- additional information:
  Internally the tesseract OCR engine
  (http://code.google.com/p/tesseract-ocr/) is used, so its restrictions
  apply (e.g. minimum size of font, ...). Information can be found on
  their Wiki.
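Until the whitespace problems reported in bug 701005 and bug 701012 are fixed, a script can compare OCR output more tolerantly. The following is a minimal sketch, not part of Sikuli: the function names are hypothetical, the input string is a stand-in for what Region.text() might return, and reading x'200A20' as the bytes 0x20 0x0A 0x20 (space, line feed, space) is an assumption.

```python
import re

def normalize_ocr_text(raw):
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()

def contains_text(ocr_output, wanted):
    """Case-insensitive substring test that ignores all whitespace."""
    def squash(s):
        return re.sub(r"\s+", "", s).lower()
    return squash(wanted) in squash(ocr_output)

# Stand-in for a text() result with the assumed trailing space/LF/space.
raw = "Search \n "
clean = normalize_ocr_text(raw)

# Stand-in for OCR output with a missing and a spurious blank.
found = contains_text("File  EditS earch", "Search")
```

Ignoring all whitespace makes the test robust against dropped or added blanks, at the cost of possible false matches that span word boundaries.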