Re: [tesseract-ocr] Help with blurred OCR but "simple text"

2017-04-06 Thread Allistair C
You might want to try preprocessing with a threshold filter (Otsu thresholding) to harden the edges. > On 6 Apr 2017, at 10:16, Javier Abascal wrote: > > Hi everyone! :) > > I am having troubles identifying correctly the text in the images attached. > In my opinion, they are
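Otsu's method picks the single global threshold that maximises the between-class variance of the grey-level histogram, which is what "hardening the edges" amounts to here. Below is a minimal pure-Python sketch of the technique, assuming the image has already been loaded as greyscale and is held as a list of rows of 0-255 integers (the loading step is left out). In a real pipeline, OpenCV does the same job with `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
def otsu_threshold(gray):
    """Return Otsu's threshold t for a greyscale image (rows of 0-255 ints).

    Pixels <= t form the dark class and pixels > t the light class; t is the
    value that maximises the between-class variance.
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w_dark = 0      # number of pixels <= t
    sum_dark = 0    # sum of their intensities
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue            # no dark pixels yet
        if w_light == 0:
            break               # every pixel is dark; nothing left to split
        mean_dark = sum_dark / w_dark
        mean_light = (weighted_total - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [[255 if value > t else 0 for value in row] for row in gray]


sample = [[10, 12, 200], [11, 210, 205]]
t = otsu_threshold(sample)
print(t, binarize(sample, t))  # 12 [[0, 0, 255], [0, 255, 255]]
```

Thresholding turns soft grey edges into hard black/white ones; it cannot bring back strokes that have already blurred into the background.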

Re: [tesseract-ocr] I can't get accurate ocr of this can anyone help with settings?

2017-01-02 Thread Allistair C
The whole point of a CAPTCHA is to evade automated reading. That's why the letters are packed very close together and heavily rotated off a consistent baseline. OCR is designed for normal text input, so you need to do some clever preprocessing here first. > On 2 Jan 2017, at 03:

Re: [tesseract-ocr] Re: Preprocessing ideas besides cropping/resizing/thresholding and identifying individual letters.

2016-12-31 Thread Allistair
Have you tried Google Cloud Vision at all? Its OCR seems superior to Tesseract in the tests I have done to date. I just spun up a project for you and it did pretty well (one error, on a single zero digit, which it read as \n2 instead of 0), but maybe with some of the preprocessing you are doing it will work be

Re: [tesseract-ocr] ImageMagick convert to tesseract-ocr fail!

2016-12-11 Thread Allistair
It is usually helpful to see the image you are providing to Tesseract. On 11 December 2016 at 14:09, Gianfranco Cecconi wrote: > Hi All, > I am a new tesseract users to forgive me if my question is naive. > > My problem is similar to what is described here >

Re: [tesseract-ocr] Loading user-words from code

2016-12-08 Thread Allistair
…e from storing my words decrypted. I can > have decrypted words only in my app code. > > On Thursday, 8 December 2016 09:51:10 UTC+1, Allistair C > wrote: >> >> Not sure it can but I wondered whether the scope of your legal regulation >> would allow:

Re: [tesseract-ocr] Loading user-words from code

2016-12-08 Thread Allistair C
Not sure it can, but I wondered whether the scope of your legal regulation would allow the following: 1. Encrypt the user-words file or store it in your source code. 2. In your wrapper program, decrypt the file to tessdata just before the Tesseract API init. 3. Init Tesseract pointed at this file. 4. Perform OCR. 5. Delet
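The five steps listed above can be sketched as a context manager that keeps the plaintext word list on disk only while OCR runs. This is a sketch under assumptions: `decrypt` is a hypothetical caller-supplied function (use a real cipher from a vetted crypto library), and how you hand the file to Tesseract depends on your version (Tesseract 4.x, for example, accepts a `--user-words` option).

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_user_words(encrypted: bytes, decrypt, directory=None):
    """Decrypt user words to a temp file, yield its path, then delete the file.

    `decrypt` is a hypothetical function mapping the encrypted bytes to the
    plain UTF-8 word list (one word per line).
    """
    fd, path = tempfile.mkstemp(suffix=".user-words", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(decrypt(encrypted))   # step 2: decrypt just before init
        yield path                        # steps 3-4: init Tesseract, run OCR
    finally:
        os.remove(path)                   # step 5: remove the plaintext copy
```

Everything that touches the decrypted words happens inside the `with temporary_user_words(...) as words_path:` block, and the `finally` clause removes the file even if OCR raises an exception.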

Re: [tesseract-ocr] Anybody can help on how to recognize this kind of image?

2016-12-02 Thread Allistair
I did a lot of work experimenting with trying to recognise this kind of text and never got it to a satisfactory level - try Google Cloud Vision. On 1 December 2016 at 23:44, Ni Min wrote: > I have download all eng related training data from github and tried with > the following command but it ca

Re: [tesseract-ocr] Re: wanna know how user-words effects recognition result

2016-11-28 Thread Allistair
…ut Google Cloud Vision's text recognition and see how you get on. I'm unsure how good it is at pages of copy like yours, but I had great success with natural-world scenes that a previous attempt with Tesseract totally failed at. Cheers On 28 November 2016 at 10:39, James Liu wrote: > Hi Alli

Re: [tesseract-ocr] text extraction problem with tesseract for the image

2016-11-24 Thread Allistair
By figure text, do you mean "Figure 1: figure supplement 1 Vera et al."? If so, I would take a two-pass approach: crop out the clearly separated figure text at the top right, resize it to a Tesseract-friendly resolution, then OCR it. It worked for me (macOS, ImageMagick, Tesseract 3.04.01) ...

Re: [tesseract-ocr] Tesseract cannot recognize clean webpage screenshot

2016-11-11 Thread Allistair C
>> On Thursday, November 10, 2016 at 1:03:43 PM UTC-8, Allistair C wrote: >> What is it you are trying to achieve exactly? >> >>> On 10 November 2016 at 18:02, JF wrote: >>> I'm using Tesseract (3.04.01 with leptonica-1.73) on Mac OS 10.12 to >>

Re: [tesseract-ocr] Tesseract cannot recognize clean webpage screenshot

2016-11-10 Thread Allistair
What is it you are trying to achieve exactly? On 10 November 2016 at 18:02, JF wrote: > I'm using Tesseract (3.04.01 with leptonica-1.73) on Mac OS 10.12 to > segment a clean screenshot of a web page. > > Here is the command: > > > tesseract screen.png output.txt > > > screen.png: > > > [ima

Re: [tesseract-ocr] Re: DO YOU HAVE ANY IDEA HOW TO IMPROVE MY OUTPUT??!!

2016-09-27 Thread Allistair
You're spot on: Tesseract is not going to invent missing pixels for you, and neither is dilation/erosion preprocessing. You have to start with some pixels to stand any chance of success, and, as you note, your receipt has many areas of rubbed-out characters. Perhaps in some future world there is a way o

Re: [tesseract-ocr] Re: Help in read Blue and White image.

2016-08-19 Thread Allistair
…services, use this attached. > > Regards, > Lucas Alexandre > > -Original message- > From: Allistair > To: "tesseract-ocr@googlegroups.com" > Date: Friday, 19 August 2016 21:12 > Subject: Re: [tesseract-ocr] Re: Help in read Blue and Whi

Re: [tesseract-ocr] Re: Help in read Blue and White image.

2016-08-19 Thread Allistair
Regards, > Lucas Alexandre > > -----Original message- > From: Allistair C > To: tesseract-ocr@googlegroups.com > Date: Friday, 19 August 2016 20:45 > Subject: Re: [tesseract-ocr] Help in read Blue and White image. > > Do you have a sample image?

Re: [tesseract-ocr] Help in read Blue and White image.

2016-08-19 Thread Allistair C
Do you have a sample image? > On 19 Aug 2016, at 20:33, Lucas Alexandre wrote: > > >Hello, > > I am a new member of this mailing list. I am creating a small project to read > electronic screens through OCR. In other words, we set up some equipment that > capture > th

Re: [tesseract-ocr] no output for a simple test image

2016-08-19 Thread Allistair
Your image is not very "linear" in the way a Word document is, which is what Tesseract handles best by default. You should try another page segmentation mode. I tried -psm 6 and got "l 6 3 s 2 ‘ 7 9 5". Note this is not 100% accurate, so you might then try increasing the resolution of your image

Re: [tesseract-ocr] OCR of 'traffic light' nutrition information on food products

2016-07-28 Thread Allistair
Needless to say, this is a difficult image. For a start, the picture is taken at a skewed angle and the plastic is squished on the right. There is God knows how much other text noise in and around the image, and then there's plain natural-scene noise: edges, shading, lines, etc. Tesseract does

Re: [tesseract-ocr] Is this the best I can get out of tesseract ?

2016-07-27 Thread Allistair C
That depends: which part of the input image are you interested in? > On 27 Jul 2016, at 16:28, Dorin Bujor wrote: > > > input.jpg > > > > > > > out.txt: > > > Ansamhhll River"s Towers- mnel.na:he@f|deliacasa.m - Fideliacasa Mail - > Goagle chrome > > > > - .fldglrzeasa

Re: [tesseract-ocr] tesseract-ocr poor scan result

2016-07-21 Thread Allistair
Please provide your detailed question, thanx in advance. On 21 July 2016 at 13:53, wrote: > Hi to all, >we are using testract for scanning docs,our issue is getting poor > results inform of more garbase values and inaccurate data result,please > provide your solutions,thanx in advance.

Re: [tesseract-ocr] Can tesseract read shiny metal surfaces?

2016-07-21 Thread Allistair
Tesseract doesn't care whether things are metal, plastic or cotton; it tries to turn whatever image it is given into text via a pipeline of rules and shape classifiers for text. It succeeds when the text is distinct enough from the background for it to do that job, and I suspect you wi

Re: [tesseract-ocr] Need OCR SW designed to extract transactions from bank statements to xfer into a General Ledger like QuickBooks or a spreadsheet..

2016-07-20 Thread Allistair C
No idea what the best is, but a Google search lists a number of such providers: search for 'bank statement ocr' and you should see results like statement reader and smartex, for instance. Cheers > On 20 Jul 2016, at 03:58, Dave Burleigh wrote: > > What is best OCR software

Re: [tesseract-ocr] Re: Help OCR'in an image

2016-07-15 Thread Allistair
Great stuff. My parting advice: don't assume it will always be 100% perfect. I hope it will, but you could get an unusual name that brings two letters just close enough together to make Tesseract get it wrong. I would do further testing against lots of test images; of course it depends on

Re: [tesseract-ocr] Re: Help OCR'in an image

2016-07-15 Thread Allistair
So it sounds like you're running a non-master (dev) version, while Tess4J is running the latest master, 3.04 (https://github.com/tesseract-ocr/tesseract), as shown in its changelog (http://tess4j.sourceforge.net/changelog.html). Eliminate the difference and then see whether your results change. On 15

Re: [tesseract-ocr] Re: Help OCR'in an image

2016-07-15 Thread Allistair
Did you find out which versions are being used? The Tess4J changelog says: "Recompile Tesseract 3.04.01 DLL against Leptonica 1.73". How does that compare with your CLI? Is any config file or option being injected anywhere? Are you passing the same page segmentation mode param (psm) or using automatic

Re: [tesseract-ocr] Re: Help OCR'in an image

2016-07-14 Thread Allistair
I'm afraid that's about the limit of what I can suggest. There are a great many "engine settings" that can be tweaked to alter the OCR, but they are not very well documented. Perhaps someone more familiar with these kinds of mistakes can help. Did the scaling fix the M issue even

Re: [tesseract-ocr] Re: Help OCR'in an image

2016-07-14 Thread Allistair C
Have you tried resizing your image to be larger? Try 2x larger; it can sometimes help. Is this happening to all Ms or just one? > On 14 Jul 2016, at 03:44, Raphael Budd wrote: > > So I added really strong pre processing that chops up the schedule, however > it is being weird
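Scaling up gives Tesseract more pixels per stroke, which can help when a character such as M is rendered only a few pixels wide. A minimal sketch of an integer enlargement using nearest-neighbour sampling, assuming the image is held as a list of rows of pixel values; image tools offer smoother filters (ImageMagick's `convert in.png -resize 200% out.png`, for example), and which filter suits your images is something to test.

```python
def upscale_nearest(img, factor=2):
    """Enlarge an image (list of rows) by an integer factor, nearest neighbour.

    Every pixel becomes a factor x factor block, so hard edges stay hard.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]   # repeat horizontally
        out.extend(list(wide) for _ in range(factor))      # repeat vertically
    return out


print(upscale_nearest([[0, 255]], 2))  # [[0, 0, 255, 255], [0, 0, 255, 255]]
```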

Re: [tesseract-ocr] wrapper function

2016-07-13 Thread Allistair
http://lmgtfy.com/?q=tesseract+python On 13 July 2016 at 16:49, Mitesh Kalal wrote: > How to create wrapper function using python for page layout and > segmentation? How to give image as an argument?
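A minimal sketch of such a wrapper, assuming the `tesseract` binary is on the PATH and is invoked through Python's `subprocess` module (libraries such as pytesseract wrap the same CLI). The image is passed as a file path, and page layout/segmentation is controlled by the page segmentation mode; Tesseract 3.x spells that option `-psm`, while 4.x and later use `--psm`. The function names are illustrative, not part of any library.

```python
import subprocess


def tesseract_command(image_path, psm=6, lang="eng", legacy_psm_flag=False):
    """Build a tesseract CLI call that writes the recognised text to stdout."""
    psm_flag = "-psm" if legacy_psm_flag else "--psm"   # 3.x vs 4.x+ spelling
    return ["tesseract", image_path, "stdout", "-l", lang, psm_flag, str(psm)]


def ocr(image_path, **options):
    """Run tesseract on `image_path` and return its text output.

    Raises subprocess.CalledProcessError if tesseract exits with an error.
    """
    result = subprocess.run(tesseract_command(image_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Usage would look like `print(ocr("scan.png", psm=6))`, or `ocr("scan.png", psm=6, legacy_psm_flag=True)` on a 3.x install.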

Re: [tesseract-ocr] box outside rectangle and invalid box errors

2016-07-13 Thread Allistair
https://github.com/tesseract-ocr/tesseract/issues/112 On 13 July 2016 at 06:27, Jeremy Reed wrote: > I'm using the latest Cygwin version of tesseract (v3.04.01) and I'm > getting the following errors when I OCR an image: > > Error in boxClipToRectangle: box outside rectangle > Error in pixScanFo

Re: [tesseract-ocr] Help OCR'in an image

2016-07-12 Thread Allistair
…e the dividing grey lines as a second text; find and replace all grey pixels with white in this case. If that still does not work, then you will need to address the fact that the borders are getting in the way and do something drastic, as I've suggested. On 12 July 2016 at 14:11, Allistair wrote:
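Replacing grey pixels with white can be done per pixel: treat a pixel as grey when its R, G and B values are nearly equal and its brightness is mid-range, so black text and coloured content are left alone. The `tolerance`, `min_level` and `max_level` defaults below are illustrative assumptions; measure the actual colour of your dividing lines and tune them. The image is assumed to be a list of rows of `(r, g, b)` tuples.

```python
def whiten_greys(rgb, tolerance=12, min_level=100, max_level=230):
    """Return a copy of an RGB image with grey pixels replaced by white.

    A pixel counts as grey when its channels differ by at most `tolerance`
    (no strong colour) and its mean brightness lies in [min_level, max_level]
    (so black text and white paper are left untouched).
    """
    white = (255, 255, 255)
    out = []
    for row in rgb:
        new_row = []
        for r, g, b in row:
            spread = max(r, g, b) - min(r, g, b)
            brightness = (r + g + b) / 3
            is_grey = spread <= tolerance and min_level <= brightness <= max_level
            new_row.append(white if is_grey else (r, g, b))
        out.append(new_row)
    return out
```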

Re: [tesseract-ocr] Help OCR'in an image

2016-07-12 Thread Allistair
e rest, I'm guessing that > might also help? > > > Thanks for the help by the way! > > On Tuesday, July 12, 2016 at 5:14:01 AM UTC-4, Allistair C wrote: >> >> In my opinion, given you have a very fixed layout/template this gives you >> more control over

Re: [tesseract-ocr] Help OCR'in an image

2016-07-12 Thread Allistair
In my opinion, having a very fixed layout/template gives you more control over how you perform the OCR. Rather than give Tesseract the entire spreadsheet, why not program a preprocessing stage that extracts the text you want cleanly into a new image (given you know all (X,
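With a fixed template, that preprocessing stage is essentially a table of known rectangles plus a crop for each. A minimal sketch, assuming the image is a list of rows of pixels and that each field's `(x, y, width, height)` has been measured once by hand; the field names and coordinates in `TEMPLATE_FIELDS` are hypothetical placeholders.

```python
def crop(img, box):
    """Return the sub-image inside box = (x, y, width, height)."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


# Hypothetical field positions for a fixed template, measured once by hand.
TEMPLATE_FIELDS = {
    "name": (10, 5, 120, 20),
    "date": (200, 5, 80, 20),
}


def extract_fields(img, fields=TEMPLATE_FIELDS):
    """Cut each known field out into its own image, ready to OCR separately."""
    return {name: crop(img, box) for name, box in fields.items()}
```

Each cropped field can then be sent to Tesseract on its own, for example with a single-line or single-block page segmentation mode.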

Re: [tesseract-ocr] Help on OCR a set of images

2016-07-09 Thread Allistair
Resize your image to 1000x246 and it's perfect: ➜ ocr tesseract 160709.jpg stdout BANDOLERO XXVI 190101002002455 (1990) On 9 July 2016 at 15:21, Trimbitas Sorin wrote: > Hi, > > I tried to OCR the attached image but I get only garbage: > > User command line: *tesseract ~/Desktop/GetImageText

Re: [tesseract-ocr] thresholding

2016-07-07 Thread Allistair
As I said, use OpenCV first to produce the Otsu-thresholded image; OpenCV has a C++ library. http://stackoverflow.com/questions/17141535/how-to-use-the-otsu-threshold-in-opencv Then provide the resulting image to Tesseract programmatically. https://github.com/tesseract-ocr/tesseract/wiki/API

Re: [tesseract-ocr] thresholding

2016-07-06 Thread Allistair
http://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html On 6 July 2016 at 19:58, Allistair C wrote: > Preprocessing with OpenCV before providing to Tesseract. > > Sent from my iPhone > > On 6 Jul 2016, at 13:46, Mitesh Kalal wrote: > > I just started woking

Re: [tesseract-ocr] thresholding

2016-07-06 Thread Allistair C
Preprocess with OpenCV before providing the image to Tesseract. > On 6 Jul 2016, at 13:46, Mitesh Kalal wrote: > > I just started woking with tessaract. I am working on thresholding. How to > give input and get output image inn otsu thresholding method?

Re: [tesseract-ocr] tesseract-ocr is not identifying the words from the attached file

2016-06-28 Thread Allistair
…patient's name, then you could easily crop out a top rectangle containing just the text before inputting it to Tesseract. On 28 June 2016 at 10:51, Allistair wrote: > The font is very unusual - pixellated edges, close together and far too > small. You might want to also try a different page s

Re: [tesseract-ocr] tesseract-ocr is not identifying the words from the attached file

2016-06-28 Thread Allistair
The font is very unusual: pixelated edges, characters close together and far too small. You might also want to try a different page segmentation mode, since the image contains a big blob of non-text. On 28 June 2016 at 10:44, wrote: > Hello, > > tesseract-ocr is not identifying the words from the att

Re: [tesseract-ocr] Tesseract configuration for alphanumeric strings: mixes up 2, Z, 6 and G

2016-06-27 Thread Allistair
> out. > > I really appreciate your help! > > On Monday, 27 June 2016 10:52:58 UTC+2, Allistair C wrote: >> >> Have you tried the generally useful increasing your image sizes until it >> works approach? Not sure if the samples you posted were the actual size but >>

Re: [tesseract-ocr] Tesseract configuration for alphanumeric strings: mixes up 2, Z, 6 and G

2016-06-27 Thread Allistair
…f Tesseract that 3 might do the trick, but it > didn't. > > Am I doing something wrong? > > > On Sunday, 26 June 2016 22:49:09 UTC+2, Allistair C wrote: >> >> Did you ever look at incorporating the unicharambigs file into your >> training? >> >

Re: [tesseract-ocr] Tesseract configuration for alphanumeric strings: mixes up 2, Z, 6 and G

2016-06-26 Thread Allistair
Did you ever look at incorporating the unicharambigs file into your training? http://www.resolveradiologic.com/blog/2013/01/16/more-on-training-tesseract/ On 26 June 2016 at 15:09, Timothy Korse wrote: > I'm trying to configurate tesseract to recognize *alphanumeric strings* of > 10 characters

Re: [tesseract-ocr] Need to understand Tesseract code

2016-06-16 Thread Allistair C
in c and > C++ it seems even more tougher. > > I did mention my use case is to be able to identify text out of movie posters > printed in newspaper. > Is someone aware of something similar to tesseract which can do this job ? > > Thanks > Ravi Katiyar > >> On

Re: [tesseract-ocr] Need to understand Tesseract code

2016-06-15 Thread Allistair
Hi, your question is a little difficult to understand. It sounds like you are saying that you have no OCR or image-processing background, know Java, and want to modify Tesseract toward some aim that you do not specify? As far as I understand, Tesseract is developed in C/C++ and not

Re: [tesseract-ocr] Tesseract, opencv and SWT (stroke width transform)

2016-06-08 Thread Allistair
Google is going to be your best friend regarding whether OpenCV has an SWT module; it didn't when I was looking back in 2014. I also read a fair bit on SWT back when I was doing an Android OCR project, and I eventually found a couple of projects (can't quite remember their names) in the open source

Re: [tesseract-ocr] Different behaviours with the same (part of) image.

2016-06-06 Thread Allistair
I do not have a technical reason for you, but I can confirm that Tesseract is sensitive to the padding around the words you are trying to detect (perhaps something to do with its page segmentation). In my experience, it's best to make sure the text has enough white space around it. On 6 June 2016 at 18:23, 'Carlo' via tesser
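Adding white padding is easy to automate before OCR. A minimal sketch, assuming a greyscale image held as a list of rows where 255 is white; the 20-pixel default margin is an arbitrary starting point, not a documented Tesseract requirement. ImageMagick can do the same with `convert in.png -bordercolor white -border 20 out.png`.

```python
def pad(img, margin=20, fill=255):
    """Surround a greyscale image (list of rows) with a border of `fill` pixels."""
    width = len(img[0]) if img else 0
    blank = [fill] * (width + 2 * margin)
    top = [list(blank) for _ in range(margin)]
    middle = [[fill] * margin + list(row) + [fill] * margin for row in img]
    bottom = [list(blank) for _ in range(margin)]
    return top + middle + bottom


print(pad([[0]], margin=1))  # [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
```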

Re: [tesseract-ocr] Re: Products Expiration date recognition

2016-06-03 Thread Allistair C
Everything Tom said, and I would also stress that I have had a lot of trouble with text that borders noise. The grey carton may be easy enough to remove, but the dark crease/join where the box closes and the proximity of the text to it (and the angle, as Tesseract breaks at angles > 10 degrees in my tests) will c

Re: [tesseract-ocr] Re: Pre-processing for images with uneven illumination

2016-05-25 Thread Allistair
If you want to deal with uneven lighting, then you should look at thresholding techniques instead. I had good success with an Android app using Tesseract and OpenCV with an Otsu thresholding configuration. More at http://docs.opencv.org/3.1.0/d7/d4d/tutorial_py_thresholding.html On 25 May 201

Re: [tesseract-ocr] Tesseract-OCR for Android Studio

2016-04-08 Thread Allistair C
You have not included the full stack trace, so you have not shown the error you are getting, only the root call loading Leptonica (did you include that lib?). Try sending the full stack trace. > On 7 Apr 2016, at 21:39, Can wrote: > > Hi everyone. I have to use tesseract-ocr for c

Re: [tesseract-ocr] "Empty Page" and incomplete text recognition

2015-10-27 Thread Allistair C
I think your whole document needs enough surrounding margin; I ran into the empty page issue when my text was too close to the page edges. Your first image has this margin but your second does not. > On 26 Oct 2015, at 18:30, Daniel Kraft wrote: > > Hi all! > > I've just starte

Re: [tesseract-ocr] tesseract yields different results when image is rotated

2015-09-30 Thread Allistair C
Can you describe the problem in more detail? What do your results look like? What is the target text you are trying to recognise? > On 30 Sep 2015, at 16:27, George Tsai wrote: >

Re: [tesseract-ocr] Re: Easiest way to run Tesseract from a Mac

2015-08-21 Thread Allistair
OK, but you forget that some users will prefer to use MacPorts. Instead of forcing Homebrew on them, I see no harm in offering alternatives to accommodate different tastes. On 21 August 2015 at 15:07, Nick White wrote: > On Fri, Aug 21, 2015 at 02:13:17PM +0100, Allistair wrote: > > This

Re: [tesseract-ocr] Using Tesseract to scan floor plans of a ship

2015-08-21 Thread Allistair
> > Thanks again for any tips in this challange. > > Rutger > > On Fri, Aug 21, 2015 at 11:25 AM, Allistair C wrote: > >> The way I would do this is use a rectangle-by-color extraction phase that >> produces all the cropped out colour rectangle

Re: [tesseract-ocr] Re: Easiest way to run Tesseract from a Mac

2015-08-21 Thread Allistair
This, I think, just illustrates that there is no one-size-fits-all approach. All methods for installing Tesseract on a Mac should be listed. On 21 August 2015 at 13:04, Helmut Wollmersdorfer <helmut.wollmersdor...@gmail.com> wrote: > > > On Thursday, 20 August 2015 14:44:49 UTC+2, Nick W

Re: [tesseract-ocr] Using Tesseract to scan floor plans of a ship

2015-08-21 Thread Allistair C
The way I would do this is to use a rectangle-by-colour extraction phase that produces all the cropped-out colour rectangles containing numbers, and then perform OCR on each one, which should work well given the quality of the text. > On 21 Aug 2015, at 08:45, Rutger Rozendal wrote: >
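A rectangle-by-colour extraction phase can be sketched as: mark every pixel close to the rectangle colour, flood-fill the marked pixels into connected regions, and keep each region's bounding box. Because the digits inside a rectangle do not match the fill colour, the flood fill flows around them and the box still covers the whole rectangle, as long as the digits do not cut it in two. This is a minimal pure-Python sketch assuming an image held as rows of `(r, g, b)` tuples; `target`, `tolerance` and `min_area` are illustrative parameters to tune for your floor plans.

```python
from collections import deque


def colour_boxes(rgb, target, tolerance=30, min_area=4):
    """Return (x, y, width, height) boxes of connected regions near `target`.

    Regions smaller than `min_area` pixels are dropped as noise.
    """
    height = len(rgb)
    width = len(rgb[0]) if height else 0

    def matches(px):
        return all(abs(c - t) <= tolerance for c, t in zip(px, target))

    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or not matches(rgb[y0][x0]):
                continue
            # Breadth-first flood fill over 4-connected matching pixels.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and matches(rgb[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

Each returned box can then be cropped out and passed to Tesseract individually.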

Re: [tesseract-ocr] Any suggestions with getting Tesseract to OCR this image?

2015-08-21 Thread Allistair C
PSM, sorry: page segmentation mode. > On 21 Aug 2015, at 02:48, Amit Rao wrote: > > psr? > >> On Thursday, August 20, 2015 at 3:02:02 PM UTC-4, Allistair C wrote: >> Try different psr too - I got close with psr 6

Re: [tesseract-ocr] Any suggestions with getting Tesseract to OCR this image?

2015-08-20 Thread Allistair C
> > > > I'll check out the threads on LCD/clock type reading. Thanks for the pointer. > > > > -amit > > >> On Thursday, August 20, 2015 at 8:09:58 AM UTC-4, Allistair C wrote: >> So another thing you could try ... I notice that everything i

Re: [tesseract-ocr] Easiest way to run Tesseract from a Mac

2015-08-20 Thread Allistair
It was a while ago, but I know it was painful enough that I installed an Ubuntu VM to compile Tesseract instead, before discovering MacPorts ;) On 20 August 2015 at 16:45, Nick White wrote: > On Thu, Aug 20, 2015 at 03:46:32PM +0100, Allistair wrote: > > I had issues installed with Home

Re: [tesseract-ocr] Easiest way to run Tesseract from a Mac

2015-08-20 Thread Allistair
I had issues installing with Homebrew: it didn't install dependencies like Leptonica very well, though that could just have been an issue on my end. MacPorts, by contrast, worked out of the box. On 20 August 2015 at 13:44, Nick White wrote: > Hi all, > > I was looking at the Tesseract wiki, and

Re: [tesseract-ocr] Any suggestions with getting Tesseract to OCR this image?

2015-08-20 Thread Allistair
…'s printed, producing uncertain edges. Perhaps others can chip in. On 20 August 2015 at 10:31, Amit Rao wrote: > Thanks, Allistair. I was guessing that this font was similar to Lucida > Console. e.g. > > > https://www.google.com/search?q=lucida+console+font&espv=2&biw=1174&

Re: [tesseract-ocr] Any suggestions with getting Tesseract to OCR this image?

2015-08-20 Thread Allistair
Which Lucida font do you think this is? None of the Lucida fonts I see in a Google Image search look anything like this. You're right, this does not OCR well. In fact, if you crop out just part of it to remove other noise, say, 09:43 AM, even with lots of margin Tesseract isn't even finding anything it

Re: [tesseract-ocr] OCR with difficult circumstances, is it even possible?

2015-07-21 Thread Allistair
You will have much better success here if you develop a routine that crops out the area of the meter containing the numbers before running it through OCR. If, for instance, all your meters look the same, you can guide the taking of the photo with an app showing an overlay of where to position

Re: [tesseract-ocr] Emoticons?

2015-05-23 Thread Allistair
When you train Tesseract, you provide it with loads and loads of text in the font/language of your choice. It then effectively turns this into outlines that it can match against incoming images of text. Now, I am not 100% sure, but I am fairly certain that if you attempted to train Tesseract with a bunch

Re: [tesseract-ocr] Emoticons?

2015-05-23 Thread Allistair
OpenCV is a computer vision library with advanced features for computer vision in general. If you think about it, a frame of a video is just an image. So no, it does not need to be used for video; it can be used on static images too. http://docs.opencv.org/doc/tutorials/imgproc/histograms/te

Re: [tesseract-ocr] Emoticons?

2015-05-21 Thread Allistair C
Use OpenCV pattern matching. > On 22 May 2015, at 02:35, SRguy wrote: > > Might Tesseracts be trained to recognize emoticons, such as the new iPhone > ones? > Thanks.
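Template (pattern) matching slides a small reference image over the input and scores every position; with the sum of squared differences, the best match has the lowest score. OpenCV provides this as `cv2.matchTemplate` (for example with `cv2.TM_SQDIFF`, then `cv2.minMaxLoc` to locate the best score). The brute-force pure-Python version below only illustrates the idea, assuming greyscale images held as lists of rows; it does not handle scale or rotation, so each emoticon would need a template at the size it appears in your images.

```python
def match_template(image, template):
    """Slide `template` over `image` and return (x, y, score) of the best match.

    The score is the sum of squared differences, so 0 is a pixel-perfect match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template larger than image")
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            score = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th) for dx in range(tw)
            )
            if best is None or score < best[2]:
                best = (x, y, score)
    return best


scene = [[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 6, 0]]
print(match_template(scene, [[9, 8], [7, 6]]))  # (1, 1, 0)
```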

Re: [tesseract-ocr] DO YOU HAVE ANY IDEA HOW TO IMPROVE MY OUTPUT??!!

2015-05-21 Thread Allistair
Is there anything specific you are trying to get out of the result, or just everything? I ask because the source receipt itself is not great: many characters are invisible/rubbed out, etc. There's only so much you can do with OCR if your source itself is pretty rubbish, and I

Re: [tesseract-ocr] Improve Text reading on image

2015-05-11 Thread Allistair
I guess I am not seeing the problem. Your pipeline is: raw image -> image preprocessing -> Tesseract psm 6 -> postprocessing to find the likely numeric string. Cheers On 11 May 2015 at 12:46, Nicholas Chew wrote: > Hi Allistair > Thanks for your reply. I had used Tesseract only. I need
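The last stage of that pipeline can be a simple filter over Tesseract's text output. A minimal sketch, assuming (hypothetically) that the wanted value is the longest run of at least six digits; applied to the psm 6 output quoted in the following message, it would pick out `92907120`.

```python
import re


def likely_number(ocr_text, min_digits=6):
    """Return the longest run of digits in noisy OCR output, or None.

    Runs shorter than `min_digits` are treated as noise and ignored.
    """
    runs = [run for run in re.findall(r"\d+", ocr_text) if len(run) >= min_digits]
    return max(runs, key=len) if runs else None
```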

Re: [tesseract-ocr] Improve Text reading on image

2015-05-11 Thread Allistair
OK - so what OCR have you tried so far? I got (psm 6) ... I-" ” " ’ I I“ 1""? 1' '. _ % % 92907120 W% A%'% % On 11 May 2015 at 09:12, Nicholas Chew wrote: > Hi > I need help to process this image to read the text. I had tried the > command below but Tesseract still cant read it. What else di

Re: [tesseract-ocr] Errors building Tesseract for Android on Mac

2015-05-06 Thread Allistair
…my > Tesseract project. > > On Wednesday, 6 May 2015 12:37:50 UTC, Allistair C wrote: >> >> Here were my notes: >> >> Leptonica & Tesseract native libraries >> In order to use Tesseract with >> Android we must use the work of the tess-two fork project of the >

Re: [tesseract-ocr] Errors building Tesseract for Android on Mac

2015-05-06 Thread Allistair
using a version older than v9? The current > version is 10. > > On Wednesday, 6 May 2015 12:23:55 UTC, Allistair C wrote: >> >> When I did this I followed a tutorial that said you have to use an older >> version of the NDK - are you using the latest? >> >> On 6

Re: [tesseract-ocr] Errors building Tesseract for Android on Mac

2015-05-06 Thread Allistair
When I did this, I followed a tutorial that said you have to use an older version of the NDK. Are you using the latest? On 6 May 2015 at 13:22, codefully wrote: > I have tried to build Tesseract for Android on a mac several times but > keep getting errors. I followed the steps below: > > curl -O

Re: [tesseract-ocr] Tips on how to improve results.

2015-05-02 Thread Allistair C
Try resampling your image to up to 5x larger and try again. > On 2 May 2015, at 00:01, Martín Ochoa <8amar...@gmail.com> wrote: > > Hi, > I'm developing an app that will have to read text from image in order to do > some things that have nothing to do with my question. So I hav

[tesseract-ocr] Re: Is there any way to speed up extraction using tesseract OCR Engine, while tiff file is having 600-700 pages?

2015-04-20 Thread Allistair C
What Tom said. However, let's assume all your variables are fixed: the resolution has to be just what you have, the file format has to be TIF, etc. Then you can use a divide-and-conquer distributed computing pattern. That is, set up a machine that holds a queue of work and then have that queue farm ou
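On a single machine, the divide-and-conquer pattern can be sketched with a thread pool: split the page range into batches, OCR the batches concurrently and reassemble the text in page order. `ocr_batch` is a hypothetical caller-supplied function (for example, one that runs the `tesseract` binary on the given TIFF pages); threads are enough here because the heavy lifting happens in those external processes. Spreading the work across machines replaces the pool with a shared work queue, but the split/merge logic stays the same.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def ocr_document(page_count, ocr_batch, batch_size=50, workers=4):
    """OCR pages 0..page_count-1 in parallel batches; return texts in page order.

    `ocr_batch(pages)` is a hypothetical function returning one string per page.
    """
    batches = chunk(list(range(page_count)), batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so page order is preserved.
        results = list(pool.map(ocr_batch, batches))
    return [text for batch in results for text in batch]
```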

Re: [tesseract-ocr] OCR just a part of an image

2015-03-26 Thread Allistair C
Of course; it's up to you which image, or part thereof, you send to Tesseract. You just need to use your VB image processing libraries to create a new image from a rectangular region of the source image. > On 25 Mar 2015, at 22:07, Faissal Bouetire wrote: > > Hi every one >

Re: [tesseract-ocr] OCR accuracy and font specific

2015-02-17 Thread Allistair C
I think you must be pulling our leg. Either that or you are still mistakenly sending a JCPenney logo into OCR. > On 17 Feb 2015, at 07:20, pgpur...@gmail.com wrote: > > Hi , > > I have tried to detect logo text from Kohl's logo attahced herewith, but it > returns JCPenney

Re: [tesseract-ocr] How can i improve OCR for attached image

2015-02-11 Thread Allistair
I would encourage you to read the last 15 posts in this group, where a lot of advice has been given on improving results: everything from higher-resolution input to image preprocessing/cropping to different PSMs, etc. FWIW, your current image works best with PSM 6 (see below), but it's not perfect. I

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
Don't waste your time on splicing and rotating. Focus on a reliable scan setup for cropping. Tesseract already handles a degree of rotation correction; your issue is all the noise, so focus on that. > On 8 Feb 2015, at 19:19, Josh Wolcott wrote: > > I've seen some of Fred's

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair
to identify the blob some how. and I can not get opencv to >> download form any mirror... what the heck. This project keeps getting >> better. >> >> On Sunday, February 8, 2015 at 7:45:59 AM UTC-5, Allistair C wrote: >>> >>> If you butt them up

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
en the rest blank or total random. I have to > identify the blob some how. and I can not get opencv to download form any > mirror... what the heck. This project keeps getting better. > >> On Sunday, February 8, 2015 at 7:45:59 AM UTC-5, Allistair C wrote: >> If you butt th

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
up for times sake. > >> On Sunday, February 8, 2015 at 7:11:28 AM UTC-5, Allistair C wrote: >> Could you upload a scanned card at the resolution and angle that you tried >> without success? >> >> Sent from my iPhone >> >>> On 8 Feb 2015, at 12:05, Josh

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
I went > in to it assuming there would be an out of box solution to command line OCR. > The project was going swimmingly until I actually got to this. My patience > is beginning to wain =( > >> On Sunday, February 8, 2015 at 4:23:25 AM UTC-5, Allistair C wrote: >> I

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
I would personally use OpenCV rather than ImageMagick (IM); it has more sophisticated routines to build on. http://stackoverflow.com/questions/16746473/opencv-find-bounding-box-of-largest-blob-in-binary-image > On 8 Feb 2015, at 00:02, Josh Wolcott wrote: > > You know you really put it

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-07 Thread Allistair
The page segmentation mode (PSM) could help you: mode 6 is fairly good at finding various areas of text with images and other noise around them, but it will sometimes think the surrounding noise is text, so cropping is really the only solution here. Your problem is no different from automatic number plate

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-07 Thread Allistair C
One option is to try a different PSM; 6 may work well. Alternatively, you have a card, which is great because it means you have repeatable areas of text. Processing the card into cropped areas is possible if your scanning is controlled. Look at what http://card.io does to see an example of getting a good i

Re: [tesseract-ocr] How can i use tesseract-ocr in android studio?

2015-02-04 Thread Allistair
Use tess-two (https://github.com/rmtheis/tess-two). You'll need to compile Tesseract for Android, then copy the .so libraries into your Android Studio project in the normal way for JNI libs. Cheers On 4 February 2015 at 10:56, Jorge Alamo wrote: > Hi > > How can i use tesseract-ocr in android stud

Re: [tesseract-ocr] how to pass OCR only in certain part of image (rect XY)?

2015-02-03 Thread Allistair
This is a preprocessing step that you would need to do with an image library: crop your input image to just the area you want to send to OCR. If your input is a template and is scanned reliably (consistent orientation, size and resolution), then it should be relatively easy to crop the area. If not, then you w
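A minimal sketch of the cropping step, assuming the image is a list of rows of pixel values and that the field's rectangle (x, y, width, height) in pixels is known from a fixed template. `crop` and `NAME_FIELD` are hypothetical names for illustration; with a real image library you would use its own crop call (Pillow's `Image.crop`, for example, takes a (left, upper, right, lower) box).

```python
def crop(image, x, y, width, height):
    """Return the sub-image covering columns x..x+width-1 and rows y..y+height-1.

    The rectangle is clamped to the image bounds, so a slightly
    misaligned template does not raise an IndexError.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(cols, x + width), min(rows, y + height)
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical position of one field on a fixed template (pixels).
NAME_FIELD = (1, 1, 2, 2)
page = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
print(crop(page, *NAME_FIELD))  # [[5, 6], [9, 10]]
```

Only the cropped region would then be handed to Tesseract, so surrounding noise never reaches the OCR step.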

Re: [tesseract-ocr] Help needed in understanding source. New to tesseract.

2015-01-31 Thread Allistair
If you start by learning C++, you will find the entry point to a C++ program (main). You can then trace the various calls manually, methodically stepping through files and functions by searching. Alternatively, you could look at tools that allow call-stack debugging/tracing. I am not fam

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair C
on and > also my latest was using standard Java's extension packages (using ImageIO > etc.) for binarization. I don't think I could have done it without googling :-). > The output eats away some of the normal text. > > > Here's the sample image result of binarization. >

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair C
At what point will you use Google to answer these simple questions? OpenCV has already been mentioned many times. > On 22 Jan 2015, at 18:39, newbie wrote: > > Any idea of what free source is available for binarizing in Java? > > Thanks > >> On Thursday, January 22, 201
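OpenCV's threshold function supports Otsu's method via its THRESH_OTSU flag, and that is also available through its Java bindings. To show what that binarization step actually computes, here is a minimal pure-Python sketch of Otsu's method, assuming 8-bit grayscale pixels (0 to 255) stored as a list of rows. `otsu_threshold` and `binarize` are hypothetical helper names, not OpenCV functions.

```python
def otsu_threshold(gray):
    """Return the threshold t (0-255) that maximises between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0      # sum of intensities in the dark class
    weight_bg = 0   # number of pixels in the dark class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    # Dark pixels (text) become 0, light pixels (background) become 255.
    return [[0 if p <= t else 255 for p in row] for row in gray]


gray = [[10, 12, 200], [11, 210, 205]]
t = otsu_threshold(gray)
print(binarize(gray, t))  # [[0, 0, 255], [0, 255, 255]]
```

Otsu picks the threshold automatically from the histogram, which helps when lighting varies between scans; if it "eats away" thin strokes, an adaptive (local) threshold may be worth comparing.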

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair
Not exactly an answer, but someone else with the same issue has gotten most of the way there. http://stackoverflow.com/questions/24385714/detect-text-region-in-image-using-opencv On 22 January 2015 at 15:35, newbie wrote: > ShreeDevi, > ImageMagick seems like a manual tool, but I think the pro

Re: [tesseract-ocr] Re: Problems installing leptonica 1.69 with Tesseract 3.01 on Ubuntu 10.04 LTS

2015-01-20 Thread Allistair C
These are usually caused by libpng/libtiff etc. not being present. Did you confirm that Leptonica installed those dependencies? > On 21 Jan 2015, at 05:56, Purohith Nayak wrote: > > Hi, > I installed leptonica then tesseract and everything went well, but when I > try to pa

Re: [tesseract-ocr] Help extracting text from images.

2015-01-14 Thread Allistair
es are in 96 dpi. Any solutions > ? > > On Tuesday, January 13, 2015 at 5:39:02 PM UTC-5, Allistair C wrote: >> >> I wasn't using a formula, just demonstrating that your original text was >> too small. Here is some advice from the FAQ (https://code.google.com/p/ >

Re: [tesseract-ocr] Awful results within a block text

2015-01-14 Thread Allistair C
See my reply to the other post yesterday for details from the FAQ. It applies to you. > On 13 Jan 2015, at 21:55, Vasin Soparkdithapong wrote: > > Hi Allistair, > > Would you happen to know what the recommended (ideal) image size is for > Tesseract 3.03

Re: [tesseract-ocr] Help extracting text from images.

2015-01-13 Thread Allistair
, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed". On 13 January 2015 at 21:45, newbie wrote: > Allistair, > Sorry
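The FAQ text quoted above gives x-height floors: below about 10 pixels accuracy is poor, and below about 8 pixels most text is removed as noise. A minimal sketch of turning a measured x-height into an upscale factor, assuming you have already measured the x-height in pixels. The 20-pixel target is an assumption (a margin above the 10-pixel floor), not a figure from the FAQ, and the function names are hypothetical.

```python
import math

MIN_X_HEIGHT_PX = 10     # floor from the FAQ text quoted above
TARGET_X_HEIGHT_PX = 20  # assumption: a comfortable margin above that floor


def likely_too_small(x_height_px):
    """True when the text is below the x-height floor mentioned in the FAQ."""
    return x_height_px < MIN_X_HEIGHT_PX


def upscale_factor(x_height_px, target=TARGET_X_HEIGHT_PX):
    """Scale factor that brings the measured x-height up to `target` pixels.

    Returns 1.0 when the text is already at least that tall (never downscales).
    """
    if x_height_px <= 0:
        raise ValueError("x-height must be positive")
    return max(1.0, target / x_height_px)


def scaled_size(width, height, factor):
    # Round up so the output is never smaller than the requested scale.
    return math.ceil(width * factor), math.ceil(height * factor)


f = upscale_factor(7)
print(likely_too_small(7), scaled_size(640, 480, f))  # True (1829, 1372)
```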

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-13 Thread Allistair
Then I guess either: a) you give up, or b) you employ my previous suggestion to use VirtualBox + Ubuntu to set yourself up a Linux box inside your Windows machine, and use the tutorial I sent to train Tesseract. Cheers On 13 January 2015 at 14:32, Albert Bonn wrote: > I don't understand this program, unf

Re: [tesseract-ocr] Awful results within a block text

2015-01-13 Thread Allistair
It's too small. Try resizing your image to 1000px wide and it works perfectly. Before resize: Thu ,5 2. m of 12 Liam! an to test the air (an: and 522 yr n works an a“ lying; of rm mm. the Quxzk brown dog name over mg my m. m muzk brown dog mmaen over 2»: my m. m muzk brawn daa mwen aver mg my m.
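A minimal sketch of the resize suggested above, assuming the image is a list of rows of pixel values. The 1000 px target width comes from the reply; nearest-neighbour sampling is my own choice to keep the example dependency-free. An image library's resize with a smoother filter (bilinear or bicubic) would likely give cleaner glyph edges. `resize_to_width` is a hypothetical name.

```python
def resize_to_width(image, target_width=1000):
    """Nearest-neighbour resize to `target_width`, preserving the aspect ratio."""
    src_h = len(image)
    src_w = len(image[0]) if src_h else 0
    if src_w == 0:
        return []
    target_height = max(1, round(src_h * target_width / src_w))
    # Map each output pixel back to the source pixel it falls on.
    return [
        [image[y * src_h // target_height][x * src_w // target_width]
         for x in range(target_width)]
        for y in range(target_height)
    ]


small = [[1, 2], [3, 4]]
print(resize_to_width(small, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```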

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair
I'm not sure you can. The tools for training are made for Linux as far as I know. http://blog.cedric.ws/how-to-train-tesseract-301 However, once trained, you can use the training file in your Windows Tesseract. On 12 January 2015 at 16:07, Albert Bonn wrote: > How do I do it with windows? > >

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair
Then use Linux (VirtualBox + Ubuntu for instance - all free) :) On 12 January 2015 at 15:50, Albert Bonn wrote: > Thanks. How do I do this? I only found tutorials for linux. > > That may mean you need to acquire this font and train Tesseract yourself >> explicitly. >> > -- > You received this m

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair C
Sorry, wrong clean image: https://www.dropbox.com/s/s7nzdqapr75yr23/clean.jpg?dl=0 On Monday, 12 January 2015 15:40:55 UTC, Allistair C wrote: > > Just to back that up some more ... > > Clean: did not work at all > > https://www.dropbox.com/s/jz4e8mm9onga9md/code.png?dl=0 &

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair C
://www.dropbox.com/s/w2r2kp5is96oh2t/faked.jpg?dl=0 Cheers On Monday, 12 January 2015 15:34:12 UTC, Allistair C wrote: > > Even totally cleaned up of the surrounding frame and gradiented backdrop > on the screen, Tesseract does not recognise the large numbers for me. That > may mean you nee

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair
Even with the surrounding frame and gradient backdrop on the screen totally cleaned up, Tesseract does not recognise the large numbers for me. That may mean you need to acquire this font and train Tesseract on it explicitly. Tesseract is trained with a whole bunch of fonts but probably not this

Re: [tesseract-ocr] Re: Different output on same text picture sometimes

2015-01-12 Thread Allistair
Doubt it, I tried once ;) On 12 January 2015 at 12:03, Gokcer Gunes wrote: > finally I was able to change it but no luck even turning off:( ,thanks for > your time:) and if i send mail to raysmith about this problem, any chance > will he reply? > > 2015-01-12 12:25 GMT+
