Re: [tesseract-ocr] Help with blurred OCR but "simple text"

2017-04-06 Thread Allistair C
You might want to try preprocessing with a threshold filter (otsu threshold) to harden the edges? Sent from my iPhone > On 6 Apr 2017, at 10:16, Javier Abascal wrote: > > Hi everyone! :) > > I am having troubles identifying correctly the text in the images
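
Editor's sketch of the Otsu threshold suggested above. This is a minimal pure-Python illustration, not code from the thread: it assumes the image is already 8-bit greyscale and held as a nested list of integers, and the names `otsu_threshold` and `binarize` are made up for the example. In practice OpenCV offers the same method via `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance."""
    # Build a 256-bin histogram of grey levels.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # pixels with grey level <= candidate threshold
    sum_bg = 0     # sum of the grey levels of those pixels
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels, t):
    """Map every pixel above t to white (255) and the rest to black (0)."""
    return [[255 if value > t else 0 for value in row] for row in pixels]
```

The binarized result, rather than the blurred original, would then be handed to Tesseract.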

Re: [tesseract-ocr] I can't get accurate ocr of this can anyone help with settings?

2017-01-02 Thread Allistair C
The whole point of a captcha is to evade automated reading. That's why letters are very close together and letters are heavily rotated off a consistent baseline. OCR is designed for normal text input so you need to do clever preprocessing here first. Sent from my iPhone > On 2 Jan 2017, at

Re: [tesseract-ocr] Loading user-words from code

2016-12-08 Thread Allistair C
Not sure it can but I wondered whether the scope of your legal regulation would allow: 1. Encrypt the user words file or store in your source code 2. In your wrapper program just before tesseract api init decrypt the file to tessdata 3. Init tesseract pointed to this file 4. Perform ocr 5.

Re: [tesseract-ocr] Tesseract cannot recognize clean webpage screenshot

2016-11-11 Thread Allistair C
eract to > recognize? > >> On Thursday, November 10, 2016 at 1:03:43 PM UTC-8, Allistair C wrote: >> What is it you are trying to achieve exactly? >> >>> On 10 November 2016 at 18:02, JF <jimfa...@gmail.com> wrote: >>> I'm using Tesseract (3.04.01 with lepton

Re: [tesseract-ocr] Help in read Blue and White image.

2016-08-19 Thread Allistair C
Do you have a sample image? Sent from my iPhone > On 19 Aug 2016, at 20:33, Lucas Alexandre wrote: > > >Hello, > > I am a new member of this mailing list. I am creating a small project to read > electronic screens through OCR. In other words, we set up some

Re: [tesseract-ocr] Is this the best I can get out of tesseract ?

2016-07-28 Thread Allistair C
Depends what part of the input image you are interested in? Sent from my iPhone > On 27 Jul 2016, at 16:28, Dorin Bujor wrote: > > > input.jpg > > > > > > > out.txt: > > > Ansamhhll River"s Towers- mnel.na:he@f|deliacasa.m - Fideliacasa Mail - > Goagle chrome >

Re: [tesseract-ocr] Need OCR SW designed to extract transactions from bank statements to xfer into a General Ledger like QuickBooks or a spreadsheet..

2016-07-20 Thread Allistair C
No idea what the best is but a google search lists a number of providers of such: Google for 'bank statement ocr' You should see results like statement reader and smartex for instance. Cheers Sent from my iPhone > On 20 Jul 2016, at 03:58, Dave Burleigh wrote: >

Re: [tesseract-ocr] Re: Help OCR'in an image

2016-07-14 Thread Allistair C
Have you tried resizing your image to be larger, try x2 larger - can sometimes help. Is this happening to all Ms or just one? Sent from my iPhone > On 14 Jul 2016, at 03:44, Raphael Budd wrote: > > So I added really strong pre processing that chops up the schedule,
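
Editor's sketch of the "resize x2" idea above. It assumes a greyscale image stored as a nested list and uses nearest-neighbour enlargement only because that is easy to show without a library; the helper name `upscale` is hypothetical. A real pipeline would more likely use a smoothing interpolation, for example `cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)` in OpenCV.

```python
def upscale(pixels, factor=2):
    """Enlarge an image by an integer factor using nearest-neighbour sampling."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...and every widened row `factor` times vertically.
        for _ in range(factor):
            enlarged.append(list(wide_row))
    return enlarged
```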

Re: [tesseract-ocr] thresholding

2016-07-06 Thread Allistair C
Preprocessing with OpenCV before providing to Tesseract. Sent from my iPhone > On 6 Jul 2016, at 13:46, Mitesh Kalal wrote: > > I just started working with Tesseract. I am working on thresholding. How to > give input and get output image in otsu thresholding method? >

Re: [tesseract-ocr] Need to understand Tesseract code

2016-06-16 Thread Allistair C
iyar > >> On Thursday, 16 June 2016 03:41:36 UTC+5:30, Allistair C wrote: >> Hi, >> >> Your question is a little difficult to understand - it sounds like you are >> saying on the one hand you have no OCR or image processing background, know >> Java, and want to modi

Re: [tesseract-ocr] Tesseract-OCR for Android Studio

2016-04-08 Thread Allistair C
You have not included the full stack trace, so you have not shown the error you are getting, only the root call loading leptonica (did you include that lib?). Try sending the full stack. Sent from my iPhone > On 7 Apr 2016, at 21:39, Can wrote: > > Hi everyone. I have

Re: [tesseract-ocr] "Empty Page" and incomplete text recognition

2015-10-27 Thread Allistair C
I think your whole document needs enough surrounding margin - I found the empty page issue when my text was too close to the page edges. In your first image you have this but not your second. Sent from my iPhone > On 26 Oct 2015, at 18:30, Daniel Kraft wrote: > > Hi all!
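
Editor's sketch of adding a surrounding margin, as suggested above, before passing a page to Tesseract. It assumes a greyscale image as a nested list with white equal to 255; `add_margin` and its default values are illustrative, not something from the thread. OpenCV users could get the same effect with `cv2.copyMakeBorder`.

```python
def add_margin(pixels, margin=10, fill=255):
    """Return a copy of the image with `margin` pixels of `fill` on every side."""
    width = len(pixels[0])
    blank_row = [fill] * (width + 2 * margin)

    padded = [list(blank_row) for _ in range(margin)]        # top border
    for row in pixels:
        side = [fill] * margin
        padded.append(side + list(row) + side)                # left and right borders
    padded.extend(list(blank_row) for _ in range(margin))     # bottom border
    return padded
```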

Re: [tesseract-ocr] tesseract yields different results when image is rotated

2015-09-30 Thread Allistair C
Can you describe much better? What are your results looking like? What is the target text you are trying to recognise? > On 30 Sep 2015, at 16:27, George Tsai wrote: > > -- You received this message because you are subscribed to the Google Groups "tesseract-ocr"

Re: [tesseract-ocr] Emoticons?

2015-05-22 Thread Allistair C
Use opencv pattern matching Sent from my iPhone On 22 May 2015, at 02:35, SRguy sanderatla...@gmail.com wrote: Might Tesseracts be trained to recognize emoticons, such as the new iPhone ones? Thanks. -- You received this message because you are subscribed to the Google Groups

Re: [tesseract-ocr] Tips on how to improve results.

2015-05-02 Thread Allistair C
Try resampling your image up to 5x larger and try again. Sent from my iPhone On 2 May 2015, at 00:01, Martín Ochoa 8amar...@gmail.com wrote: Hi, I'm developing an app that will have to read text from image in order to do some things that have nothing to do with my question. So I have that

[tesseract-ocr] Re: Is there any way to speed up extraction using tesseract OCR Engine, while tiff file is having 600-700 pages?

2015-04-20 Thread Allistair C
What Tom said. However, let's assume all your variables are constant - resolution has to be just what you have, file format has to be TIF etc. then you can use a divide and conquer distributed computing pattern. That is, grab a machine that holds a queue of work and then make that queue farm
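
Editor's sketch of the divide-and-conquer pattern described above, scaled down to a single machine. The message talks about a queue farming work out to other machines; this example only shows the shape of the idea with a local thread pool. `ocr_page` is a hypothetical placeholder for whatever actually runs Tesseract on one page of the TIFF.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_number):
    """Hypothetical stand-in: real code would OCR one page and return its text."""
    return f"text of page {page_number}"


def ocr_document(page_numbers, workers=4):
    """OCR pages concurrently and return the results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the output lines up
        # with page_numbers even if pages finish out of order.
        return list(pool.map(ocr_page, page_numbers))
```

For CPU-bound OCR across several machines, the pool would be replaced by a shared work queue that each worker machine pulls page numbers from.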

Re: [tesseract-ocr] OCR just a part of an image

2015-03-26 Thread Allistair C
Of course, it's up to you which image or part thereof you send to tesseract. You just need to use your vb image processing libraries to create a new image from a rectangular region of the source image. Sent from my iPhone On 25 Mar 2015, at 22:07, Faissal Bouetire bouet...@gmail.com wrote:
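
Editor's sketch of creating a new image from a rectangular region, as described above. The original poster was working in VB; this Python version only illustrates the idea on a nested list of pixels, and the function name `crop` and its parameter names are assumptions.

```python
def crop(pixels, left, top, width, height):
    """Return the rectangle starting at (left, top) with the given size."""
    if left < 0 or top < 0 or width < 0 or height < 0:
        raise ValueError("rectangle coordinates must be non-negative")
    # Slice the wanted rows first, then the wanted columns within each row.
    return [row[left:left + width] for row in pixels[top:top + height]]
```

Only the cropped region is then sent to Tesseract, not the full source image.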

Re: [tesseract-ocr] OCR accuracy and font specific

2015-02-17 Thread Allistair C
I think you must be pulling our leg. Either that or you are still mistakenly sending a jcpenney logo into OCR. Sent from my iPhone On 17 Feb 2015, at 07:20, pgpur...@gmail.com wrote: Hi, I have tried to detect logo text from Kohl's logo attached herewith, but it returns JCPenney. Can

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
I would personally use opencv rather than IM. It has more sophisticated routines to build on. http://stackoverflow.com/questions/16746473/opencv-find-bounding-box-of-largest-blob-in-binary-image Sent from my iPhone On 8 Feb 2015, at 00:02, Josh Wolcott jswolc...@gmail.com wrote: You know

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
be an out of box solution to command line OCR. The project was going swimmingly until I actually got to this. My patience is beginning to wane =( On Sunday, February 8, 2015 at 4:23:25 AM UTC-5, Allistair C wrote: I would personally use opencv rather than IM. It has more sophisticated

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
, February 8, 2015 at 7:11:28 AM UTC-5, Allistair C wrote: Could you upload a scanned card at the resolution and angle that you tried without success? Sent from my iPhone On 8 Feb 2015, at 12:05, Josh Wolcott jswo...@gmail.com wrote: I will look in to opencv. Thank you. I spent many

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
blank or total random. I have to identify the blob somehow, and I cannot get opencv to download from any mirror... what the heck. This project keeps getting better. On Sunday, February 8, 2015 at 7:45:59 AM UTC-5, Allistair C wrote: If you butt them up against each other horizontally

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Allistair C
Don't waste your time on splicing and rotating. Focus on a reliable scan setup for cropping. Tesseract already handles a degree of rotation correction, your issue is all the noise so focus on that. Sent from my iPhone On 8 Feb 2015, at 19:19, Josh Wolcott jswolc...@gmail.com wrote: I've

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-07 Thread Allistair C
One option is try a different PSM mode - 6 may work well. Or you have a card which is great because it means you have repeatable areas of text. Processing the card into cropped areas is possible if your scanning is controlled. Look at what http://card.io do to see an example of getting a good

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair C
, Allistair C wrote: At what point will you use Google to answer these simple questions? OpenCV has already been mentioned many times. Sent from my iPhone On 22 Jan 2015, at 18:39, newbie spens.ma...@gmail.com wrote: Any idea of what free source is available for binarizing in java

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-22 Thread Allistair C
At what point will you use Google to answer these simple questions? OpenCV has already been mentioned many times. Sent from my iPhone On 22 Jan 2015, at 18:39, newbie spens.mallang...@gmail.com wrote: Any idea of what free source is available for binarizing in java ? Thanks On

Re: [tesseract-ocr] Re: Problems installing leptonica 1.69 with Tesseract 3.01 on Ubuntu 10.04 LTS

2015-01-20 Thread Allistair C
These are usually because libpng/libtiff etc. are not present, did you confirm that leptonica installed those dependencies? Sent from my iPhone On 21 Jan 2015, at 05:56, Purohith Nayak purohith...@gmail.com wrote: Hi, I installed leptonica then tesseract and everything went well, But

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair C
://www.dropbox.com/s/w2r2kp5is96oh2t/faked.jpg?dl=0 Cheers On Monday, 12 January 2015 15:34:12 UTC, Allistair C wrote: Even totally cleaned up of the surrounding frame and gradiented backdrop on the screen, Tesseract does not recognise the large numbers for me. That may mean you need to acquire

Re: [tesseract-ocr] OCR of a heart rate monitor

2015-01-12 Thread Allistair C
Sorry wrong clean image: https://www.dropbox.com/s/s7nzdqapr75yr23/clean.jpg?dl=0 On Monday, 12 January 2015 15:40:55 UTC, Allistair C wrote: Just to back that up some more ... Clean: did not work at all https://www.dropbox.com/s/jz4e8mm9onga9md/code.png?dl=0 Clean with some paintwork

Re: [tesseract-ocr] Re: Different output on same text picture sometimes

2015-01-08 Thread Allistair C
asap 2015-01-08 1:24 GMT+02:00 Gokcer Gunes ggunes...@gmail.com: ah no its not noise there is no noise in original img it just result of crop in paint 2015-01-08 1:21 GMT+02:00 Allistair C allist...@gmail.com: Ah, I see - interesting :) The 2nd example isn't quite the same - it seems

Re: [tesseract-ocr] Re: Different output on same text picture sometimes

2015-01-07 Thread Allistair C
the issue is? On 7 January 2015 at 22:48, Gokcer Gunes ggun...@gmail.com javascript: wrote: i uploaded them as pictures 2015-01-08 0:29 GMT+02:00 Gokcer Gunes ggun...@gmail.com javascript: : yeah resul pictures are in message you cant see them? 2015-01-07 23:56 GMT+02:00 Allistair C

[tesseract-ocr] Re: Different output on same text picture sometimes

2015-01-07 Thread Allistair C
Your question is not self-evident, what are you trying to ask? Can you present your OCR results for each test you are conducting? On Monday, 5 January 2015 18:16:12 UTC, Gokcer Gunes wrote: https://lh3.googleusercontent.com/-QtKcTsT8fGY/VKrU2Bz4zZI/AD4/nCbru06vKac/s1600/testry2.png

[tesseract-ocr] Re: Recognizing ...

2015-01-07 Thread Allistair C
The ... is formally called an ellipsis and I can find nothing useful Googling except that somebody has tried using OpenCV object/feature detection to try and look for this. The only possible way I can imagine getting Tesseract to recognise an ellipsis is to train it where 3 full stops appear

[tesseract-ocr] Re: Automatic Number Plate Recognition

2015-01-07 Thread Allistair C
You've tried unicharambigs right (bottom of this page https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3) On Thursday, 20 November 2014 12:53:43 UTC, Mark Beylis wrote: Hello I am making use of Tesseract OCR to perform number plate recognition on vehicles I am making use of

[tesseract-ocr] Re: Tesseract confidence level sample code for java

2014-11-19 Thread Allistair C
baseApi.init(filesDir.getPath() + "/tesseract/", LANG); baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_BLOCK); baseApi.setImage(bmp); OCRResult result = new OCRResult(baseApi.getUTF8Text(), baseApi.meanConfidence()); baseApi.end(); Note OCRResult is my own object for holding values.

[tesseract-ocr] Re: Need Help with extracting info from Invoice

2014-11-18 Thread Allistair C
I wonder if there is anything consistent about the invoice design? For instance I notice that your invoice has Honda logos on the top left and top right essentially providing 2 anchors from which you could extrapolate resolution and location/orientation of the table of data. You could also

[tesseract-ocr] Re: Reading Device labels to get model number

2014-11-13 Thread Allistair C
I think the table lines are not helping. I up-sized your image to 1000px wide, then ran into Tesseract with PSM=6 and got mostly rubbish. Then I removed the table lines manually in Photoshop, then up-sized your image to 1000px wide, then ran into Tesseract with PSM=6: RFZBHMEDBSR R 134a/

[tesseract-ocr] Re: Reading Device labels to get model number

2014-11-13 Thread Allistair C
Do you have higher resolution images to work with - that's one issue going on here as the edges of your text are very fuzzy and at that resolution it's pretty hard for Tesseract. You can also play with Thresholding and Opening (Erosion/Dilation) to thicken some of your lines up (using e.g.
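
Editor's sketch of the erosion/dilation operations mentioned above, on a binary mask where 1 marks a text pixel and 0 marks background. It assumes a 3x3 square neighbourhood clipped at the image border; the function names are illustrative. Dilation alone thickens strokes, while opening (erosion followed by dilation) removes specks smaller than the neighbourhood. OpenCV provides these as `cv2.erode`, `cv2.dilate` and `cv2.morphologyEx` with `cv2.MORPH_OPEN`.

```python
def _neighbourhood(mask, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    height, width = len(mask), len(mask[0])
    return [
        mask[j][i]
        for j in range(max(0, y - 1), min(height, y + 2))
        for i in range(max(0, x - 1), min(width, x + 2))
    ]


def erode(mask):
    """Keep a pixel only if every pixel in its window is foreground."""
    return [
        [1 if all(_neighbourhood(mask, y, x)) else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def dilate(mask):
    """Set a pixel if any pixel in its window is foreground (thickens strokes)."""
    return [
        [1 if any(_neighbourhood(mask, y, x)) else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def opening(mask):
    """Erosion followed by dilation: drops small specks, keeps larger shapes."""
    return dilate(erode(mask))
```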