[tesseract-ocr] Phantom characters

2023-12-31 Thread Jason Shepherd
I'm using pytesseract and tesseract v5.3.3 to read text from some images, and I sometimes get these weird phantom characters. I've tried some image preprocessing like increasing the image size, erosion, thresholding, etc., but nothing seems to get rid of this random character that's s
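A minimal sketch of two of the preprocessing steps mentioned above, enlarging the image and erosion. It is written in plain Python over a grayscale image stored as a list of rows of 0-255 values, an assumption made only so the example has no dependencies. With OpenCV you would normally use cv2.resize and cv2.erode instead. The helper names are hypothetical.

```python
# Grayscale images as lists of rows of 0-255 ints (assumption for this sketch).

def upscale_nearest(img, factor):
    """Enlarge by an integer factor, repeating each pixel factor x factor times."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]

def erode(img, radius=1):
    """Grayscale erosion: each pixel becomes the minimum over its
    (2*radius+1) x (2*radius+1) neighbourhood, clipped at the image edges.

    On dark text over a light background this thickens the dark strokes.
    """
    h, w = len(img), len(img[0])
    return [[min(img[yy][xx]
                 for yy in range(max(0, y - radius), min(h, y + radius + 1))
                 for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)]
            for y in range(h)]
```

Because erosion takes a local minimum, it grows dark regions; whether that helps against stray characters depends on what the stray marks look like in the source image.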

Re: [tesseract-ocr] Re: OCR Solution

2022-07-30 Thread jason duchateau
*Notolytix Ltd.*

[tesseract-ocr] Re: Tesseract/OpenCV Project to Hire

2022-02-03 Thread Jason Reveleer
Please send questions/inquiries to me at jason.colbert @ reveleer.com Cheers, jason On Thursday, February 3, 2022 at 10:58:15 AM UTC-8 jason@gmail.com wrote: > We have a significant project underway that is leveraging > Tesseract/OpenCV/Python for an OCR/NLP project in the healt

[tesseract-ocr] Tesseract/OpenCV Project to Hire

2022-02-03 Thread Jason Colbert
Jason

[tesseract-ocr] Re: FontAwesome and Tesseract

2019-06-17 Thread Jason
Can I "bump" this? Even a high-level description of the process would help:
- How to make a box file (for v4) of Unicode chars
- How to make the training size-invariant?
Etc. Many thanks! On Tuesday, May 21, 2019 at 10:09:57 AM UTC-4, Jason wrote: > > I would like to b
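For the box-file question, a sketch under stated assumptions: it formats one line in the classic per-character box format (`<glyph> <left> <bottom> <right> <top> <page>`, pixel coordinates with the origin at the bottom-left of the image) from a bounding box measured with a top-left origin, as most image libraries report it. Whether the v4 LSTM training tools accept per-character boxes like this for your setup is not confirmed here. `box_line` and `is_private_use` are hypothetical helpers; the second only confirms that FontAwesome code points such as U+F013 sit in Unicode's Private Use Area.

```python
import unicodedata

def is_private_use(ch):
    """True if ch is a Private Use Area code point (general category "Co")."""
    return unicodedata.category(ch) == "Co"

def box_line(glyph, left, top, right, bottom, image_height, page=0):
    """Format one box-file line.

    Input box: top-left-origin pixel coordinates (y grows downwards).
    Output: bottom-left origin, so both y values are flipped against image_height.
    """
    return f"{glyph} {left} {image_height - bottom} {right} {image_height - top} {page}"

GEAR = "\uf013"  # FontAwesome "gear" code point mentioned in this thread
line = box_line(GEAR, 10, 5, 42, 37, image_height=100)  # "\uf013 10 63 42 95 0"
```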

[tesseract-ocr] Re: Tire DOT OCR - Black Text, Black Background

2019-05-21 Thread Jason
I would think edge detection (Sobel/Laplacian; DoG should be fine too) would give you the outline of the characters. Then it's a matter of detecting the closed paths and filling them. If you account for the radial (get it!?) nature of the text alignment, then bucket fill t
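The edge-detection-then-fill idea can be sketched in plain Python: a 3x3 Sobel operator gives an edge-strength map, a threshold turns it into a binary outline map, and a flood fill from the image border marks everything outside the outlines, so whatever the fill cannot reach lies inside a closed path. This ignores the radial layout of tire text, the edge threshold is an assumed tuning value, and the images are assumed to be lists of rows of 0-255 ints. OpenCV offers cv2.Sobel and cv2.floodFill for the same steps. The function names are hypothetical.

```python
def sobel_magnitude(img):
    """Edge strength |Gx| + |Gy| from 3x3 Sobel kernels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out

def to_edge_map(magnitude, edge_thresh):
    """1 where edge strength exceeds edge_thresh, else 0 (threshold is a tuning knob)."""
    return [[1 if m > edge_thresh else 0 for m in row] for row in magnitude]

def fill_closed_outlines(edges):
    """Return a 0/1 mask with closed outlines and their interiors set to 1.

    The background is flood-filled (4-connected) from every non-edge border
    pixel; pixels the fill never reaches are edges or enclosed by edges.
    """
    h, w = len(edges), len(edges[0])
    outside = [[False] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and not edges[y][x]]
    while stack:
        y, x = stack.pop()
        if outside[y][x]:
            continue
        outside[y][x] = True
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and not outside[ny][nx]:
                stack.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]
```

Typical use would be `fill_closed_outlines(to_edge_map(sobel_magnitude(gray), edge_thresh))`. Filling the background with 4-connected steps means an outline only has to be 8-connected to count as closed.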

[tesseract-ocr] Re: Extract text from bright background color image(yellow)

2019-05-21 Thread Jason
Simple thresholding should work. You'll have to use something like OpenCV or your own routine if you are daring. If you convert to gray, anything less than 64 (out of 255) goes to black, and anything above that goes to white. On Tuesday, May 21, 2019 at 5:51:48 AM UTC-4, April Shar wrote: > >
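The fixed-threshold approach described above, sketched in plain Python: convert RGB to gray with the ITU-R BT.601 weights (the same weights OpenCV documents for its RGB-to-gray conversion), then send everything above 64 to white and everything else to black. A pixel exactly at 64 goes to black here, which is also what cv2.threshold(gray, 64, 255, cv2.THRESH_BINARY) does. The helper names are hypothetical, and 64 is simply the value suggested in the reply.

```python
def to_gray(rgb):
    """RGB -> gray with ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def threshold(gray_img, thresh=64):
    """Values above thresh -> white (255); values at or below it -> black (0)."""
    return [[255 if v > thresh else 0 for v in row] for row in gray_img]

# Example: a dark text pixel between two yellow background pixels.
rgb_rows = [[(255, 255, 0), (20, 20, 20), (255, 255, 0)]]
gray = [[to_gray(px) for px in row] for row in rgb_rows]
binary = threshold(gray)  # [[255, 0, 255]]
```

Yellow maps to a bright gray (about 226), so it lands on the white side while the dark text goes black.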

[tesseract-ocr] FontAwesome and Tesseract

2019-05-21 Thread Jason
I would like to be able to detect shapes like those contained in FontAwesome. Take, for example, a gear (https://fontawesome.com/icons?d=gallery&q=gear). This is Unicode character \uf013. I think this would be as simple as training a font via http://trainyourtesseract.com/, but this did not work.

Re: [tesseract-ocr] How to extract text for processing by tesseract v4?

2019-05-20 Thread Jason Hihn
>> gimp do: Colors -> Components -> Decompose. >> >> 3. invert the image and try thresholding (Otsu, etc.) >> >> With a little programming you can identify and isolate black regions from >> white ones, but I do not know if this is something you want to do. >
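The "invert, then threshold with Otsu" suggestion quoted above, as a dependency-free sketch over a grayscale image stored as a list of rows of 0-255 ints (an assumption for the example). Otsu's method picks the threshold that maximises the between-class variance of the two resulting pixel groups. In OpenCV the equivalent would be cv2.bitwise_not followed by cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU). The helper names are hypothetical.

```python
def invert(gray_img):
    """Swap dark and light: v -> 255 - v."""
    return [[255 - v for v in row] for row in gray_img]

def otsu_threshold(gray_img):
    """Otsu's method: return t maximising between-class variance (up to a
    constant factor), where class 0 is pixels <= t and class 1 is pixels > t.
    Returns 0 for a single-valued image."""
    hist = [0] * 256
    for row in gray_img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    total_sum = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # running pixel count and intensity sum of class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is nothing to separate
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

def binarize(gray_img, t):
    """Pixels above t become white (255), the rest black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray_img]
```

Usage would be `img = invert(region)` followed by `binarize(img, otsu_threshold(img))`, where `region` is whichever part of the page you have isolated.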

[tesseract-ocr] How to extract text for processing by tesseract v4?

2019-05-07 Thread Jason
I have a problem with the current tesseract. I have documents that have sections of varying background and text colors. I've read that tesseract v3 was white/black invariant and it didn't matter if I had white text on a red background. But now it matters. The problem is, other parts in the same im
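One common workaround for mixed polarity is to normalise each section to dark text on a light background before OCR. A minimal sketch, assuming each section has already been cropped into its own grayscale region (a list of rows of 0-255 ints) and that a region's border is mostly background; `normalize_polarity` is a hypothetical helper and 128 an arbitrary midpoint.

```python
from statistics import median

def border_pixels(region):
    """Yield the values on the outermost rows and columns of a region."""
    h, w = len(region), len(region[0])
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                yield region[y][x]

def normalize_polarity(region, mid=128):
    """Return a copy of the region as dark-on-light.

    The background level is estimated as the median border value (assumes the
    border is mostly background); if it is darker than mid, the region is inverted.
    """
    if median(border_pixels(region)) < mid:
        return [[255 - v for v in row] for row in region]
    return [row[:] for row in region]
```

Using the median of the border rather than the mean keeps a few text pixels touching the edge from flipping the decision.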

Re: [tesseract-ocr] Simple image FAIL fails

2019-04-30 Thread Jason
That's interesting, because everything I've read about tesseract says that white/black or black/white (foreground/background) doesn't matter because it uses edge detection (outlines). https://research.google.com/pubs/archive/33418.pdf "by inspection of the nesting of outlines, and the number of

[tesseract-ocr] Conflicting TessBaseAPI::Init() documentation

2019-04-29 Thread Jason
I was reading the docs ( https://tesseract-ocr.github.io/4.0.0/a02186.html#a96899e8e5358d96752ab1cfc3bc09f3e ) and came across this apparent conflict and also noticed that the two paragraphs have overlapping content (i.e. datapath, language) *The datapath must be the name of the parent dire

[tesseract-ocr] Re: Simple image FAIL fails

2019-04-29 Thread Jason
Thank you for looking into this and confirming I am not crazy. On Monday, April 29, 2019 at 1:11:40 PM UTC-4, Jason wrote: > > Apologies for such a simple question but this is a super simple test case > and I don't understand why it isn't working. This simple image contains

[tesseract-ocr] Simple image FAIL fails

2019-04-29 Thread Jason
Apologies for such a simple question but this is a super simple test case and I don't understand why it isn't working. This simple image contains the words "PASS" and "FAIL". "PASS" is recognized but "FAIL" comes out as "wee". What can I do to get it to detect "FAIL" properly? I'm using the dem

[tesseract-ocr] How to convert Image into scgink?

2018-11-06 Thread Jason Rong
How do I convert an image into a scgink file so I can use it with SESHAT? Does anybody have any idea?

[tesseract-ocr] Pytesseract cannot find the specified file

2015-12-11 Thread Jason Mellone
Has anyone experienced this? I feel like I am quite close to cracking this but keep hitting roadblocks.

[tesseract-ocr] Single Character recognition help

2015-03-17 Thread Jason Bush
I'm trying to use tesseract on the Raspberry Pi to recognize single characters. I have luck with whole sheets but not with single characters, and I'm not very familiar with the programming. I'm running 3.0.3. I need the most dumbed-down version of how I can improve my single character recognition.
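One option worth trying for single characters is page segmentation mode 10, which tells tesseract to treat the whole image as a single character. A sketch that builds and runs the command line, assuming the `tesseract` binary is on the PATH; releases before 4.0 spell the option `-psm`, while 4.x and later use `--psm`. The function names and the `pre_v4` switch are hypothetical.

```python
import subprocess

def single_char_cmd(image_path, out_base, pre_v4=True):
    """Build a tesseract command using page segmentation mode 10
    ("treat the image as a single character")."""
    psm_flag = "-psm" if pre_v4 else "--psm"  # 3.0x spelling vs 4.x+ spelling
    return ["tesseract", image_path, out_base, psm_flag, "10"]

def recognize_single_char(image_path, out_base="out", pre_v4=True):
    """Run tesseract and return the recognised text (it writes <out_base>.txt)."""
    subprocess.run(single_char_cmd(image_path, out_base, pre_v4), check=True)
    with open(out_base + ".txt", encoding="utf-8") as f:
        return f.read().strip()
```

Calling `recognize_single_char("letter.png")` on a cropped image of one character would return whatever tesseract recognised, or raise `CalledProcessError` if the binary fails.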

Re: Tesseract vs Commercial Products

2012-02-20 Thread Jason Funk
Actually, I am a developer. But I am new to the OCR world. The piece that I was missing in the equation is the image pre-processing. I will investigate it further. Thanks for your help. On Feb 19, 5:20 pm, Wil Hadden wrote: > Having recently used leptonica for pre-processing I have to ask why is

Re: Tesseract vs Commercial Products

2012-02-18 Thread Jason Funk
February 18, 2012, Sven Pedersen wrote: > > Commercial options have lots of built-in image processing. You can do that > > with free software but it does not just do it automatically. Post examples > > and you'll get feedback about how to do it with tesseract. > >

Re: Tesseract vs Commercial Products

2012-02-18 Thread Jason Funk
> > Tesseract is especially good for custom training for a particular type of > > text. Accuracy can increase to over 98% for a given font. Also, it can be > > trained for foreign languages. > > --Sven > > > On Sat, Feb 18, 2012 at 1:4

Re: Tesseract vs Commercial Products

2012-02-18 Thread Jason Funk
If I am understanding you right, it does not work very well without being trained? Jason On Feb 18, 3:07 pm, Sven Pedersen wrote: > Tesseract is especially good for custom training for a particular type of > text. Accuracy can increase to over 98% for a given font. Also, it can be > tr

Tesseract vs Commercial Products

2012-02-18 Thread Jason Funk
I am testing tesseract against some other commercial products, and the commercial products seem to blow tesseract out of the water in terms of quality and accuracy. Is this because tesseract is just not as good as the other products? Or perhaps tesseract is designed for a specific purpose other th

min x height configuration

2012-02-02 Thread Jason Funk
I'm trying to configure tesseract to use a larger min x-height, but it doesn't seem to be working. I added a line to my tess-config that says "min_sane_x_ht_pixels 36", which is bigger than most of the characters, and it doesn't seem to have any effect. Is this not a valid configuration value? Is it

Re: English Word Filtering

2011-11-15 Thread Jason Funk
be helpful! Thanks, Jason On Tue, Nov 15, 2011 at 7:14 AM, patrickq wrote: > Anything is possible with Tesseract since there are gazillion settings > but in my opinion a setting that returns only words in the dictionary > would be useless to 99.9% of application usages. For one thing, si

English Word Filtering

2011-11-15 Thread Jason Funk
Does Tesseract make any attempts to filter out things that aren't words? For example, I processed an image and it returned this: "This is a slide about a muffin's magical powers. !%i Muffin Power HI K Q55 iii‘ E!!! iU_ ‘gm !" All of the words that it found are right, but everything else isn't. I d
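If you only want dictionary words, one option is to filter tesseract's output yourself afterwards. A minimal sketch, assuming you supply your own word list (the tiny set below is made up for this example); `keep_dictionary_words` is a hypothetical helper. A filter like this also drops legitimate tokens that are not in the list, such as names and numbers.

```python
def keep_dictionary_words(text, dictionary):
    """Keep whitespace-separated tokens whose core (surrounding punctuation
    stripped) is in `dictionary`, compared case-insensitively."""
    kept = []
    for token in text.split():
        core = token.strip(".,;:!?\"'\u2018\u2019()[]")
        if core and core.lower() in dictionary:
            kept.append(token)
    return " ".join(kept)

words = {"this", "is", "a", "slide", "about", "muffin's", "magical",
         "powers", "muffin", "power"}
sample = ("This is a slide about a muffin's magical powers. "
          "!%i Muffin Power HI K Q55 iii\u2018 E!!! iU_ \u2018gm !")
result = keep_dictionary_words(sample, words)
# result == "This is a slide about a muffin's magical powers. Muffin Power"
```

Tokens keep their original punctuation ("powers." survives intact) because only the stripped core is looked up.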

Senior Business Intelligence Architect / Technical Developer

2011-06-17 Thread Jason Smith
Hello, Hope you are doing well today. This is Jason Smith from Panzer Solutions looking for a Senior Business Intelligence Architect / Technical Developer. Please let me know if you would like to move forward with the below position. For a quick response, send me your consultant resumes on my mail

C#/ASP.NET Developer || Jersey City, NJ || 6 + months

2011-06-08 Thread Jason Smith
Hello, Hope you are doing well today. This is Jason Smith from Panzer Solutions looking for a C# and ASP.NET Developer. Please let me know if you would like to move forward with the below position. For a quick response, please send me your resumes on my mail id jsm...@panzersolutions.com *Job Title

Data Warehouse Tech Lead || Bridgewater, NJ || 8 + months

2011-06-08 Thread Jason Smith
Hello, Hope you are doing well today. This is Jason Smith from Panzer Solutions looking for a Data Warehouse Tech Lead. Please let me know if you have any consultant available for the below position. For a quick response, please send me your resume on my id: jms...@panzersolutions.com *Job Title

Java Front End Developer || Palo Alto, CA || 1 month

2011-06-07 Thread Jason Smith
Hello, Hope you are doing well today. This is Jason Smith from Panzer Solutions looking for a Front-End Developer. Please let me know if you would like to move forward with the below position. For a quick response, please send me your resume on my id: jsm...@panzersolutions.com * Job Title : Front