Re: [tesseract-ocr] Small Programming Job

2018-05-04 Thread Dmitri Silaev
a solution around it. It's reliable and well tailored for a class of problems such as yours. Now it's being used as a workhorse by a few of our clients. Look at www.CustomOCR.com. ScreenMine is not announced there yet, but there you can find a contact form to reach us. Regards, Dmitri Silaev On Fri, May 4

Re: [tesseract-ocr] Re: How to prevent tesseract from rotating input image?

2017-10-23 Thread Dmitri Silaev
hem about it here: https://github.com/DanBloomberg/leptonica/issues/251 > > Thank you so much! > > On Monday, October 23, 2017 at 11:58:22 AM UTC-4, Dmitri Silaev wrote: >> >> Image handling in Tesseract is done with Leptonica. I have little >> knowledg

Re: [tesseract-ocr] Re: How to prevent tesseract from rotating input image?

2017-10-23 Thread Dmitri Silaev
r 21, 2017 at 5:12:01 PM UTC-4, Dmitri Silaev wrote: >> >> Without delving deeper, I can suggest that you probably need to >> investigate your image EXIF orientation value. Most image handling >> libraries respect it. I suppose your image viewer also supports this >> pa

Re: [tesseract-ocr] Detection on complex images

2017-10-19 Thread Dmitri Silaev
m any kind of forum is just an > ordinary work (think of stackoverflow, is it unfair to find an idea or a > piece of code from there ?). Developing a full solution it's a different > thing and it is what I will try to do. > > thanks for your time. > > On Wednesday, October 18,

Re: [tesseract-ocr] Detection on complex images

2017-10-18 Thread Dmitri Silaev
Wow, we are being taken advantage of. Smart move Paolo but not fair. Heck, I almost started writing the answer. On Tue, Oct 17, 2017 at 7:00 PM, Tom Morris wrote: > I don't suppose this has anything to do with the Top Coder Mud Logger OCR > contest, does it? >

Re: [tesseract-ocr] My simple jpg image with text isn't recognized well. Newbie here.

2017-10-16 Thread Dmitri Silaev
Великий <alex.velic...@gmail.com> wrote: > Thank you, Dmitri. > > Is there a way to optimize speed of recognition? Like disabling OCR to > seek for some patterns? > > On Wednesday, October 11, 2017 at 7:15:01 PM UTC+3, Dmitri Silaev wrote: >> >> See my previous answe

Re: [tesseract-ocr] Detection on complex images

2017-10-16 Thread Dmitri Silaev
r each word, I would > already solved the problem. > > > On Saturday, October 14, 2017 at 10:29:29 PM UTC+2, Dmitri Silaev wrote: >> >> What are you unhappy with: detection rate or recognition accuracy? All in >> all, there's a ton of reasons why Tess can work poorly he

Re: [tesseract-ocr] Re: How to prevent tesseract from rotating input image?

2017-10-15 Thread Dmitri Silaev
OK, I think, it's a bunch of your fallacies. But let's start from the beginning. Send the exact image you are passing to Tess, version of Tess, your config file, and the command line. -Dmitri On Sun, Oct 15, 2017 at 11:42 AM, Dan9er wrote: > Bump > > On Friday,

Re: [tesseract-ocr] Detection on complex images

2017-10-14 Thread Dmitri Silaev
are going to look for, - their bounding boxes within your sample image. Once I have it, I might be able to help. Best regards, Dmitri Silaev www.CustomOCR.com On Fri, Oct 13, 2017 at 9:05 AM, Paolo Giannoccaro <pa.giannocc...@gmail.com > wrote: > Hi, > I need to detect a fixed

Re: [tesseract-ocr] My simple jpg image with text isn't recognized well. Newbie here.

2017-10-11 Thread Dmitri Silaev
, Александр Великий <alex.velic...@gmail.com> wrote: > Should simply converting image to grayscale do the trick in such cases > (bright colored background) or something else may be needed? > > On Wednesday, October 11, 2017 at 12:37:12 AM UTC+3, Dmitri Silaev wrote: >>

Re: [tesseract-ocr] My simple jpg image with text isn't recognized well. Newbie here.

2017-10-11 Thread Dmitri Silaev
t; Thank you very much. > > Indeed, the images that you provided were successfully parsed by > Tesseract. > Could you suggest tools that could programmatically change colors of > image to get one like yours? > > On Wednesday, October 11, 2017 at 12:37:12 AM UTC+3, Dmitri Si

Re: [tesseract-ocr] My simple jpg image with text isn't recognized well. Newbie here.

2017-10-10 Thread Dmitri Silaev
al bar (a cursor?) at the end of this word. Get rid of the bar, and the results will be just perfect. See "src_sat-100_nobar.jpg". Perhaps, you can make use of ImageMagick's morphology to remove thin bars and the like. Best regards, Dmitri Silaev www.CustomOCR.com On Tue, Oct 10, 2

Re: [tesseract-ocr] Extract text from simple image

2017-10-01 Thread Dmitri Silaev
na ExtraIOMV Diesel 2017-09-3019:38 > 4.930 RON 5.220 RON > > > 2) > convert in.png -threshold 38.038% out.png > > Dataffimp Motorina Standard Motorina Extra/OMV Diesel 2017-09-3019:38 > 4.930 RON 5.220 RON > > > For threshold >= 38,039 'Data/Timp' is correc
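
A minimal sketch of what a percentage threshold such as the one in "convert in.png -threshold 38.038% out.png" does to 8-bit grayscale values. It assumes that values strictly above the cutoff become white and everything else black; the function name and the sample row are made up for illustration, and this is not ImageMagick's own code.

```python
def threshold(pixels, percent, max_value=255):
    """Binarize grayscale values: above the cutoff -> white, otherwise black."""
    cutoff = max_value * percent / 100.0
    return [max_value if p > cutoff else 0 for p in pixels]


# Hypothetical row of 8-bit gray values straddling the cutoff.
row = [10, 96, 97, 98, 200]
print(threshold(row, 38.038))
```

With values this close to the cutoff, a small change in the percentage can flip individual pixels, which may be why a tiny threshold change alters the recognized text.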

Re: [tesseract-ocr] Reading # from image only ~75% successful

2017-09-29 Thread Dmitri Silaev
d be happy to help. Best regards, Dmitri Silaev www.CustomOCR.com On Fri, Sep 29, 2017 at 12:12 PM, Ben Schipper <schip...@londonhydro.com> wrote: > I am attempting to read a fairly large 6 digit number from an image using > Tesseract 3.02 on a windows 7 machine. > > I have been

Re: [tesseract-ocr] Why is Tesseract unable to recognize the digits in these two images?

2015-11-19 Thread Dmitri Silaev
s often may confuse foreground and background pixels - usually foreground is black. Example command line: tesseract debug_i.png debug_i.png -psm 7 Tested with Tess executable built as of 20150203. Best regards, Dmitri Silaev www.CustomOCR.com On Thu, Nov 19, 2015 at 8:04 PM, Sean Leffler <s..
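
The remark that foreground is usually black can be sketched as a polarity check before OCR. This is only a heuristic of my own for illustration (invert when more than half of the pixels are dark); the function name is hypothetical and Tesseract itself does nothing like this on your behalf here.

```python
def ensure_dark_on_light(pixels, max_value=255):
    """Invert gray values if most pixels are dark (assumed light-on-dark text)."""
    dark = sum(1 for p in pixels if p < max_value / 2)
    if dark > len(pixels) / 2:
        # Mostly dark image: assume a dark background and flip it.
        return [max_value - p for p in pixels]
    return list(pixels)


print(ensure_dark_on_light([0, 0, 0, 255]))
```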

Re: [tesseract-ocr] Re: Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-10-05 Thread Dmitri Silaev
Shishir, Do not hijack this thread. Go create a separate one with your own question. -Dmitri On Sat, Oct 3, 2015 at 10:19 AM, Shishir Singhal wrote: > sir i am doing a project based on hand written character recognition based > on google tesseract but i the

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-10-02 Thread Dmitri Silaev
ymbols you are after. The rest is trivial - count tactile symbols and get the denomination of your bill. Of course, you'd add more sophistication to cope with real world images but the backbone of the algorithm looks to me like this. All work is done in grayscale. HTH Best regards, Dmi

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-26 Thread Dmitri Silaev
The said preprocessing would be needed anyway even if Tesseract worked for your "characters". Tell what you already have done so far in this direction so I can share more details about the above method, if you wish. -Dmitri Hi Dmitri Silaev. Thanks for reply. They are bills, sorry for mis

Re: [tesseract-ocr] Train tesseract 3.04 for recognition of six patterns no existents in UTF-8

2015-09-23 Thread Dmitri Silaev
Hi Juan Pablo, The problem seems interesting. However, I'm not sure you can use Tesseract for that. Could you show one or more example tickets? Best regards, Dmitri Silaev www.CustomOCR.com On Tue, Sep 22, 2015 at 2:17 AM, Juan Pablo Aveggio <jpaveg...@gmail.com> wrote: > Hello >

Re: [tesseract-ocr] Extract Graphics from Video and get text with OCR

2015-09-23 Thread Dmitri Silaev
I know it's tempting to use Tesseract as a free off-the-shelf tool but it comes at a cost of less accuracy. What I suggested gives an accuracy close to 100%. The choice is yours. Best regards, Dmitri Silaev www.CustomOCR.com On Mon, Sep 21, 2015 at 10:26 PM, Keith Reilly <krei...@retroreport

Re: [tesseract-ocr] OCR with difficult circumstances, is it even possible?

2015-07-21 Thread Dmitri Silaev
at the bottom then it's usually alright. And finally, I suppose Tesseract already has a pretty decent collection of trained fonts to work with most meter types. Regards, Dmitri Silaev www.CustomOCR.com On Tue, Jul 21, 2015 at 9:40 AM, Marc Bruins marciebru...@gmail.com wrote: Hello all, I

Re: [tesseract-ocr] how to train tesseract?

2015-06-30 Thread Dmitri Silaev
23:39:33 UTC+3 tarihinde Dmitri Silaev yazdı: As the first mandatory step you need to do perspective correction, e.g. using paper sheet boundaries (is it a lottery ticket?) Then depending on how it goes further with Tesseract you may need either to: - Train for this particular font - Blur

Re: [tesseract-ocr] how to train tesseract?

2015-06-29 Thread Dmitri Silaev
vertically by a factor of 1.5 to match closer to standard trained fonts Each step in turn is a multi-step process. PM me if you're interested. Best regards, Dmitri Silaev www.CustomOCR.com On Mon, Jun 29, 2015 at 10:21 PM, Cenk KIZILDAĞ kizildagc...@gmail.com wrote: Hi, I would like

Re: [tesseract-ocr] Is Tesseract capable of extreme accuracy on cards of different formats?

2015-06-01 Thread Dmitri Silaev
- Cattoni, Coianiz - Document Structure Analysis Algorithms - 2003 - Mao, Rosenfeld, Kanungo Best regards, Dmitri Silaev www.CustomOCR.com On Mon, Jun 1, 2015 at 7:43 PM, S Kirkwood smkirkwood4...@gmail.com wrote: Thank you for the response Dmitri. It is reassuring to know that this can

Re: [tesseract-ocr] Is Tesseract capable of extreme accuracy on cards of different formats?

2015-05-30 Thread Dmitri Silaev
. Be inventive. Decent accuracy can be achieved. You should admit, though, a less than 100% accuracy rate. Best regards, Dmitri Silaev www.CustomOCR.com On Fri, May 29, 2015 at 10:57 PM, S Kirkwood smkirkwood4...@gmail.com wrote: Hi, I am working on a project that requires OCR. I have not used

Re: [tesseract-ocr] Not Getting Proper Output using Tesseract

2015-05-28 Thread Dmitri Silaev
You won't get any improvement just by changing a few params. More complex processing is required. Let me know if you're interested in more details. Best regards, Dmitri Silaev www.CustomOCR.com On Thu, May 28, 2015 at 8:50 AM, supriya Das supriya.i...@gmail.com wrote: Hello Everybody

Re: [tesseract-ocr] Improve recognition with multiple font sizes

2015-05-28 Thread Dmitri Silaev
Such params are not known to me. But if they were I'm pretty sure that would be a quite unreliable solution. In my opinion just stick with the solution you found yourself - split into fragments. Best regards, Dmitri Silaev www.CustomOCR.com On Wed, May 27, 2015 at 6:00 PM, Brad brad.s

Re: [tesseract-ocr] Not Getting Proper Output using Tesseract

2015-05-28 Thread Dmitri Silaev
by programming but might be done by means of ImageMagick/shell scripts also. Best regards, Dmitri Silaev www.CustomOCR.com On Thu, May 28, 2015 at 2:47 PM, supriya Das supriya.i...@gmail.com wrote: Hello Dmitri Siaev, Thanks for your response. Please tell me the complex processing logic. Thanks

Re: [tesseract-ocr] poor results on relatively straightforward image

2015-05-21 Thread Dmitri Silaev
Show the source image. Show what you have done to get the binarized version. Best regards, Dmitri Silaev www.CustomOCR.com On Thu, May 21, 2015 at 1:55 AM, hj hsje...@gmail.com wrote: see attached image. have tried various things, including this config: tessedit_char_whitelist 0123456789

Re: [tesseract-ocr] OCR failing on simple and clear text codes

2015-05-20 Thread Dmitri Silaev
source image and run Tess in the single char PSM. I think it should be easy as long as the location of every character is quite stable among your source images. ImageMagick/shell scripts would suffice. Best regards, Dmitri Silaev www.CustomOCR.com On Wed, May 20, 2015 at 12:52 PM, Yoann Nicod th3
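
The crop-then-recognize idea can be sketched as a command generator. It assumes ImageMagick's "convert -crop WxH+X+Y +repage" and the Tesseract 3.x spelling "-psm 10" (single character); the box coordinates and file names are invented for illustration, and nothing is executed here.

```python
def char_commands(image, boxes):
    """Yield (crop_cmd, ocr_cmd) pairs, one per character box (x, y, w, h)."""
    for i, (x, y, w, h) in enumerate(boxes):
        tile = "char_%02d.png" % i
        geometry = "%dx%d+%d+%d" % (w, h, x, y)
        # Cut one character out of the source image.
        crop = ["convert", image, "-crop", geometry, "+repage", tile]
        # Recognize the tile as a single character.
        ocr = ["tesseract", tile, "char_%02d" % i, "-psm", "10"]
        yield crop, ocr


# Hypothetical fixed character positions in a code image.
boxes = [(10, 5, 20, 30), (32, 5, 20, 30)]
for crop, ocr in char_commands("code.png", boxes):
    print(" ".join(crop))
    print(" ".join(ocr))
```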

Re: [tesseract-ocr] OCR failing on simple and clear text codes

2015-05-20 Thread Dmitri Silaev
20, 2015 at 12:29:08 PM UTC+2, Dmitri Silaev wrote: One no-brainer method to try out would be turning off all dictionaries and using your own custom user-patterns file. Since you said about your application I suppose you can program. So you can take a look at the comment preceding

Re: [tesseract-ocr] ocr thermostat numbers

2015-05-18 Thread Dmitri Silaev
on the internet - look for them. They seem to address fonts similar to yours, but in the end you'd probably need to train yourself. Best regards, Dmitri Silaev www.CustomOCR.com On Thu, May 14, 2015 at 8:17 PM, James Okken jokke...@gmail.com wrote: Dmitri, thanks very much for your response. any

Re: [tesseract-ocr] ocr thermostat numbers

2015-05-14 Thread Dmitri Silaev
for you, though; it depends on source image specifics. Attach several samples. Best regards, Dmitri Silaev www.CustomOCR.com On Wed, May 13, 2015 at 8:31 PM, James Okken jokke...@gmail.com wrote: hi everyone. can tesseract pull the numbers off this thermostat picture attached? I've tried

Re: [tesseract-ocr] Re: Tesseract With Opencl

2015-05-13 Thread Dmitri Silaev
Great contribution! Thanks! -Dmitri On Wed, May 13, 2015 at 4:41 PM, Ryan Baumann rfbaum...@gmail.com wrote: I wrote up my experiments with OpenCL-enabled Tesseract here: http://ryanfb.github.io/etc/2015/03/18/experimenting_with_opencl_for_tesseract.html On Friday, May 8, 2015 at 3:58:42

Re: [tesseract-ocr] Tips on how to improve results.

2015-05-06 Thread Dmitri Silaev
result. tesseract inet009_rs_cr_ts.jpg inet009_rs_cr_ts.jpg -l fra (inet009_rs_cr_ts.jpg.txt) The lower word just being cropped out leads to normal recognition. Best regards, Dmitri Silaev www.CustomOCR.com On Sat, May 2, 2015 at 2:01 AM, Martín Ochoa 8amar...@gmail.com wrote: Hi, I'm

Re: [tesseract-ocr] Re: ocr on subtitles

2015-05-04 Thread Dmitri Silaev
) - Run Tess - perfect tesseract.exe inet010_ntransp_ts.png inet010_ntransp_ts.png (inet010_ntransp_ts.png.txt) Best regards, Dmitri Silaev www.CustomOCR.com On Mon, May 4, 2015 at 1:29 PM, franck dev dev.franck...@gmail.com wrote: Hi, I tried with imagemagick: -colorspace Gray -negate

Re: [tesseract-ocr] ocr on subtitles

2015-05-03 Thread Dmitri Silaev
. If your other subtitle images have similar structure this method should work regardless of char color. Best regards, Dmitri Silaev www.CustomOCR.com On Sun, May 3, 2015 at 10:31 PM, franck dev dev.franck...@gmail.com wrote: Hi, I have tried to do ocr on subtitles picture but depending

Re: [tesseract-ocr] Extracting molecular labels from biological pathway images

2015-04-29 Thread Dmitri Silaev
show how if you're interested. For some clues on that see my post in this thread: https://groups.google.com/forum/#!msg/tesseract-ocr/STHaLGYsiCo/pCT2kxMgwI8J Best regards, Dmitri Silaev www.CustomOCR.com On Mon, Apr 27, 2015 at 9:34 PM, Alexander Pico xanderp...@gmail.com wrote: I am trying

Re: [tesseract-ocr] OCR on Nintendo game screenshots

2015-04-24 Thread Dmitri Silaev
- go ahead. No math or other specific knowledge required. Best regards, Dmitri Silaev www.CustomOCR.com On Fri, Apr 24, 2015 at 1:00 AM, Leah Siddall leah.sidd...@elementaltechnologies.com wrote: *mind blown* this is a much better approach!! especially how quickly i found something like

Re: [tesseract-ocr] OCR on Nintendo game screenshots

2015-04-23 Thread Dmitri Silaev
the point. You'd better invest your time into accumulating a collection of score digit coordinates in each game, than into a struggle with quirky OCR results. Well, unless you're eager to. Best regards, Dmitri Silaev www.CustomOCR.com On Thu, Apr 23, 2015 at 10:51 PM, Leah Siddall leah.sidd

Re: [tesseract-ocr] OCR on Nintendo game screenshots

2015-04-23 Thread Dmitri Silaev
, Dmitri Silaev www.CustomOCR.com On Thu, Apr 23, 2015 at 9:05 AM, Leah Siddall leah.sidd...@elementaltechnologies.com wrote: Hi all! I am not having luck with tesseract and the fonts used in NES games like Super Mario Bros. 3. ( i've attached an example screenshot ). My goal is scrape

Re: [tesseract-ocr] Way to set minimum font size to reduce errors

2015-04-21 Thread Dmitri Silaev
regards, Dmitri Silaev www.CustomOCR.com On Tue, Apr 21, 2015 at 6:17 AM, John James ashoutforh...@gmail.com wrote: Hi All I am looking for a parameter that sets the minimum acceptable rectangle size that tesseract will interpret as a character. For example every character in the image has

Re: [tesseract-ocr] Re: Cast word confidence success rate ?

2015-04-08 Thread Dmitri Silaev
It seems you're confusing certainty and confidence here. Please pay close attention to what you're writing or rephrase your question. The formula itself allows no values out of the [0, 100] range. Best regards, Dmitri Silaev www.CustomOCR.com On Wed, Apr 8, 2015 at 8:37 AM, Gunasekaran Velu

Re: [tesseract-ocr] Problem with recognition of numbers 3 and 8

2015-02-24 Thread Dmitri Silaev
used FastStone Image Viewer's Blur with a parameter of 14. If you want to use ImageMagick - I don't know how it exactly relates to Gaussian blur sigma, you have to experiment. Then a standard command line for Tesseract works well. At least no more 8 vs. 3 errors. Best regards, Dmitri Silaev

Re: [tesseract-ocr] Re: Help compiling ScrollView.jar

2015-02-23 Thread Dmitri Silaev
regards, Dmitri Silaev www.CustomOCR.com On Mon, Feb 23, 2015 at 5:35 PM, James Owers james.f.ow...@gmail.com wrote: Have cross posted this to StackOverflow: http://stackoverflow.com/questions/28676158/compiling-tesseract-debugger-to-visualise-region-classification On Wednesday, 18 February

Re: [tesseract-ocr] OCR not accuracy

2015-02-13 Thread Dmitri Silaev
Do this: - Use higher resolution. You can get much better results upscaling 3x - Use better image quality and format (lossless TIFF, PNG) - Get rid of the vertical text at the left Best regards, Dmitri Silaev www.CustomOCR.com On Sat, Feb 14, 2015 at 8:29 AM, Gunasekaran Velu mail2vg

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Dmitri Silaev
with color filtering, line detection and other steps which can increase accuracy. Best regards, Dmitri Silaev www.CustomOCR.com On Sun, Feb 8, 2015 at 12:23 PM, Allistair C allist...@gmail.com wrote: I would personally use opencv rather than IM. It has more sophisticated routines to build

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Dmitri Silaev
Excuses, that should be *drafting tape* On Sun, Feb 8, 2015 at 8:59 PM, Dmitri Silaev daemons2...@gmail.com wrote: Well, the computer approach still has a lot of potential, hehe )) Check this: http://www.fmwconcepts.com/imagemagick/unrotate/index.php By using this script, you can drop your

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Dmitri Silaev
own pitfalls. At least you can give it a try. Best regards, Dmitri Silaev www.CustomOCR.com On Sun, Feb 8, 2015 at 6:47 PM, Josh Wolcott jswolc...@gmail.com wrote: I agree. That seems like a very workable solution long term. I will work on cropping more carefully and look in to a tray

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Dmitri Silaev
Wow, a negative tray printed by a 3D printer! Cool idea, I like it! Should make all things simple. Best regards, Dmitri Silaev www.CustomOCR.com On Sun, Feb 8, 2015 at 5:43 PM, Allistair allist...@gmail.com wrote: I agree, this cannot be too difficult to scan them in a repeatable, oriented

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Dmitri Silaev
result. And show us your fixed cropping results. I suppose those should be 3 images per card - the two one liners and the long description. Best regards, Dmitri Silaev www.CustomOCR.com On Sun, Feb 8, 2015 at 3:48 PM, Josh Wolcott jswolc...@gmail.com wrote: Trust me I tried. Seemed like a simple

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-08 Thread Dmitri Silaev
, leave others as is. Through the Cygwin terminal, the script runs like a charm. Best regards, Dmitri Silaev www.CustomOCR.com On Sun, Feb 8, 2015 at 10:19 PM, Josh Wolcott jswolc...@gmail.com wrote: I've seen some of Fred's stuff and he does some impressive work. However, I have to run

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-07 Thread Dmitri Silaev
with ImageMagick, feed them to Tesseract one by one et voila! The text is clear enough to be processed by Tesseract without any further preprocessing. OneNote just has a better text detection routine, so that it gets less confused by graphics. Best regards, Dmitri Silaev www.CustomOCR.com On Sat

Re: [tesseract-ocr] noob - no output black text on white background surrounded by color backgrounds borders and images

2015-02-07 Thread Dmitri Silaev
to place the cards evenly? Best regards, Dmitri Silaev www.CustomOCR.com On Sat, Feb 7, 2015 at 10:57 PM, Josh Wolcott jswolc...@gmail.com wrote: My issue with cropping is that due to the variances in where the images are I end up with a large variance in the images. I'll attach two examples

Re: [tesseract-ocr] Preprocessing advice for digits on colored background

2015-02-05 Thread Dmitri Silaev
Works perfectly out of the box with the latest repository version, even without digit (i.e. whitelist). What version do you use? Best regards, Dmitri Silaev www.CustomOCR.com On Mon, Feb 2, 2015 at 9:01 PM, Simon Hill simonhill...@gmail.com wrote: Sorry if this has been asked before

Re: [tesseract-ocr] Re: Trying to identify engraving captions for orientation -- new to Tesseract, could use some help

2015-02-05 Thread Dmitri Silaev
using another OCR engine allowing to separate and tune the text detection stage. One or more versions of Abbyy software can do this. Best regards, Dmitri Silaev www.CustomOCR.com On Fri, Jan 9, 2015 at 11:44 PM, J. Heald j.he...@ucl.ac.uk wrote: Sorry if last post was TL;DR But the basic

Re: Tesseract 3.0 Performance down at 32bit Os

2013-11-06 Thread Dmitri Silaev
on. For some types of images probably it can work. Are you sure you can't remember anything? Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Nov 6, 2013 at 7:31 PM, Andreas Romeyke art1pi...@googlemail.com wrote: Hello Dmitri, Am Donnerstag, 31. Oktober 2013 09:01:44 UTC+1 schrieb Dmitri

Re: Tesseract 3.0 Performance down at 32bit Os

2013-10-31 Thread Dmitri Silaev
environment. To configure your system's hardware, you'll need a clean machine (or many of diverse types) and quite a few experiments to understand CPU and memory consumption for your types of images. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Oct 31, 2013 at 10:36 AM, Niral Prajapati

Re: Cube documentation, training source files, and openness

2013-05-30 Thread Dmitri Silaev
Excellent post, Nick! The more I read, the more I felt I had to ask these questions myself, but didn't yet. I'm afraid, though, many of them would remain unanswered. Because after several years of monitoring and asking in this forum I got used to the feeling that principal developers make only

Re: OCR for MRZ

2013-04-04 Thread Dmitri Silaev
Nick, In image processing ROI usually means region of interest Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Apr 3, 2013 at 11:23 PM, Nick White nick.wh...@durham.ac.uk wrote: Hi Dmitri, Can you explain what ROI is in this context please? I'm not familiar with the term. Thanks

Re: OCR for MRZ

2013-04-03 Thread Dmitri Silaev
trying to tweak diverse parameters with no stable effect, making progress with some images and failing with others. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Apr 3, 2013 at 12:30 AM, Art Solano amscloudn...@gmail.com wrote: We are looking to use Tesseract for processing travel document

Re: Why tesseract can't recognize this?

2013-02-20 Thread Dmitri Silaev
Use page segmentation mode 5, 6 or 7 (the -psm command line switch). Tesseract's automatic layout analysis fails for this image so you have to specify the layout manually. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Feb 20, 2013 at 6:42 PM, Andrea Fontana trik...@gmail.com wrote
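
A small sketch of trying the three suggested modes in turn. The mode descriptions are paraphrased from Tesseract's help text, the "-psm" spelling follows the 3.x command line used in this thread (later releases spell it "--psm"), and the helper and file names are hypothetical.

```python
# Layout assumptions behind the modes mentioned above.
PSM_HINTS = {
    5: "single uniform block of vertically aligned text",
    6: "single uniform block of text",
    7: "single text line",
}


def psm_command(image, out_base, psm):
    """Build a Tesseract 3.x command line with an explicit layout mode."""
    if psm not in PSM_HINTS:
        raise ValueError("mode not covered by this sketch: %r" % psm)
    return ["tesseract", image, out_base, "-psm", str(psm)]


for mode in sorted(PSM_HINTS):
    print(" ".join(psm_command("in.png", "out", mode)), "#", PSM_HINTS[mode])
```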

Re: Bounding Boxes (Text Regions)

2013-02-19 Thread Dmitri Silaev
You cannot do this with the stock Tesseract. A specifically designed image processing pipeline needs to be implemented to extract text for subsequent recognition by Tesseract. Warm regards, Dmitri Silaev www.CustomOCR.com On Tue, Feb 19, 2013 at 12:05 PM, Romeo Jihara rjih...@gmail.com wrote

Re: Should TR errors be ignored for a large text sample on a pair of TIF/BOX? What is the best practice here?

2013-02-19 Thread Dmitri Silaev
to 10% error rate. Warm regards, Dmitri Silaev www.CustomOCR.com On Tue, Feb 19, 2013 at 10:19 PM, Carlos Antunes cf.antu...@gmail.com wrote: Hello all, While generating the TR for a TIF/BOX pair using a large text, there are some errors when the box cannot be made and hence some

Re: how to get the progress during OCR

2013-01-12 Thread Dmitri Silaev
, Dmitri Silaev www.CustomOCR.com On Fri, Jan 11, 2013 at 11:44 AM, wowgreat...@gmail.com wrote: sometime it takes a long time to run OCR so how to get the progress during OCR? thanks ! I'm using Tesseract3.02win7 b4bit VS2010 -- You received this message because you

Re: Bank Card Embossing Characters Recongnition

2012-12-30 Thread Dmitri Silaev
input for Tesseract. Otherwise damaged versions of same characters would differ much so you'd need to train Tess for every such version. This in turn would certainly lead to an accuracy drop and you'd waste much time struggling with all kinds of OCR issues. Warm regards, Dmitri Silaev

Re: Bank Card Embossing Characters Recongnition

2012-12-24 Thread Dmitri Silaev
to avoid any quirky configurations (shadows, extreme flares, etc.) Warm regards, Dmitri Silaev www.CustomOCR.com On Mon, Dec 17, 2012 at 8:02 AM, Neo Song neo.f...@gmail.com wrote: Dear Dmitri, There is one thing that confuses me heavily. For a Coaxial light source, I can get solid stroke

Re: Bank Card Embossing Characters Recongnition

2012-12-13 Thread Dmitri Silaev
connected. But be prepared that no perfect character contours can be obtained, like with any other edge detection procedure. HTH and good luck! Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Dec 13, 2012 at 1:35 PM, Neo Song neo.f...@gmail.com wrote: Hi gadv, I have used SWT

Re: problems with grayed background

2012-11-29 Thread Dmitri Silaev
this holds true if your other images do not differ much from what you've shown here in the forum. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Nov 29, 2012 at 1:07 PM, sascha4j sascha.j...@gmx.net wrote: thank you for your answer i will take a look at your example and the leptonica library

Re: Tesseract Forms Recognition,

2012-11-16 Thread Dmitri Silaev
Same here. Please share. Thanks, Dmitri On Fri, Nov 16, 2012 at 7:48 PM, Sven Pedersen sven.peder...@gmail.com wrote: I'm curious to hear about it -- I used to work in the document processing industry. Please send the info to me. Thanks, Sven On Fri, Nov 16, 2012 at 7:33 AM, José Luis Rey

Re: Tesseract filling supposedly missing character pixels – how to suppress that behavior?

2012-10-20 Thread Dmitri Silaev
. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Oct 18, 2012 at 10:43 PM, Andres andrej...@gmail.com wrote: Thank you very much Dmitri. I'll try it a little more with your hints and if I arrive to some conclusion I'll let the list know. A pair of extra questions: - Do you know

Re: another way to communicate with tesserac?

2012-10-17 Thread Dmitri Silaev
, with no memory hogging (as an answer to other forum thread), using select parts of it, though. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Oct 17, 2012 at 5:37 PM, Attila Somogyi bmfneum...@gmail.com wrote: Hello! My application processes images in a very small interval, about 1 sec

Re: Tesseract filling supposedly missing character pixels – how to suppress that behavior?

2012-10-16 Thread Dmitri Silaev
luck. Don't forget about real samples. All correspondence - please post into the forum. Warm regards, Dmitri Silaev www.CustomOCR.com On Mon, Oct 15, 2012 at 11:31 PM, Andres andrej...@gmail.com wrote: Hello fellows, Sometimes: ‘6’ is recognized as ‘8’, ‘3’ as ‘9’, and some other similar

Re: dawg extract

2012-07-09 Thread Dmitri Silaev
You can check out the Wiki article on DAWGs to see that the reverse conversion to word lists generally is not unique. Warm regards, Dmitri Silaev www.CustomOCR.com On Tue, Jul 29, 2008 at 3:49 PM, Donatas G. dgvirt...@gmail.com wrote: is it possible to extract/decompile a dawg file? I would

Re: errors when running tesseract.exe

2012-07-03 Thread Dmitri Silaev
Check if those TIFFs are uncompressed. Surprisingly, sometimes it fails with uncompressed TIFFs. Warm regards, Dmitri Silaev www.CustomOCR.com On Mon, Jul 2, 2012 at 7:02 PM, js jigij...@gmail.com wrote: downloaded code and compiled. upon running tesseract.exe with following command C

Re: errors when running tesseract.exe

2012-07-03 Thread Dmitri Silaev
Actually your second argument looks strange. It should be the basename of an output file. You've indicated C:\Users\username\Desktop\Test_images\test.tif. This can work but is this intended? The language argument can be omitted, in which case it defaults to eng -- Dmitri On Mon, Jul 2, 2012 at 7:02 PM, js
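
To illustrate the point about the output basename: Tesseract appends ".txt" to whatever base it is given, so passing an image name as the base yields a file such as "test.tif.txt". The helper names below are hypothetical, and the sketch only builds strings; it does not run Tesseract.

```python
def expected_output(out_base):
    """Name of the text file Tesseract writes for a given output base."""
    return out_base + ".txt"


def command(image, out_base, lang=None):
    """Build the command line; omitting lang leaves the default (eng)."""
    cmd = ["tesseract", image, out_base]
    if lang is not None:
        cmd += ["-l", lang]
    return cmd


print(" ".join(command("test.tif", "result")), "->", expected_output("result"))
print(" ".join(command("test.tif", "test.tif")), "->", expected_output("test.tif"))
```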

Re: recognise dot matrix integers

2012-07-01 Thread Dmitri Silaev
words within the document during recognition. Then it can use them at the second (adaptive) pass. This can benefit only in case of repeated occurrences of particular words in the document. Warm regards, Dmitri Silaev www.CustomOCR.com On Mon, Jun 25, 2012 at 4:27 AM, TDG threedaygo...@gmail.com

Re: Edge detection algorithm used by tesseract

2012-06-23 Thread Dmitri Silaev
block_edges() has nothing to do with edge detection. Tesseract does not use it at all. It first binarizes entire images then extracts connected components (CCs). block_edges() is called to extract CCs' outlines from a binarized image. Warm regards, Dmitri Silaev www.CustomOCR.com On Sat, Jun 23
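
For readers unfamiliar with the term, here is a minimal sketch of connected component extraction from an already binarized image. It is a plain breadth-first labelling with 8-connectivity, which is my assumption for the example; it is not Tesseract's code and it does not produce outlines.

```python
from collections import deque


def connected_components(grid):
    """Return components as sorted lists of (row, col) cells; 1 marks foreground."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            # Flood-fill a new component starting from this unvisited pixel.
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(sorted(cells))
    return components


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
print(len(connected_components(image)))
```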

Re: recognise dot matrix integers

2012-06-23 Thread Dmitri Silaev
character having very distinctive shape compared to digits and preferably of the same width with digits. You've asked lots of questions but this is what I'd start working with. HTH Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Jun 20, 2012 at 9:00 AM, TDG threedaygo...@gmail.com wrote

Re: Edge detection algorithm used by tesseract

2012-06-23 Thread Dmitri Silaev
that's related to naming and notions, though. What you have shown in your image is not what is produced by extract_edges() or block_edges(). Those build completely different structures, similar to what is commonly known as crack-coded CC boundaries. Warm regards, Dmitri Silaev www.CustomOCR.com

Re: Font Size support

2012-06-21 Thread Dmitri Silaev
, Dmitri Silaev www.CustomOCR.com On Thursday, June 21, 2012 7:06:24 PM UTC+4, islam ibrahim wrote: Hello I have a question regarding the font size that Tesseract supports. Is there a specific size or is it just working whatever font size or even type used? Thanks in advance -- You received

Re: ViewerDebugging

2012-06-21 Thread Dmitri Silaev
This means that the tord_display_ratings parameter no longer exists in the current version of Tesseract. You are probably using outdated config files (inter or matdemo). Try deleting the corresponding line from these files. Warm regards, Dmitri Silaev www.CustomOCR.com On Thursday, June 21, 2012 1:51:52

Re: Provide/visualize baseline info?

2012-06-21 Thread Dmitri Silaev
baselines and outlines - Keep clicking words in the main image view to see their baselines For details please refer to http://rdaemons.blogspot.com/2012/06/tesseract-ocr-interactive-debugging.html Warm regards, Dmitri Silaev www.CustomOCR.com On Thursday, June 21, 2012 10:28:13 AM UTC

Re: better to train with low-quality or high-quality scans?

2012-03-08 Thread Dmitri Silaev
the speckle as part of the character's shape, and therefore it would be trained incorrectly. So the best approach would be to clean up the image before passing it to Tesseract. You can use ImageMagick or whatever tool you like. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Mar 7, 2012 at 9:11 PM
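
As a rough idea of what such a cleanup step does, the sketch below drops isolated ink pixels from an already binarized image. It is only an illustration under stated assumptions (a 0/1 grid, speckle defined as a single pixel with no 8-neighbours, a hypothetical function name); a real tool such as ImageMagick offers more robust filters.

```python
def remove_isolated_pixels(binary):
    """Return a copy of a 0/1 grid with single-pixel speckles cleared."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            # Look at the (up to) eight surrounding pixels in the original grid.
            has_neighbour = any(
                binary[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned

# One lone speckle at the top left, one two-pixel stroke that must survive.
noisy = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
cleaned = remove_isolated_pixels(noisy)
```

Larger specks would need a size threshold on connected components instead of this single-pixel test.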

Re: OCR Per Page Basis

2012-03-08 Thread Dmitri Silaev
My bad, I had missed that feature. tessedit_page_number indeed allows you to specify a TIFF page. I can only add a bit of clarification: the page number is zero-based. The value of -1 (the default) instructs Tesseract to process all TIFF pages. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Mar
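
The zero-based numbering and the -1 default can be spelled out with a small sketch. The helper names are hypothetical, the out-of-range handling is this sketch's own choice rather than documented Tesseract behaviour, and the config line assumes the usual "name value" config-file format.

```python
def pages_to_process(page_count, tessedit_page_number=-1):
    """Map a tessedit_page_number value to the zero-based page indices it selects."""
    if tessedit_page_number == -1:
        # Default: every page of the multi-page TIFF.
        return list(range(page_count))
    if 0 <= tessedit_page_number < page_count:
        return [tessedit_page_number]
    # Assumption: treat anything else as an error in this sketch.
    raise ValueError("page number out of range")

def config_line(page_number):
    """Build the line one would put in a config file to select a page."""
    return f"tessedit_page_number {page_number}\n"

all_pages = pages_to_process(3)        # three-page TIFF, default setting
third_page = pages_to_process(3, 2)    # zero-based: 2 is the third page
line = config_line(2)
```

So to OCR only the third page of a TIFF, the value to set is 2, not 3.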

Re: OCR Per Page Basis

2012-03-08 Thread Dmitri Silaev
. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Mar 8, 2012 at 8:32 PM, Paul pafow...@googlemail.com wrote: Thank you gents that will work for me, I will give it a try. Is there somewhere I can find some documentation on things like config-page.txt etc. I have Googled it but am not finding

Re: OCR Per Page Basis

2012-03-07 Thread Dmitri Silaev
No, at this time this is not possible via the command line. However, it can easily be achieved programmatically. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Mar 7, 2012 at 6:39 PM, Paul pafow...@googlemail.com wrote: Hi, Is there any way to instruct tesseract via the command

Re: better to train with low-quality or high-quality scans?

2012-03-07 Thread Dmitri Silaev
to resort to a dictionary or context. HTH Warm regards, Dmitri Silaev www.CustomOCR.com On Sun, Mar 4, 2012 at 10:02 PM, Falke hawk...@flight.us wrote: My subject looks deceptively like a stupid question -- but it really isn't: Supposing you need to recognize a bunch of existing scanned documents

Re: OCR Per Page Basis

2012-03-07 Thread Dmitri Silaev
Sure, you can just post a feature request in the Issues section at the project's web page. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Mar 7, 2012 at 9:42 PM, Paul pafow...@googlemail.com wrote: Thanks for the info. Do I assume then that it would be a fairly trivial task

Re: Thresholding API example

2012-02-22 Thread Dmitri Silaev
and thresholder.cpp is new and better documented, so there should be no problem understanding it after a while. Warm regards, Dmitri Silaev www.CustomOCR.com On Thu, Feb 23, 2012 at 12:35 AM, avasilev alxvasi...@gmail.com wrote: First of all, I beg for excuse if this post appears twice, because I

Re: Tesseract vs Commercial Products

2012-02-19 Thread Dmitri Silaev
Jason doesn't seem to be a developer, so I think these are not options for him. Otherwise the choice is limitless, including 3rd party image processing libraries and of course self-made custom algorithms. Warm regards, Dmitri Silaev www.CustomOCR.com On Sun, Feb 19, 2012 at 11:41 AM, TP wing

Re: Tesseract vs Commercial Products

2012-02-18 Thread Dmitri Silaev
regards, Dmitri Silaev www.CustomOCR.com On Sat, Feb 18, 2012 at 11:43 PM, Jason Funk jasonlf...@gmail.com wrote: I am testing tesseract against some other commercial products and the commercial products seem to blow tesseract out of the water in terms of quality and accuracy. Is this because

Re: Reading image with few digits

2012-02-14 Thread Dmitri Silaev
Check this thread: https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/TY_RIHOOyNM Read about the psm switch and custom segmentation. These will likely help you. Warm regards, Dmitri Silaev www.CustomOCR.com On Tue, Feb 14, 2012 at 11:42 AM, ReneFR rspr...@veloeco.fr wrote

Re: Problem Recognizing Numbers

2012-02-13 Thread Dmitri Silaev
Did you try the psm switch (look for it in the forum)? Your own segmentation? Both combined? Warm regards, Dmitri Silaev www.CustomOCR.com On Tue, Feb 14, 2012 at 1:55 AM, John Williams jdwilliams1...@gmail.com wrote: If I duplicate the column 9 times, so that there's ten columns

Re: preprocessing

2011-12-06 Thread Dmitri Silaev
using Tesseract's classifier exclusively with highest possible efficiency. Warm regards, Dmitri Silaev www.CustomOCR.com On Tue, Nov 29, 2011 at 1:04 AM, daniel danieloberh...@googlemail.com wrote: Ok, so I thought more on this. What I will end up with is segments of possible various colors

Re: OCR very large images - smart method to split into regions first?

2011-11-21 Thread Dmitri Silaev
, particularly his "Two Geometric Algorithms for Layout Analysis" (2002), maybe also his "Layout Analysis based on Text Line Segment Hypotheses" (2003). So you can even implement these approaches yourself using these articles. HTH Warm regards, Dmitri Silaev www.CustomOCR.com On Fri, Nov 18, 2011 at 9:26

Re: Failure on certain types of images

2011-11-16 Thread Dmitri Silaev
can happen when an image containing non-text information is fed to Tesseract; in this case all kinds of errors can arise. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Nov 16, 2011 at 3:03 AM, walter23 walte...@gmail.com wrote: I'm getting a message where the inclusion of complex

Re: preprocessing

2011-11-15 Thread Dmitri Silaev
where more than two colors are involved. I would have to map the discovered segments to two colors, which may even be impossible. And with contours even more so, as the contours may not be closed... On 12 Nov., 18:26, Dmitri Silaev daemons2...@gmail.com wrote: If you're able to use OpenCV

Re: preprocessing

2011-11-12 Thread Dmitri Silaev
If you're able to use OpenCV then, given a list of contours or blobs, you should be able to reconstruct a binary image. This is a general thought. To get more practical advice, send us your sample image(s). Warm regards, Dmitri Silaev www.CustomOCR.com On Sat, Nov 12, 2011 at 4:37 PM, daniel
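
The general idea of rebuilding a binary image from contours can be sketched without any library. In OpenCV one would normally fill contours with cv2.drawContours instead; the dependency-free version below is only an illustration, assuming each contour is a closed polygon given as (x, y) vertices, and all names in it are hypothetical.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule: count polygon edges crossed by a ray going right from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def contours_to_binary(contours, width, height):
    """Rasterize closed contours into a 0/1 grid by testing each pixel centre."""
    image = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5
            # Assumption: contours are unioned; holes are not handled here.
            if any(point_in_polygon(cx, cy, c) for c in contours):
                image[y][x] = 1
    return image

# A single square contour covering the middle 2x2 pixels of a 4x4 image.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
image = contours_to_binary([square], 4, 4)
```

Open contours, as mentioned in the reply to this message, cannot be filled this way and would first have to be closed or drawn as strokes.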

Re: Image enhancement

2011-11-09 Thread Dmitri Silaev
start your search, for example, from An Adaptive Binarization Technique for Low Quality Historical Documents, by three Greek scientists, 2004, and similar articles. Warm regards, Dmitri Silaev www.CustomOCR.com On Wed, Nov 9, 2011 at 2:49 PM, Esteban Bordón ebor...@gmail.com wrote: Hi, I send 2
