[tesseract-ocr] Links for sample TIF image and BOX files are broken

2018-02-12 Thread Iyvin Jose
In the Tesseract wiki page for creating box files for version 4 - https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0 - the two links showing sample - BOX file - TIF image

[tesseract-ocr] Need help improving text recognition in scanned documents

2017-01-01 Thread Jose Luis Abuelo
We are using ImageMagick and tesseract to try to read information in documents, but we are not finding the right configuration and combination of the two tools to optimize the original scanned TIF document and then apply tesseract to it to obtain the information. First we use to scan the documen

OCR romanized Asian languages

2013-08-28 Thread JOSE MARIA GARCIA NAÑEZ
Hi y'all! I have some resources, mainly linguistics stuff, written entirely in pinyin, therefore no hanzi whatsoever. I've tried to OCR the data with commercial software such as ABBYY, Acrobat, etc., but no luck. The problem arises from the following set of characters { o ā ɑ̄ ē ī ō ū ǖ Ā Ē Ī Ō Ū

Re: New training only recognize if >3 chars

2012-03-22 Thread Jose Garcia
y I use it. Tesseract.exe with the default options works OK, but tesseractdotnet does not work well with the same image that contains only 3 digits. Thanks for your help. On Mar 22, 8:55 am, zdenko podobny wrote: > On Wed, Mar 21, 2012 at 7:22 PM, Jose Garcia wrote: > > Hello,

New training only recognize if >3 chars

2012-03-21 Thread Jose Garcia
Hello, I've trained tesseract with only these characters: 0123456789-. I used one TIFF with these characters, with 6 samples of each. After the successful training, tesseract only recognizes the text if there are more than 3 numbers in the input TIFF. With a TIFF containing, for example, the numbers 65 it retur

Re: Use tesseract-ocr in android

2011-03-15 Thread Jose
On what device are you going to use it? I tried using it on an iPhone 3GS and it was taking 3 minutes to process a picture! Is that normal? -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to tesseract-ocr@googl

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
modify the source code. As I just modified a function, I could go back to normal and implement that outside the tesseract framework. Thank you very much for the help! Regards, Jose

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
I run tesseract from the command line and I didn't find a way to format the results with more info.

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
*I only modify how the result is printed, nothing else... I grab all the info from the word and its bounding box. That is OK, right?

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
Yes, I got the information from the result! I only modify how the result method prints the result, nothing more of course! I got the information from the bounding box of the result. I'm not modifying it deeper than that.

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
the results and I can know that x1 and x2 were on the same line by looking at the top value. The approach works fine for me, but I had to modify the source code of tesseract. Regards, Jose
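
The idea of telling which words share a line by looking at the top value can be sketched roughly as follows. This is only an illustration, not the poster's modified tesseract code: `WordBox`, `GroupIntoLines`, and the pixel `tolerance` are hypothetical names, and the sketch assumes each recognized word is already available together with the left and top coordinates of its bounding box.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical record for one recognized word (not a Tesseract type).
struct WordBox {
    std::string text;
    int left;  // x coordinate of the bounding box
    int top;   // y coordinate of the bounding box
};

// Groups words into text lines. A word joins the current row when its top
// value is within `tolerance` pixels of the first word in that row;
// otherwise it starts a new row. Words in a row are ordered left to right.
std::vector<std::string> GroupIntoLines(std::vector<WordBox> words, int tolerance) {
    // Order by top so that words of the same row become neighbours.
    std::stable_sort(words.begin(), words.end(),
                     [](const WordBox& a, const WordBox& b) { return a.top < b.top; });

    std::vector<std::vector<WordBox>> rows;
    for (const WordBox& word : words) {
        if (rows.empty() || std::abs(word.top - rows.back().front().top) > tolerance) {
            rows.emplace_back();  // top value too far away: start a new row
        }
        rows.back().push_back(word);
    }

    std::vector<std::string> lines;
    for (std::vector<WordBox>& row : rows) {
        // Within a row, read from left to right.
        std::stable_sort(row.begin(), row.end(),
                         [](const WordBox& a, const WordBox& b) { return a.left < b.left; });
        std::string line;
        for (const WordBox& word : row) {
            if (!line.empty()) line += ' ';
            line += word.text;
        }
        lines.push_back(line);
    }
    return lines;
}
```

With this kind of post-processing, a wide gap between two words on the same row does not matter, because rows are decided only by the top value. The tolerance would need tuning to the font size and skew of the actual scans.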

Re: Customising Tesseract for character recognition

2011-03-13 Thread Jose
Hi Patrick, yes, the results are correct, but the format of the results is not! That's my trouble.

Re: Customising Tesseract for character recognition

2011-03-13 Thread Jose
Hi Dmitry, sorry for the delay... I produced some samples; see if you can give them a look! Regards, Jose

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
OK, I'll try to do that this afternoon. Thank you for the help. Regards, Jose

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
s all the results after. Please correct me if I'm wrong or if you see some improvements that can be made. Please excuse my bad English. Regards, Jose

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
Dmitry, the recognition works; the only thing is the way it is parsing it... :S I think segmentation of the images would be too painful! I only want to change the order in which it is displayed, or get the bounding boxes, so I could know the x and y of the recognized word and thereby organise the results b

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
Hi, (as you know, Saurabh, because we talked in private the other day) I tried PSM_SINGLE_COLUMN and the accuracy drops dramatically... I can't afford to lose that accuracy. Is it possible to change the way the output is displayed? Looking at the code it seems rather hard to change it... perhaps I c

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose Granja
Hi, do you know how to force the page layout to be recognised as horizontal? My issue is with that! You'll make me the happiest person on earth. On 17 Feb 2011, at 04:48, Saurabh Gandhi wrote: > Hello everyone, > > I am currently using tesseract 3.x for license plate recognition. > I have an algo

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Saurabh, by setting these: PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR, are you forcing the page to be read horizontally? My problem is that I have a column of two words separated by a white space (each word is in a different font) and instead of seeing one column of two words the OCR sees two columns of one w

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Ok I'm recompiling now... I'll let you know when it's done! thanks for the help anyway :)

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
You know, Saurabh, that was EXACTLY what I was looking for! I couldn't be more thankful to you! That line of code changed my life :D Thank you again :)

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
This is what the JPG looks like: WORD1 WORD2 (the white space is quite "big") WORD1 WORD2 WORD1 WORD2 WORD1 WORD2 WORD1 WORD2 WORD1 WORD2 WORD1 WORD2 and it reads like: WORD1 WORD1 WORD1 WORD1 WORD1 WORD1 WORD2 WORD2 WORD2 WORD2 WORD2 WORD2 WORD2 any help would be r

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
OK, I'll try that! I have to modify this in tesseractmain.cpp, right? (I'm using command line execution.) I replace this line: api.SetPageSegMode(tesseract::PSM_AUTO); with api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN); and then recompile, right? Thanks for the help.

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Is there no other workaround? If I reduce the size of the white space between WORD1 and WORD2 then it all works fine! This space is making the OCR think it's another column! Is there no other way? Splitting the image into many rows does not look very efficient.