Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
As far as I can see, your source data can be regarded as a 1-bit (binary), losslessly compressed image. So a lossless conversion to any image format (it makes no difference which) will do no harm. Warm regards, Dmitry Silaev On Tue, Mar 15, 2011 at 8:31 AM, David Hoffer wrote: > Dmitry, > > Originally the

Re: Especial Characteres

2011-03-14 Thread Dmitry Silaev
I doubt there's a GUI which can help with what you want. As for a programmatic way of doing this, please refer to the following thread where I already tried to answer a similar question: http://groups.google.com/group/tesseract-ocr/browse_thread/thread/6322a29f28ba49dc/f98699a9caf36dbc#f98699a9caf36d

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
Dave, What is the format and resolution in which you initially get your images? For such poor quality every conversion makes an image even worse... Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 5:29 PM, David Hoffer wrote: > Dmitry, > > Would using a loss-less format like TIFF be pref

c++ OCR a smaller area

2011-03-14 Thread meb111
I want to scan images from a game, and what I'm scanning is in a weird format. I'm using Visual C++ 2010 and would like to know if I can OCR just a certain area. I would also like to know exactly how to use it in VC++, as I'm not exactly a 'pro'-grammer. Any help is much appreciated. -- You received th
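One library-independent way to OCR only part of a screenshot is to copy the region of interest into its own buffer and hand just that buffer to the OCR engine. The sketch below assumes an 8-bit grayscale, row-major image; `GrayImage` and `cropRegion` are hypothetical names for illustration, not part of Tesseract. Tesseract's C++ API also offers `TessBaseAPI::SetRectangle(left, top, width, height)`, which restricts recognition to a sub-rectangle of an image already passed to `SetImage` and avoids the copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: 8-bit grayscale pixels stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Copy the rectangle (left, top, width, height) out of src.
// The rectangle is clamped to the image, so a request that runs past
// an edge yields a smaller (possibly empty) result instead of reading
// out of bounds.
GrayImage cropRegion(const GrayImage& src, int left, int top, int width, int height) {
    assert(src.width >= 0 && src.height >= 0);
    assert(src.pixels.size() == static_cast<std::size_t>(src.width) * src.height);

    const int x0 = std::max(0, std::min(left, src.width));
    const int y0 = std::max(0, std::min(top, src.height));
    const int x1 = std::max(x0, std::min(left + width, src.width));
    const int y1 = std::max(y0, std::min(top + height, src.height));

    GrayImage out{x1 - x0, y1 - y0, {}};
    out.pixels.reserve(static_cast<std::size_t>(out.width) * out.height);

    // Append one cropped row at a time.
    for (int y = y0; y < y1; ++y) {
        const auto rowStart = src.pixels.begin() + static_cast<std::ptrdiff_t>(y) * src.width;
        out.pixels.insert(out.pixels.end(), rowStart + x0, rowStart + x1);
    }
    return out;
}
```

Clamping rather than asserting on the rectangle is a design choice for screen captures, where the area of interest is often specified loosely and may touch the window border.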

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread David Hoffer
Dmitry, Would using a lossless format like TIFF be preferred? (I'm going to give this a try but some of these steps might be a bit more than I can handle...I'm not an image processing guru.) -Dave On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev wrote: > Ehmm, actually I thought a bit more and

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
Ehmm, actually I thought a bit more and now I say no to deskewing. It can be detrimental to such poor quality images - they are almost binary ("almost" probably because of the JPEG compression algorithm) and low-res. As far as I see, you can only have binary images. Therefore we need to assume a skew o

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
Well, if I were faced with such a problem, I'd do the following: 1. Deskew 2. Cut out excess whitespace using horizontal/vertical projection profiles 3. Determine aspect ratio (AR) 4. Based on AR determine location of significant areas (columns with numbers, much the same method for other areas in the header) 5
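Step 2 above (cutting excess whitespace with projection profiles) can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the `Box` struct, the `contentBox` name, and the convention that 1 means ink and 0 means background are all assumptions. A noisy scan would probably need a small threshold on each profile bin rather than a test for exactly zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bounding box of the ink in a binary image; right and bottom are exclusive.
struct Box {
    int left, top, right, bottom;
    bool empty() const { return left >= right || top >= bottom; }
};

// Hypothetical helper: pixels are row-major, 1 = ink, 0 = background.
Box contentBox(const std::vector<std::uint8_t>& pixels, int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);

    // Projection profiles: number of ink pixels in each row and each column.
    std::vector<int> rowSum(height, 0), colSum(width, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * width + x]) {
                ++rowSum[y];
                ++colSum[x];
            }
        }
    }

    // Walk inward from each edge until a profile bin contains ink.
    Box box{0, 0, width, height};
    while (box.top < box.bottom && rowSum[box.top] == 0) ++box.top;
    while (box.bottom > box.top && rowSum[box.bottom - 1] == 0) --box.bottom;
    while (box.left < box.right && colSum[box.left] == 0) ++box.left;
    while (box.right > box.left && colSum[box.right - 1] == 0) --box.right;
    return box;
}
```

The width and height of the returned box also give the aspect ratio mentioned in step 3, as `(right - left) / (bottom - top)` computed in floating point.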

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
Dave, Yep, quality is relatively poor so don't expect high accuracy from Tess. Do you need every table cell's contents? Or is getting the numbers enough, so that in a next step you can restore the [predefined] item names? Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 4:19 PM, David Hoffer wr

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
In future that will be my preferred approach! For the time being I just need a fast and easy solution! I know it's not the most beautiful approach... but I haven't touched much of the Tesseract framework, so as not to break anything! I was just short of time and it was easier for me to modify the sourc

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
Why don't you consider making your own project and statically linking Tesseract into it, or using Tesseract as a dynamic link library? That way you can implement any formatting and other special logic you wish... Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 2:13 PM, Jose wrote: > I fire

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
I run Tesseract from the command line and I didn't find a way to format the results with more info. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to tesseract-ocr@googlegroups.com. To unsu

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
Ehmm... I don't get it. If you've succeeded in using iterators, you are free to format the output programmatically in any way you want, aren't you? Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 1:56 PM, Jose wrote: > *I only modify how the result is printed! nothing else...

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
I only modify how the result is printed! Nothing else... I grab all the info from the word and its bounding box! That is OK, right?

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
Yes, I got the information from the result! I only modify how the result method prints the result, nothing more of course! I got the information from the bounding box of the result! I'm not modifying it deeper than that.

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
I think the best approach would be to stay as far as possible from modifying the 3rd party code. Take a closer look at the ResultIterator and PageIterator classes. Often they suffice for getting all the information you need about Tess's recognition results. Warm regards, Dmitry Silaev On Mon, Mar 14,

Re: Tesseract 3.00

2011-03-14 Thread Dmitry Silaev
You don't need to bother using the *two together*. Tesseract is the basis FreeOCR is built on, so these two are together already. FreeOCR's graphical interface is quite user friendly. Just install and use. I don't know what else needs to be said )) Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
Hi Dmitry, thanks for the help! In the end what I did was modify the function that returns the result so that it includes the top location of the bounding box. Then I have the following result: xy x1y1 x2y2 x3y3 x4y4 x5y5 x6y6 x7y7 then I parse

Re: Tesseract 3.00

2011-03-14 Thread Onion
I have FreeOCR installed already. So somehow, this works with Tesseract? Can you explain in simpleton terms how I'd use the two together? Or is it too "geeky"? Thanks

Re: Tesseract 3.00

2011-03-14 Thread Dmitry Silaev
Actually, there's more than just VietOCR. Check this: http://en.wikipedia.org/wiki/Tesseract_(software)#User_interfaces Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 2:13 AM, Onion wrote: > Ok, thanks. That will be too complicated for me to use. Will have to > uninstall it. > > -- > Y

Re: how to get the character in an image file which is in table format.

2011-03-14 Thread Dmitry Silaev
I suspect this paper is a sledgehammer to crack a nut. It's quite general and elaborate, and it may take a great deal of time to implement and debug. Your images might require much simpler methods. I always say the same thing: send your sample images and the community will try to help. War

Re: Especial Characteres

2011-03-14 Thread Dmitry Silaev
Manuel, I'm afraid just chaining command line tools won't help in this case. I'm talking about programming. And yes, I did solve many practical problems related to layout analysis, and other fields of document image processing, and succeeded in it )) Warm regards, Dmitry Silaev On Mon, Mar