As far as I can see, your source data can be regarded as a 1-bit (binary),
losslessly compressed image. So a lossless conversion to any image
format (it makes no difference which) will do no harm.
Warm regards,
Dmitry Silaev
On Tue, Mar 15, 2011 at 8:31 AM, David Hoffer wrote:
> Dmitry,
>
> Originally the
I doubt there's a GUI that can help with what you want. As for a
programmatic way of doing this, please refer to the following thread,
where I already tried to answer a similar question:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/6322a29f28ba49dc/f98699a9caf36dbc#f98699a9caf36d
Dave,
What are the format and resolution in which you initially get your
images? With such poor quality, every conversion makes an image even
worse...
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 5:29 PM, David Hoffer wrote:
> Dmitry,
>
> Would using a loss-less format like TIFF be pref
I want to scan images from a game, and what I'm scanning is in a weird
format.
I'm using Visual C++ 2010 and would like to know if I can OCR just a
certain area. I would also like to know exactly how to use it in VC++,
as I'm not exactly a 'pro'-grammer. Any help is much appreciated.
--
You received th
Dmitry,
Would using a loss-less format like TIFF be preferred?
(I'm going to give this a try but some of these steps might be a bit
more than I can handle...I'm not an image processing guru.)
-Dave
On Mon, Mar 14, 2011 at 5:23 PM, Dmitry Silaev wrote:
> Ehmm, actually I thought a bit more and
Ehmm, actually I thought a bit more and now I say no to deskewing. It
can be detrimental to such poor quality images - they are almost
binary ("almost" probably because of the JPEG compression algo) and
low-res. As far as I can see, you can only have binary images.
Therefore we need to assume a skew o
Well, if I was faced with such a problem, I'd do the following:
1. Deskew
2. Cut out excess whitespace using hor/ver projection profile
3. Determine aspect ratio (AR)
4. Based on AR determine location of significant areas (columns with
numbers, much the same method for other areas in the header)
5
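Steps 2 and 3 of the list above could be sketched roughly as follows. This is only a minimal illustration in plain Python, assuming the page is already binarized and held as a list of rows of 0 (white) and 1 (black). The function names and the `min_ink` noise threshold are made up for this example and are not part of Tesseract.

```python
def crop_to_content(image, min_ink=1):
    """Cut excess whitespace using horizontal/vertical projection profiles.

    image: list of rows, each a list of 0 (white) / 1 (black) pixels.
    min_ink: hypothetical threshold; a row or column counts as content
             only if it holds at least this many black pixels.
    """
    # Horizontal profile: black pixels per row.
    row_profile = [sum(row) for row in image]
    # Vertical profile: black pixels per column.
    col_profile = [sum(col) for col in zip(*image)]

    rows = [i for i, ink in enumerate(row_profile) if ink >= min_ink]
    cols = [j for j, ink in enumerate(col_profile) if ink >= min_ink]
    if not rows or not cols:
        return []  # nothing but whitespace

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def aspect_ratio(image):
    """Width divided by height of a non-empty image."""
    return len(image[0]) / len(image)
```

On noisy scans a `min_ink` above 1 may help ignore isolated specks at the page border, at the risk of clipping thin strokes.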
Dave,
Yep, quality is relatively poor so don't expect high accuracy from Tess.
Do you need every table cell's contents? Or is getting the numbers
enough, so that in a later step you can restore the [predefined] item names?
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 4:19 PM, David Hoffer wr
In the future that will be my preferred approach! For the time being I just need
a fast and easy solution! I know it's not the most beautiful approach... but
I haven't touched much of the tesseract framework, so as not to break anything!
I was just short of time and it was easier for me to modify the sourc
Why don't you consider making your own project and statically linking
Tesseract into it, or using Tesseract as a dynamic link library? That
way you can implement any formatting and other special logic you
wish...
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 2:13 PM, Jose wrote:
> I fire
I fire the execution of the tesseract in the command line and I didn't find
a way to format the results with more info.
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com.
To unsu
Ehmm... I don't get it. If you've succeeded in using iterators, you're
free to format the output in any way you want
programmatically, aren't you?
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 1:56 PM, Jose wrote:
> *I only modify how the result is printed! nothing else...
*I only modify how the result is printed! nothing else... I grab all the
info from the word and its bounding box! That is OK, right?
Yes, I got the information from the result! I only modify how the result
method prints the result... nothing more, of course! I got the information
from the bounding box of the result! I'm not modifying it deeper than that.
I think the best approach would be to stay as far as possible from
modifying the 3rd party code. Take a closer look at the ResultIterator and
PageIterator classes. Often they suffice for getting all the information
you need about Tess's recognition results.
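
To illustrate keeping the formatting logic in your own code rather than in the library, here is a minimal sketch in plain Python. It does not call Tesseract at all: the `Word` tuple is a hypothetical stand-in for whatever text and bounding box you read out of the iterators, and `format_words` is a name made up for this example.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical container for one recognized word and its bounding box."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def format_words(words):
    """Return one tab-separated output line per word, in reading order.

    Words are ordered by their top coordinate first and left coordinate
    second, which is only a rough reading order for unskewed pages.
    """
    ordered = sorted(words, key=lambda w: (w.top, w.left))
    return [
        f"{w.text}\t{w.left},{w.top},{w.right},{w.bottom}"
        for w in ordered
    ]
```

Since the formatting lives in your own function, changing the output layout never requires touching the 3rd party sources.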
Warm regards,
Dmitry Silaev
On Mon, Mar 14,
You don't need to bother using the *two together*. Tesseract is the basis
FreeOCR is built on, so these two are together already. FreeOCR's
graphical interface is quite user friendly. Just install and use. I
don't know what else needs to be said ))
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at
Hi Dmitry,
thanks for the help!
In the end, what I did was modify the return result function to include the
top location of the bounding box. Then I have the following result:
xy
x1y1
x2y2
x3y3
x4y4
x5y5
x6y6
x7y7
then I parse
I have FreeOCR installed already. So somehow, this works with Tesseract? Can
you explain in simpleton terms how I'd use the two together? Or is it too
"geeky"?
Thanks
Actually, there's more than just VietOCR. Check this:
http://en.wikipedia.org/wiki/Tesseract_(software)#User_interfaces
Warm regards,
Dmitry Silaev
On Mon, Mar 14, 2011 at 2:13 AM, Onion wrote:
> Ok, thanks. That will be too complicated for me to use. Will have to
> uninstall it.
>
> --
> Y
I suspect this paper is a sledgehammer to crack a nut. It's quite
universal and elaborate. It would usually take a great deal of time to
implement and debug it. Your images might require much simpler
methods.
I always say the same thing: send your sample images and the community
will try to help.
War
Manuel,
I'm afraid just chaining command line tools won't help in this case.
I'm talking about programming.
And yes, I have successfully solved many practical problems related to
layout analysis and other fields of document image processing ))
Warm regards,
Dmitry Silaev
On Mon, Mar