of forms equivalent to invoices that I'd like to put into
> a database. I'm thinking I would like to have some OCR app/tool scan these
> forms, and then generate a CSV with each field. Does anyone have
> recommendations on software for this?
I have thousands of forms equivalent to invoices that I'd like to put into
a database. I'm thinking I would like to have some OCR app/tool scan these
forms, and then generate a CSV with each field. Does anyone have
recommendations on software for this?
--
Adam
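Once an OCR tool has produced plain text for each form, turning that text into CSV rows is mostly pattern matching. What follows is a minimal sketch in Python, not a recommendation of any particular product: the field names (invoice_no, date, total) and the label patterns are hypothetical and would have to be adapted to the real forms.

```python
import csv
import io
import re

# Hypothetical field layout: each pattern captures one value from the OCR text.
# Real forms will need their own labels and formats.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return one dict per form; fields the OCR missed are left empty."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        row[name] = match.group(1) if match else ""
    return row


def rows_to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_PATTERNS), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = "INVOICE No: A-1001\nDate: 2009-01-29\nTotal: $1,234.50\n"
print(rows_to_csv([extract_fields(sample)]))
```

Empty fields are a useful signal here: a row with blanks marks a form the OCR misread, which can be set aside for manual entry.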
On Thu, Jan 29, 2009 at 7:11 AM, Reko Turja wrote:
>
> --
> From: "Gary Kline"
> Sent: Thursday, January 29, 2009 4:23 AM
> To: "Andrew Gould"
> Cc: "Reko Turja" ; "FreeBSD Mailing List"
--
From: "Gary Kline"
Sent: Thursday, January 29, 2009 4:23 AM
To: "Andrew Gould"
Cc: "Reko Turja" ; "FreeBSD Mailing List"
Subject: Re: OCR...
On Wed, Jan 28, 2009 at 07:33:41PM -0600, Andrew Goul
o with. gOCR
> > > > > > >looks
> > > > > > >best so far to me.
> > > > > >
> > > > > > > ABBYY FineReader - Omnipage haven't been able to catch it in
> several
> > >
> > > ABBYY FineReader - Omnipage haven't been able to catch it in several
> > > > > years either feature or qualitywise. No idea if Finereader runs under
> > > > > emulator though. If the file is already a PDF and 72 DPI with text
> > as
> > >
itywise. No idea if Finereader runs under
> > > > emulator though. If the file is already a PDF and 72 DPI with text
> as
> > > > graphics most of the damage has already been done, and it will be
> > > > extremely hard to OCR.
> > > >
> >
> > >best so far to me.
> > >
> > > ABBYY FineReader - Omnipage haven't been able to catch it in several
> > > years either feature or qualitywise. No idea if Finereader runs under
> > > emulator though. If the file is already a PDF and 72 DPI with tex
it in several
> > years either feature or qualitywise. No idea if Finereader runs under
> > emulator though. If the file is already a PDF and 72 DPI with text as
> > graphics most of the damage has already been done, and it will be
> > extremely hard to OCR.
> >
>
or though. If the file is already a PDF and 72 DPI with text as
> graphics most of the damage has already been done, and it will be
> extremely hard to OCR.
>
well, damage is probably done. how can i check the resolution?
i tried to increase it by creating huge ppm and
ly hard to OCR.
-Reko
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
Gary Kline wrote:
> well, i'm ashamed to admit that i've put at least a dozen hours in
> trying, then re-re-retrying to OCR an imaged pdf file with as many
> open source ocr packages as i can find.
I have seen good results with tesseract which is in the ports and free.
Othe
guys,
well, i'm ashamed to admit that i've put at least a dozen hours in
trying, then re-re-retrying to OCR an imaged pdf file with as many
open source ocr packages as i can find. before i quit for supper
tonight, i finally threw in the towel. realized that i would have
been THROUG
>
> I wrote some code using Python PDF library 'pypdf' to split a multipage
> PDF scan into individual pages, then used the tesseract OCR to convert
> to text. Not 100% of course, and it really got confused by pages that
> were not right-side-up, but not a bad start fo
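The poster's code is not shown, so the following is only a sketch of the same idea in Python. It assumes a current release of the third-party pypdf package (PdfReader, PdfWriter) and, for the upside-down pages, assumes that tesseract's orientation detection (run with "--psm 0") prints a line of the form "Rotate: 180". The helper names are made up for illustration. Note that tesseract reads images rather than PDFs, so each split page still has to be rasterized before OCR.

```python
import re
from pathlib import Path


def page_filename(stem, index, width=3):
    """Return a zero-padded per-page file name such as 'scan-001.pdf'."""
    return f"{stem}-{index + 1:0{width}d}.pdf"


def split_pdf(source, out_dir):
    """Write each page of `source` to its own PDF in `out_dir`; return the paths."""
    # Third-party dependency, imported lazily so the helpers below work without it.
    from pypdf import PdfReader, PdfWriter

    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, page in enumerate(PdfReader(str(source)).pages):
        writer = PdfWriter()
        writer.add_page(page)
        target = out_dir / page_filename(source.stem, index)
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written


def rotation_from_osd(osd_text):
    """Extract the suggested rotation in degrees from OSD output, or 0 if absent.

    Assumes the output contains a line like 'Rotate: 180'.
    """
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    return int(match.group(1)) % 360 if match else 0
```

A page whose detected rotation is non-zero can be turned upright before the real OCR pass, which addresses the confusion over pages that were not right-side-up.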
On Tue, Dec 02, 2008 at 02:07:30AM +0100, Roland Smith wrote:
> On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> > pdftotext fails on the large [32MB] file I've got. Is there any
> > other way I can translate this huge textfile to ascii or html or
> > text?
>
> Please defi
> > there is no text for pdftotext to convert => epic fail.
>
> In this case "convert" from the ImageMagick port will get you a
> series of .jpg/.gif/.. Read the manual carefully before
> attempting; also note this can be a slow process.
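For an image-only PDF, the two steps (rasterize with ImageMagick, then OCR each page image) might be scripted roughly as below. This is a sketch under assumptions, not a tested recipe for the 32MB file in question: it assumes the "convert" and "tesseract" binaries are on the PATH, picks 300 DPI as a starting density, and uses hypothetical helper names. The "-density" option matters because ImageMagick otherwise renders at 72 DPI, which is usually too coarse for OCR.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, out_dir, dpi=300):
    """Build the ImageMagick call that renders every PDF page to a PNG."""
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["convert", "-density", str(dpi), str(pdf_path), pattern]


def ocr_command(image_path, lang="eng"):
    """Build the tesseract call; tesseract appends '.txt' to the output base."""
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_pdf(pdf_path, out_dir):
    """Rasterize the PDF, OCR each page image, and return the combined text."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf_path, out_dir), check=True)
    texts = []
    for image in sorted(out_dir.glob("page-*.png")):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling ocr_pdf("scan.pdf", "out") would leave one PNG and one text file per page in "out". As noted above, rendering a large scan this way can be slow and memory-hungry.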
Which still doesn't give pla
> > 1) Some PDFs are just wrappers around JPEG images. In this case
> > there is no text for pdftotext to convert => epic fail.
>
> In this case "convert" from the ImageMagick port will get you a
> series of .jpg/.gif/.. Read the manual carefully before
> attempting; also note this can be
Roland Smith writes:
> >pdftotext fails on the large [32MB] file I've got. Is there any
> >other way I can translate this huge textfile to ascii or html or
> >text?
>
> Please define "fail" in this context? I've used pdftotext on
> documents exceeding 40MB. However there are of
On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> pdftotext fails on the large [32MB] file I've got. Is there any
> other way I can translate this huge textfile to ascii or html or
> text?
Please define "fail" in this context? I've used pdftotext on documents
exceeding
Guys,
pdftotext fails on the large [32MB] file I've got. Is there any other
way I can translate this huge textfile to ascii or html or text?
thanks,
gary
--
Gary Kline [EMAIL PROTECTED] http://www.thought.org Public Service Unix
http://jot
nt and out-of-copyright
> > > book (from 1913) and need to know what the best scanner is
> > > and if there has been substantial improvement in OCR
> > > software in recent years. This book has few footnotes
> > > or differen
n hold the page flat while
it's being photographed, with something to keep the opposite page out of the
camera's way.
I have to admit that I do all my scanning and OCR on an OS X system, only
marginally related to FreeBSD. I use an older HP Scanjet with automatic
document feeder (ADF), an
fine.
> > and if there has been substantial improvement in OCR
> > software in recent years. This book has few footnotes
> > or different typefaces, so it should make things easier.
There are several free OCR programs. I've used gocr
(http://joc
to know what the best scanner is
> > and if there has been substantial improvement in OCR
> > software in recent years. This book has few footnotes
> > or different typefaces, so it should make things easier.
> >
> > Oh, and if there is somet
canner is
> >and if there has been substantial improvement in OCR
> >software in recent years. This book has few footnotes
> >or different typefaces, so it should make things easier.
> >
> >Oh, and if there is something that plugs into DOS
At 08:07 PM 9/1/2005 -0700, Gary Kline wrote:
People,
I want to scan ~400 pp of an out-of-print and out-of-copyright
book (from 1913) and need to know what the best scanner is
and if there has been substantial improvement in OCR
software in recent years
On 9/1/05, Gary Kline <[EMAIL PROTECTED]> wrote:
> People,
>
> I want to scan ~400 pp of an out-of-print and out-of-copyright
> book (from 1913) and need to know what the best scanner is
> and if there has been substantial improvement in OCR
On Thu, Sep 01, 2005 at 08:07:26PM -0700, Gary Kline wrote:
> People,
>
> I want to scan ~400 pp of an out-of-print and out-of-copyright
> book (from 1913) and need to know what the best scanner is
> and if there has been substantial imp
People,
I want to scan ~400 pp of an out-of-print and out-of-copyright
book (from 1913) and need to know what the best scanner is
and if there has been substantial improvement in OCR
software in recent years. This book has few footnotes
or