org
Sent: Tuesday, 29 January, 2008 8:00:56 AM
Subject: Re: Lucene to index OCR text
On Tuesday 29 January 2008 03:32:08, Daniel Noll wrote:
> On Friday 25 January 2008 19:26:44 Paul Elschot wrote:
> > There is no way to do exact phrase matching on OCR data, because no
> > correction of OCR data will be perfect. Otherwise the OCR would have made
> > the correction...
> >
>
> The
On Friday 25 January 2008 19:26:44 Paul Elschot wrote:
> There is no way to do exact phrase matching on OCR data, because no
> correction of OCR data will be perfect. Otherwise the OCR would have made
> the correction...
>
The problem I see with a fuzzy query is that if you have the fuzziness set
PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 7:31 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene to index OCR text
Thanks everyone for their ideas and suggestions! Some had occurred to us
but were discarded because we feel our solution needs to be automated --
45 million pages are a lot of thrust on any human-driven effort.
I like Itamar's idea of doing "competing" OCR, and keeping the best
result. Unfortunate
enaud Waldura <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, 25 January, 2008 1:43:06 AM
> Subject: Lucene to index OCR text
>
> I've been poking around the list archives and didn't really come up
Lucene to index OCR text
I've been poking around the list archives and didn't really come up against
anything interesting. Anyone using Lucene to index OCR text? Any
strategies/algorithms/packages you recommend?
I have a large collection (10^7 docs) tha
re
too big?
Itamar.
-Original Message-
From: Paul Elschot [mailto:[EMAIL PROTECTED]
Sent: Friday, January 25, 2008 10:27 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene to index OCR text
On Friday 25 January 2008 03:46:23, Kyle Maxwell wrote:
> > I've been poking around the list archives and didn't really come up against
> > anything interesting. Anyone using Lucene to index OCR text? Any
> > strategies/algorithms/packages you recommend?
> >
> I've been poking around the list archives and didn't really come up against
> anything interesting. Anyone using Lucene to index OCR text? Any
> strategies/algorithms/packages you recommend?
>
> I have a large collection (10^7 docs) that's mostly the result of OC
ng you come across. Especially in the way of cleaning existing OCR'd data.
Mostly, I'm expressing sympathy for the size and complexity of the task
you're undertaking...
Best
Erick
On Jan 24, 2008 8:43 PM, Renaud Waldura <[EMAIL PROTECTED]>
wrote:
> I've been poking arou
I've been poking around the list archives and didn't really come up against
anything interesting. Anyone using Lucene to index OCR text? Any
strategies/algorithms/packages you recommend?
I have a large collection (10^7 docs) that's mostly the result of OCR. We
index/search/e