subject:"Lucene to index OCR text"

Re: Lucene to index OCR text

2008-01-29 Thread mark harwood

Sent: Tuesday, 29 January, 2008 8:00:56 AM
Subject: Re: Lucene to index OCR text

On Tuesday 29 January 2008 03:32:08 Daniel Noll wrote:
> On Friday 25 January 2008 19:26:44 Paul Elschot wrote:
> > There is no way to do exact phrase matching on OCR data, be

Re: Lucene to index OCR text

2008-01-29 Thread Paul Elschot

On Tuesday 29 January 2008 03:32:08 Daniel Noll wrote:
> On Friday 25 January 2008 19:26:44 Paul Elschot wrote:
> > There is no way to do exact phrase matching on OCR data, because no
> > correction of OCR data will be perfect. Otherwise the OCR would have made
> > the correction...
> >
> > The

Re: Lucene to index OCR text

2008-01-28 Thread Daniel Noll

On Friday 25 January 2008 19:26:44 Paul Elschot wrote:
> There is no way to do exact phrase matching on OCR data, because no
> correction of OCR data will be perfect. Otherwise the OCR would have made
> the correction...

The problem I see with a fuzzy query is that if you have the fuzziness set
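The fuzziness Daniel mentions is an edit-distance threshold: Lucene's FuzzyQuery matches index terms based on their Levenshtein (edit) distance to the query term. The Java sketch below is not Lucene's implementation; it is a standalone illustration using a hypothetical class name (OcrFuzzyDemo), a brute-force scan over a made-up list of OCR-style terms, and a plain maximum-edits threshold, to show the trade-off being discussed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone illustration of edit-distance matching; not Lucene's code. */
final class OcrFuzzyDemo {

    /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b's first j chars takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i chars of a to reach an empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Brute-force scan: returns every term within maxEdits of the query, in input order. */
    static List<String> fuzzyMatches(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (editDistance(query, t) <= maxEdits) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up terms as they might appear in an OCR'd index.
        List<String> terms = List.of("harbor", "harb0r", "harbour", "barber", "larder");
        System.out.println(fuzzyMatches("harbor", terms, 1)); // prints [harbor, harb0r, harbour]
        System.out.println(fuzzyMatches("harbor", terms, 2)); // prints [harbor, harb0r, harbour, barber]
    }
}
```

With a threshold of one edit the OCR misread "harb0r" is found; at two edits the unrelated word "barber" also matches, which is the kind of false positive a loose fuzziness setting invites.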

RE: Lucene to index OCR text

2008-01-25 Thread Renaud Waldura

[mailto:[EMAIL PROTECTED]]
Sent: Friday, January 25, 2008 7:31 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene to index OCR text

Thanks everyone for their ideas and suggestions! Some had occurred to us but were discarded because we feel our solution needs to be automated -- 45 million

Re: Lucene to index OCR text

2008-01-25 Thread waldura

Thanks everyone for their ideas and suggestions! Some had occurred to us but were discarded because we feel our solution needs to be automated -- 45 million pages are a lot of thrust on any human-driven effort. I like Itamar's idea of doing "competing" OCR, and keeping the best result. Unfortunate
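The message does not say how the "best" of several competing OCR results would be judged. One possible heuristic, assumed here purely for illustration, is to prefer the output in which the largest share of tokens are dictionary words. In the Java sketch below, the class name (CompetingOcrDemo), the scoring rule, the tiny dictionary, and the three engine outputs are all hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: choose among several OCR engines' outputs for the same page. */
final class CompetingOcrDemo {

    /** Hypothetical quality score: fraction of whitespace-separated tokens found in the dictionary. */
    static double dictionaryScore(String text, Set<String> dictionary) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        int known = 0;
        int total = 0;
        for (String tok : tokens) {
            String word = tok.replaceAll("[^\\p{L}\\p{N}]", ""); // strip punctuation, keep letters and digits
            if (word.isEmpty()) {
                continue; // e.g. the empty token produced by leading whitespace
            }
            total++;
            if (dictionary.contains(word)) {
                known++;
            }
        }
        return total == 0 ? 0.0 : (double) known / total;
    }

    /** Returns the highest-scoring candidate; on a tie the earlier candidate is kept. */
    static String pickBest(List<String> candidates, Set<String> dictionary) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double s = dictionaryScore(c, dictionary);
            if (s > bestScore) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("the", "ship", "entered", "harbor", "at", "dawn");
        List<String> outputs = List.of(
                "Tbe sh1p entered the harb0r at dawn.",  // hypothetical engine A: 4 of 7 known
                "The ship entered the harbor at dawn.",  // hypothetical engine B: 7 of 7 known
                "The shjp entcred tbe harbor at dawn."); // hypothetical engine C: 4 of 7 known
        System.out.println(pickBest(outputs, dict)); // prints The ship entered the harbor at dawn.
    }
}
```

A dictionary-based score penalizes correct but rare words (names, jargon), so in practice the word list would need to cover the collection's own vocabulary.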

Re: Lucene to index OCR text

2008-01-25 Thread Erick Erickson

enaud Waldura <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, 25 January, 2008 1:43:06 AM
> Subject: Lucene to index OCR text
>
> I've been poking around the list archives and didn't really come up

Re: Lucene to index OCR text

2008-01-25 Thread mark harwood

Lucene to index OCR text

I've been poking around the list archives and didn't really come up against anything interesting. Anyone using Lucene to index OCR text? Any strategies/algorithms/packages you recommend?

I have a large collection (10^7 docs) tha

RE: Lucene to index OCR text

2008-01-25 Thread Itamar Syn-Hershko

re too big?

Itamar.

-----Original Message-----
From: Paul Elschot [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 25, 2008 10:27 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene to index OCR text

On Friday 25 January 2008 03:46:23 Kyle Maxwell wrote:
> > I've been poking around

Re: Lucene to index OCR text

2008-01-25 Thread Paul Elschot

On Friday 25 January 2008 03:46:23 Kyle Maxwell wrote:
> > I've been poking around the list archives and didn't really come up against
> > anything interesting. Anyone using Lucene to index OCR text? Any
> > strategies/algorithms/packages you recommend?
> >

Re: Lucene to index OCR text

2008-01-24 Thread Kyle Maxwell

> I've been poking around the list archives and didn't really come up against
> anything interesting. Anyone using Lucene to index OCR text? Any
> strategies/algorithms/packages you recommend?
>
> I have a large collection (10^7 docs) that's mostly the result of OC

Re: Lucene to index OCR text

2008-01-24 Thread Erick Erickson

ng you come across. Especially in the way of cleaning existing OCR'd data. Mostly, I'm expressing sympathy for the size and complexity of the task you're undertaking...

Best
Erick

On Jan 24, 2008 8:43 PM, Renaud Waldura <[EMAIL PROTECTED]> wrote:
> I've been poking arou

Lucene to index OCR text

2008-01-24 Thread Renaud Waldura

I've been poking around the list archives and didn't really come up against anything interesting. Anyone using Lucene to index OCR text? Any strategies/algorithms/packages you recommend?

I have a large collection (10^7 docs) that's mostly the result of OCR. We index/search/e

