Re: Cleaning up dirty OCR

Robert Muir Thu, 11 Mar 2010 18:48:06 -0800

>
> I don't deal with a lot of multi-lingual stuff, but my understanding is
> that this sort of thing gets a lot easier if you can partition your docs
> by language -- and even if you can't, doing some language detection on the
> (dirty) OCRed text to get a language guess (and then partition by language
> and attempt to find the suspicious words in each partition)
>


and if you are really OCR'ing Urdu text and trying to search it automatically,
then this is your last priority.
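
For reference, the partition-then-flag idea in the quoted text might look roughly like the sketch below. This is only an illustration under assumptions, not anything from the thread: the tiny stopword lists, the guess_language and suspicious_words helpers, and the min_count threshold are all hypothetical, and a real setup would use a proper language detector and much larger corpora.

```python
from collections import Counter, defaultdict

# Hypothetical, tiny stopword lists used as a crude language signal.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}

def guess_language(text):
    """Guess a language by counting stopword hits; 'unknown' if none match."""
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def suspicious_words(docs, min_count=2):
    """Partition docs by guessed language, then flag words that are rare
    within their own partition (seen fewer than min_count times)."""
    partitions = defaultdict(Counter)
    for text in docs:
        partitions[guess_language(text)].update(text.lower().split())
    return {lang: sorted(w for w, c in counts.items() if c < min_count)
            for lang, counts in partitions.items()}

docs = [
    "the cat and the dog",
    "the cat is in the h0use",
    "der hund und die katze",
    "der hund ist nicht die katzc",
]
result = suspicious_words(docs)
```

On a corpus this small, ordinary rare words get flagged along with the OCR errors, so the output is only a candidate list to review per language, not a verdict.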

-- 
Robert Muir
rcm...@gmail.com
