>
> I don't deal with a lot of multi-lingual stuff, but my understanding is
> that this sort of thing gets a lot easier if you can partition your docs
> by language -- and even if you can't, doing some language detection on the
> (dirty) OCRed text to get a language guess (and then partitioning by
> language and attempting to find the suspicious words in each partition)
> should help.
>

And if you are really OCR'ing Urdu text and trying to search it automatically,
then this is your last priority.

-- 
Robert Muir
rcm...@gmail.com
