> I don't deal with a lot of multi-lingual stuff, but my understanding is
> that this sort of thing gets a lot easier if you can partition your docs
> by language -- and even if you can't, doing some language detection on the
> (dirty) OCRed text to get a language guess (and then partition by language
> and attempt to find the suspicious words in each partition)

and if you are really OCR'ing Urdu text and trying to search it automatically, then this is your last priority.

--
Robert Muir
rcm...@gmail.com