This may not be a practically solvable problem, but the company I work for
has a large number of lengthy mixed-language documents, for example
scholarly articles about Islam written in English but containing long
passages of Arabic. Ideally, we would like users to be able to search both
the English and Arabic portions of the text, using the full complement of
language-processing tools such as stemming and stopword removal.

The problem, of course, is that these two languages co-occur in the same
field. Is there any way to apply different processing to different words or
paragraphs within a single field through language detection? Is this to all
intents and purposes impossible within Solr? Or is another approach (using
language detection to split the single large field into smaller
language-specific fields, for example) possible or recommended?
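The split-field approach in the last question could look roughly like the following pre-indexing step. This is a minimal sketch, not a Solr feature. It assumes paragraphs are separated by blank lines, and it uses a Unicode-script heuristic in place of real language detection (Arabic and English use different scripts, so counting Arabic-block letters is enough here). The field names `body_en` and `body_ar` are hypothetical. In a Solr schema each would be declared with its own field type, and therefore its own analyzer chain for stemming and stopwords.

```python
# Hypothetical sketch: split mixed English/Arabic text into per-language
# fields before sending the document to Solr. Uses a Unicode-range
# heuristic, NOT a real language detector. Field names are assumptions.

ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
]


def is_arabic_char(ch: str) -> bool:
    """True if the character's code point lies in an Arabic Unicode block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in Arabic blocks."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_arabic_char(ch) for ch in letters) / len(letters)


def split_by_language(text: str, threshold: float = 0.5) -> dict[str, str]:
    """Route each blank-line-separated paragraph to body_en or body_ar."""
    buckets: dict[str, list[str]] = {"body_en": [], "body_ar": []}
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        field = "body_ar" if arabic_ratio(para) >= threshold else "body_en"
        buckets[field].append(para)
    # Rejoin each bucket so every field holds its paragraphs in original order.
    return {field: "\n\n".join(paras) for field, paras in buckets.items()}


if __name__ == "__main__":
    doc = (
        "The opening verse reads as follows.\n\n"
        "بسم الله الرحمن الرحيم\n\n"
        "It is recited often."
    )
    fields = split_by_language(doc)
    print(fields["body_en"])
    print(fields["body_ar"])
```

This only helps when the languages are separated at paragraph granularity; an Arabic phrase quoted inside an English sentence would still land in `body_en`. A search across both portions would then query both fields, for example with a multi-field query.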

Thanks,

Tim Hill
