I haven't tried it as an UpdateProcessor myself, but the patch relies on Tika, and Tika's LanguageIdentifier works well except on short texts.
> Thanks Markus.
>
> Do you know if this patch is good enough for production use? Thanks.
>
> Andy
>
> --- On Tue, 3/29/11, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > From: Markus Jelsma <markus.jel...@openindex.io>
> > Subject: Re: copyField at search time / multi-language support
> > To: solr-user@lucene.apache.org
> > Cc: "Andy" <angelf...@yahoo.com>
> > Date: Tuesday, March 29, 2011, 1:29 AM
> >
> > https://issues.apache.org/jira/browse/SOLR-1979
> >
> > > Tom,
> > >
> > > Could you share the method you use to perform language detection?
> > > Any open source tools that do that?
> > >
> > > Thanks.
> > >
> > > --- On Mon, 3/28/11, Tom Mortimer <t...@flax.co.uk> wrote:
> > > > From: Tom Mortimer <t...@flax.co.uk>
> > > > Subject: copyField at search time / multi-language support
> > > > To: solr-user@lucene.apache.org
> > > > Date: Monday, March 28, 2011, 4:45 AM
> > > >
> > > > Hi,
> > > >
> > > > Here's my problem: I'm indexing a corpus with text in a variety of
> > > > languages. I'm planning to detect these at index time and send the
> > > > text to one of a suitably-configured field (e.g. "mytext_de" for
> > > > German, "mytext_cjk" for Chinese/Japanese/Korean etc.)
> > > >
> > > > At search time I want to search all of these fields. However, there
> > > > will be at least 12 of them, which could lead to a very long query
> > > > string. (Also I need to use the standard query parser rather than
> > > > dismax, for full query syntax.)
> > > >
> > > > Therefore I was wondering if there was a way to copy fields at search
> > > > time, so I can have my mytext query in a single field and have it
> > > > copied to mytext_de, mytext_cjk etc. Something like:
> > > >
> > > > <copyQueryField source="mytext" dest="mytext_de" />
> > > > <copyQueryField source="mytext" dest="mytext_cjk" />
> > > > ...
> > > >
> > > > If this is not currently possible, could someone give me some pointers
> > > > for hacking Solr to support it? Should I subclass solr.SearchHandler?
> > > > I know nothing about Solr internals at the moment...
> > > >
> > > > thanks,
> > > > Tom
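
On the long query string: one workaround that needs no changes inside Solr is to expand the query on the client before sending it, so the length stops mattering because nobody types it by hand. Below is a minimal Python sketch of that idea, not something Solr provides. The names `expand_query` and `LANGUAGE_FIELDS` are made up for illustration, and only the two field names given in the thread are listed. It relies on the standard query parser's field grouping syntax, `field:(...)`.

```python
def expand_query(user_query, fields, operator="OR"):
    """Repeat one user query across several fields (hypothetical helper).

    Each clause uses Lucene field grouping, field:(...), so boolean
    operators inside user_query stay scoped to that field.
    """
    if not fields:
        raise ValueError("at least one field is required")
    clauses = ["{}:({})".format(field, user_query) for field in fields]
    return (" " + operator + " ").join(clauses)


# Only these two names appear in the thread; the real list would hold
# all of the per-language fields (at least 12).
LANGUAGE_FIELDS = ["mytext_de", "mytext_cjk"]

if __name__ == "__main__":
    # The result goes into the q parameter of a normal Solr request.
    print(expand_query("solr AND lucene", LANGUAGE_FIELDS))
```

The sketch passes the user's query through untouched, so full query syntax is kept, but any explicit field prefix the user types (for example `title:foo`) still applies as written inside each group. Escaping is left to the caller.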