Have you looked at edismax and its 'qf' (query fields) parameter? It lets you
define the fields to search. You can also set those parameters as defaults in
solrconfig.xml so you don't have to send them down the wire.
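
As a rough sketch only: the handler name /search is made up, and the field
names assume langid maps your title/subtitle/content fields to the *_en,
*_fr, ... dynamic fields from your schema. The defaults could look something
like this:

```xml
<!-- Hypothetical handler; the name and the field list are illustrative. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the Extended DisMax query parser for the main query. -->
    <str name="defType">edismax</str>
    <!-- qf: whitespace-separated list of fields to search. -->
    <str name="qf">
      title_en subtitle_en content_en
      title_fr subtitle_fr content_fr
      title_de subtitle_de content_de
      title_it subtitle_it content_it
      title_es subtitle_es content_es
    </str>
  </lst>
</requestHandler>
```

With something like that in place, the client would only need to send
/search?q=term, and adding a language would mean editing 'qf' in one place.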

Finally, you can define several different request handlers (e.g. /ensearch,
/frsearch) and have each of them use different 'qf' values, possibly with the
'fl' parameter also defined and with field-name aliasing from
language-specific to generic names.
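
A minimal sketch of what I mean, assuming a Solr 4.x version where 'fl'
accepts the alias:field form, and assuming the language-specific field names
(title_en, title_fr, ...) produced by your langid mapping:

```xml
<!-- English-only search; returned fields are renamed to generic names. -->
<requestHandler name="/ensearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title_en subtitle_en content_en</str>
    <!-- fl alias form: generic_name:actual_field -->
    <str name="fl">title:title_en,subtitle:subtitle_en,content:content_en</str>
  </lst>
</requestHandler>

<!-- Same idea for French; only the field suffix changes. -->
<requestHandler name="/frsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title_fr subtitle_fr content_fr</str>
    <str name="fl">title:title_fr,subtitle:subtitle_fr,content:content_fr</str>
  </lst>
</requestHandler>
```

The client then picks the handler by language and always reads back
title/subtitle/content, whatever the underlying field names are.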

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Apr 9, 2013 at 2:32 PM, <d...@geschan.de> wrote:

>
> Hello,
>
> I'm trying to index a large number of documents in different languages.
> I don't know the language of the document, so I'm using
> TikaLanguageIdentifierUpdateProcessorFactory to identify it.
>
> So, this is my configuration in solrconfig.xml
>
>  <updateRequestProcessorChain name="langid">
>    <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
>          <bool name="langid">true</bool>
>          <str name="langid.fl">title,subtitle,content</str>
>          <str name="langid.langField">language_s</str>
>          <str name="langid.threshold">0.3</str>
>          <str name="langid.fallback">general</str>
>          <str name="langid.whitelist">en,fr,de,it,es</str>
>          <bool name="langid.map">true</bool>
>          <bool name="langid.map.keepOrig">true</bool>
>    </processor>
>    <processor class="solr.LogUpdateProcessorFactory" />
>    <processor class="solr.RunUpdateProcessorFactory" />
>  </updateRequestProcessorChain>
>
> So, the detection works fine and I put some dynamic fields in schema.xml
> to store the results:
>   <dynamicField name="*_en"  type="text_en"    indexed="true"
>  stored="true" multiValued="true"/>
>   <dynamicField name="*_fr"  type="text_fr"    indexed="true"
>  stored="true" multiValued="true"/>
>   <dynamicField name="*_de"  type="text_de"    indexed="true"
>  stored="true" multiValued="true"/>
>   <dynamicField name="*_it"  type="text_it"    indexed="true"
>  stored="true" multiValued="true"/>
>   <dynamicField name="*_es"  type="text_es"    indexed="true"
>  stored="true" multiValued="true"/>
>
> My main problem now is how to search the document without knowing the
> language of the searched document.
> I don't want to have a huge querystring like
>  ?q=title_en:+term+subtitle_en:+term+title_de:+term...
> I could use copyField to copy all fields into the "text" field, but
> "text" has the type text_general, so the language-specific analysis is
> lost. I could at least use a combined field for every language (like
> text_en, text_fr...), but the query string would still get very long, and
> adding new languages would be terribly inconvenient.
>
> So, what can I do? Is there a better way to index and search documents
> in many languages without knowing the language of the document or the
> query beforehand?
>
> - Geschan
>
>
