RE: Multi-term synonyms

Davis, Daniel (NIH/NLM) [C] Mon, 20 Apr 2015 08:06:11 -0700

Handling MeSH descriptor preferred terms and the like is a similar problem. I
ran into this while evaluating Solr for a project here at NLM; we ended up
using Solr for other projects instead. I considered the following approaches:
 - Use a custom tokenizer at index time that indexes all of the multi-term
   alternatives.
 - Index the data, then run an enrichment process that queries on each source
   synonym and generates an update adding the target synonyms to the matching
   documents. Follow this with an optimize.
 - During the indexing process, but before sending the data to Solr, tokenize
   the data and add the synonyms to a separate field.
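
A minimal sketch of the third approach (pre-processing before the data is sent
to Solr), written in Python purely as an illustration. The synonym map, the
field names `text` and `synonyms_txt`, and the regex tokenizer are hypothetical
assumptions; the tokenizer only approximates, and duplicates, whatever analyzer
the Solr schema actually uses. It scans the token stream with a greedy longest
match so that a multi-term source phrase such as "Tween 20" is matched as a
unit rather than word by word.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical multi-term synonym map (source phrase -> target phrases),
# using the mapping from the original question.
SYNONYMS: Dict[str, List[str]] = {
    "tween 20": ["polysorbate 40"],
}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Crude lowercase word tokenizer; a stand-in for (and duplicate of) Solr's analyzer."""
    return TOKEN_RE.findall(text.lower())


def build_phrase_index(synonyms: Dict[str, List[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Key each source phrase by its token tuple so matching works on tokens, not raw text."""
    return {tuple(tokenize(src)): targets for src, targets in synonyms.items()}


def find_synonyms(tokens: List[str], phrase_index: Dict[Tuple[str, ...], List[str]],
                  max_len: int) -> List[str]:
    """Greedy longest-match scan over the token stream; returns the target phrases found."""
    found: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in phrase_index:
                found.extend(phrase_index[key])
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move to the next token
    return found


def enrich_document(doc: Dict, source_field: str, target_field: str,
                    synonyms: Dict[str, List[str]]) -> Dict:
    """Return a copy of doc with the matched target synonyms stored in target_field."""
    phrase_index = build_phrase_index(synonyms)
    max_len = max((len(k) for k in phrase_index), default=0)
    tokens = tokenize(doc.get(source_field, ""))
    enriched = dict(doc)
    enriched[target_field] = find_synonyms(tokens, phrase_index, max_len)
    return enriched


if __name__ == "__main__":
    doc = {"id": "1", "text": "Washed in PBS with 0.05% Tween 20."}
    print(enrich_document(doc, "text", "synonyms_txt", SYNONYMS))
```

The enriched document could then be posted to Solr as usual, with the extra
field searched alongside the original text.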


Both the custom tokenizer and the enrichment process have the advantage of
using Solr's own tokenizer rather than duplicating it. The enrichment process
seems to me workable only in environments where you can periodically re-index
all of the data, that is, where there is no continuous stream of new data that
must be searchable soon after it is generated. The last method, pre-processing
the data, seems the least desirable to me from a blue-sky perspective, but it is
probably the easiest to implement and the most independent of Solr.
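
A rough sketch of the enrichment step, again in Python and under stated
assumptions: `search_ids` is a hypothetical stand-in for a Solr phrase query
(for example q=text:"tween 20") that returns the ids of matching documents, and
`synonyms_txt` is a hypothetical multi-valued field. The output uses Solr's
JSON atomic-update syntax, where {"add": [...]} appends values to a
multi-valued field; atomic updates generally require the document's other
fields to be stored. The in-memory search exists only so the example runs
without a Solr server.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_atomic_updates(
    synonyms: Dict[str, List[str]],
    search_ids: Callable[[str], Iterable[str]],
    target_field: str,
) -> List[dict]:
    """Query each source phrase and collect the target synonyms per matching document."""
    per_doc: Dict[str, List[str]] = defaultdict(list)
    for source, targets in synonyms.items():
        for doc_id in search_ids(source):
            for target in targets:
                if target not in per_doc[doc_id]:  # avoid adding duplicates
                    per_doc[doc_id].append(target)
    # One atomic-update document per id; "add" appends to the multi-valued field.
    return [{"id": doc_id, target_field: {"add": values}}
            for doc_id, values in sorted(per_doc.items())]


# Hypothetical in-memory corpus standing in for an already-indexed collection.
DOCS = {
    "1": "Washed in PBS with 0.05% Tween 20.",
    "2": "Buffer without detergent.",
}


def fake_phrase_search(phrase: str) -> List[str]:
    """Stand-in for a Solr phrase query; real code would send an HTTP request."""
    return [doc_id for doc_id, text in DOCS.items() if phrase.lower() in text.lower()]


if __name__ == "__main__":
    updates = build_atomic_updates({"tween 20": ["polysorbate 40"]},
                                   fake_phrase_search, "synonyms_txt")
    print(json.dumps(updates))
```

The resulting list would be posted to the update handler and, as described
above, followed by an optimize.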

Hope this helps,

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH

-----Original Message-----
From: Kaushik [mailto:kaushika...@gmail.com] 
Sent: Monday, April 20, 2015 10:47 AM
To: solr-user@lucene.apache.org
Subject: Multi-term synonyms

Hello,

Reading up on synonyms, it looks like there is no real solution for multi-term
synonyms. Is that right? I have a use case where I need to map one multi-term
phrase to another, e.g. Tween 20 needs to be translated to Polysorbate 40.

Any thoughts as to how this can be achieved?

Thanks,
Kaushik
