Synonym questions

Tom Hill Thu, 09 Aug 2007 11:53:52 -0700

Hi -

I've been looking at synonyms and have a couple of questions.


1) For some of my synonyms, it makes sense to simply replace the original
word with the other (e.g. "theatre" => "theater", so a search for either
spelling finds documents containing either). For others, I want to add an
alternate term while preserving the original (e.g. "cirque" => "circus", so
searches for "circus" find Cirque du Soleil, but searches for "cirque" only
match "cirque", not "circus").

I was thinking that the best way to do this was with two different synonym
filters. The replace filter would be used at both index and query time; the
add filter only at index time.

Does doing this using two synonym filters make sense?
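To sanity-check the idea, here is a minimal Java model of the two-filter
setup. It is a simplification, not Solr code: one token per word, no stop
words or stemming, single-word rules only. The class name is hypothetical, and
the rule contents (theatre => theater in synonyms_replace.txt, cirque => circus
in synonyms_add.txt) are assumed from the examples above. The index chain runs
the replace map and then the add map; the query chain runs only the replace map.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the proposed two-filter setup (not Solr code). */
public class SynonymSetupSketch {
    // Assumed contents of synonyms_replace.txt: theatre => theater
    static final Map<String, String> REPLACE = Map.of("theatre", "theater");
    // Assumed contents of synonyms_add.txt: cirque => circus
    static final Map<String, String> ADD = Map.of("cirque", "circus");

    /** Index-time chain: replace (includeOrig=false), then add (includeOrig=true). */
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            String t = REPLACE.getOrDefault(word, word); // original dropped on a replace
            terms.add(t);
            if (ADD.containsKey(t)) {
                terms.add(ADD.get(t));                   // original kept, alternate added
            }
        }
        return terms;
    }

    /** Query-time chain: only the replace filter runs. */
    static String queryTerm(String word) {
        String w = word.toLowerCase();
        return REPLACE.getOrDefault(w, w);
    }

    static boolean matches(String doc, String query) {
        return indexTerms(doc).contains(queryTerm(query));
    }

    public static void main(String[] args) {
        System.out.println(matches("Cirque du Soleil", "circus")); // true
        System.out.println(matches("Cirque du Soleil", "cirque")); // true
        System.out.println(matches("Big top circus", "cirque"));   // false
        System.out.println(matches("Globe Theatre", "theater"));   // true
        System.out.println(matches("Globe Theater", "theatre"));   // true
    }
}
```

Because the add filter only runs at index time, a "cirque" query is never
expanded, so documents that only say "circus" stay out of its results.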

Section from my schema.xml:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StandardFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms_replace.txt"
                  ignoreCase="true" expand="false" includeOrig="false"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms_add.txt"
                  ignoreCase="true" expand="false" includeOrig="true"/>
          <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StandardFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms_replace.txt"
                  ignoreCase="true" expand="false" includeOrig="false"/>
          <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

2) For this to work, I need to use "includeOrig". It appears that
"includeOrig" is hard-coded to false in SynonymFilterFactory. Is there a
reason for this? It's easy to make it configurable (diff below); is there any
reason it should not be supported?

Thanks,

Tom

The diff is against my local copy of 1.2, but the code appears to be the same in HEAD.

--- src/java/org/apache/solr/analysis/SynonymFilterFactory.java
+++ src/java/org/apache/solr/analysis/SynonymFilterFactory.java (working copy)
@@ -37,6 +37,7 @@

     ignoreCase = getBoolean("ignoreCase",false);
     expand = getBoolean("expand",true);
+    includeOrig = getBoolean("includeOrig",false);

     if (synonyms != null) {
       List<String> wlist=null;
@@ -57,8 +58,9 @@
   private SynonymMap synMap;
   private boolean ignoreCase;
   private boolean expand;
+  private boolean includeOrig;

-  private static void parseRules(List<String> rules, SynonymMap map, String mappingSep, String synSep, boolean ignoreCase, boolean expansion) {
+  private void parseRules(List<String> rules, SynonymMap map, String mappingSep, String synSep, boolean ignoreCase, boolean expansion) {
     int count=0;
     for (String rule : rules) {
       // To use regexes, we need an expression that specifies an odd number of chars.
@@ -88,7 +90,6 @@
         }
       }

-      boolean includeOrig=false;
       for (List<String> fromToks : source) {
         count++;
         for (List<String> toToks : target) {
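The first hunk relies on the factory's getBoolean(name, default) helper. Below
is a hypothetical stand-in for that argument handling (the class name is made
up, and the parsing rule is an assumption: a missing attribute yields the
default, anything else goes through Boolean.parseBoolean). It illustrates why
the default of false matters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the factory's argument handling; not Solr's implementation. */
public class IncludeOrigArgSketch {
    private final Map<String, String> args; // attributes from the <filter> element

    IncludeOrigArgSketch(Map<String, String> args) {
        this.args = args;
    }

    // Assumed behavior: missing attribute -> default; otherwise parse "true"/"false".
    boolean getBoolean(String name, boolean defaultValue) {
        String value = args.get(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> addFilter = new HashMap<>();
        addFilter.put("synonyms", "synonyms_add.txt");
        addFilter.put("includeOrig", "true");

        Map<String, String> replaceFilter = new HashMap<>();
        replaceFilter.put("synonyms", "synonyms_replace.txt"); // attribute omitted

        System.out.println(new IncludeOrigArgSketch(addFilter).getBoolean("includeOrig", false));     // true
        System.out.println(new IncludeOrigArgSketch(replaceFilter).getBoolean("includeOrig", false)); // false
    }
}
```

Since the default matches the old hard-coded value, schemas that omit
includeOrig should behave exactly as before.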
