weird behavior of own tokenizer

Uwe Reh Thu, 26 Apr 2018 09:57:45 -0700

Hi

I'm trying to write my own tokenizer for Solr 7.


So far, everything seems to be fine:
- the tokenizer compiles
- the tokenizer is instantiated correctly by its factory
- the tokenizer seems to do its work when tested with the GUI
  ("../solr/#/collection/analysis")

BUT
- the expected result isn't visible in the document.

Surely I got something wrong, but I have no idea what.

Any hints are appreciated.
Uwe


###
# Snippet schema.xml
###

<analyzer type="index">
    <tokenizer class="de.hebis.solr.analysis.gvi.ClusterSynonymTokenizerFactory" />
</analyzer>
<analyzer type="query">
   <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>


###
# minimized example:
# Just replace everything with the constant string "substitute"
###

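import java.io.IOException;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyTokenizer extends Tokenizer {
   private static final Logger LOG               = LoggerFactory.getLogger(MyTokenizer.class);
   protected CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
   private boolean             done              = false;

   public MyTokenizer() {
      super();
   }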

   @Override
   public boolean incrementToken() throws IOException {
      if (done) return false;
      charTermAttribute.setEmpty();
      String toReplace = getStartOFChallange();
      LOG.info("Input: " + toReplace + " replaced.");
      charTermAttribute.append("substitute");
      done = true;
      return true;
   }

   @Override
   public void reset() throws IOException {
      super.reset();
      done = false;
   }

   /* Read some chars from 'input' */
   private String getStartOFChallange() {
      char[] buffer = new char[200];
      int inputLength = -1;
      try {
         inputLength = input.read(buffer, 0, 200);
      } catch (IOException e) {
         throw new RuntimeException(e);
      }
      if (inputLength == -1) {
         LOG.warn("No input");
         return null;
      }
      return new String(buffer, 0, inputLength);
   }
}
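
One detail in the helper above: a single Reader.read call is allowed to
return fewer characters than requested, so the 200-character prefix is
not guaranteed even when more input is available. Below is a minimal
sketch of a looping read, using only java.io. The class name ReadPrefix
and the method readUpTo are hypothetical and not part of the original
code. Like the original helper, it returns null on empty input and wraps
the IOException in an unchecked exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration; not part of the original tokenizer. */
final class ReadPrefix {

   /**
    * Reads up to 'limit' chars from 'in', looping until the limit is
    * reached or the stream ends. Returns null if nothing could be read.
    */
   static String readUpTo(Reader in, int limit) {
      char[] buffer = new char[limit];
      int filled = 0;
      try {
         while (filled < limit) {
            int n = in.read(buffer, filled, limit - filled);
            if (n == -1) break;   // end of stream reached
            filled += n;
         }
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
      return filled == 0 ? null : new String(buffer, 0, filled);
   }
}
```

Inside the tokenizer this could be called as ReadPrefix.readUpTo(input, 200),
assuming 'input' is the Reader field inherited from Tokenizer.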


###
# Snippet solr.log
# The input was "ReplaceMe"
###

de.hebis.solr.analysis.MyTokenizer.incrementToken(): Input: ReplaceMe replaced.
