Hi, I'm trying to write my own tokenizer for Solr 7.
So far, everything seems to be fine:

- the tokenizer compiles
- the tokenizer is instantiated correctly by its factory
- the tokenizer seems to do its job when tested with the GUI ("../solr/#/collection/analysis")

BUT

- the expected result isn't visible in the document.

Surely I got something wrong, but I have no idea what. Any hints are appreciated.

Uwe

Snippet from schema.xml:
```xml
<analyzer type="index">
  <tokenizer class="de.hebis.solr.analysis.gvi.ClusterSynonymTokenizerFactory" />
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>
```
Minimized example (it just replaces everything with the constant string "substitute"):
```java
package de.hebis.solr.analysis;

import java.io.IOException;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyTokenizer extends Tokenizer {
    private static final Logger LOG = LoggerFactory.getLogger(MyTokenizer.class);

    protected CharTermAttribute charTermAttribute = addAttribute(CharTermAttribute.class);
    private boolean done = false;

    // Constructor name must match the class name
    public MyTokenizer() {
        super();
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (done) {
            return false; // emit exactly one token per input
        }
        charTermAttribute.setEmpty();
        String toReplace = getStartOFChallange();
        LOG.info("Input: " + toReplace + " replaced.");
        charTermAttribute.append("substitute");
        done = true;
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        done = false; // allow the next input to produce a token again
    }

    /* Read some chars from 'input' */
    private String getStartOFChallange() {
        char[] buffer = new char[200];
        int inputLength = -1;
        try {
            inputLength = input.read(buffer, 0, 200);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        if (inputLength == -1) {
            LOG.warn("No input");
            return null;
        }
        return new String(buffer, 0, inputLength);
    }
}
```
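A side note on `getStartOFChallange()`: a single `Reader.read(buffer, 0, 200)` call may legally return fewer characters than are available, and anything beyond the first 200 chars is never read. Below is a minimal plain-Java sketch (no Lucene) of a helper that drains the whole `Reader`; the names `ReadAllSketch` and `readAll` are hypothetical, and this is probably not the cause of the missing result, just a robustness point.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadAllSketch {

    /**
     * Reads the Reader until end of stream. A single read() may return fewer
     * chars than requested, so we loop until it returns -1.
     */
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[200];
        try {
            int n;
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            // Mirrors the original code, which also rethrows unchecked
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader("ReplaceMe"))); // prints ReplaceMe

        // Longer than one buffer: read in chunks of 200, 200 and 100 chars
        StringBuilder longInput = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            longInput.append('x');
        }
        System.out.println(readAll(new StringReader(longInput.toString())).length()); // prints 500
    }
}
```

Inside the tokenizer, `readAll(input)` would take the place of the single `input.read(...)` call.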
Snippet from solr.log (the input was "ReplaceMe"):
```
de.hebis.solr.analysis.MyTokenizer.incrementToken(): Input: ReplaceMe replaced.
```
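One thing worth ruling out, assuming "visible in the document" means the stored field value returned by a query: Solr stores the original field value as sent, and the index-time analyzer only determines the terms written to the index. The toy model below is plain Java, not Solr's API (`StoredVsIndexedSketch` and `ToyDoc` are hypothetical names). It mimics the schema above: a "replace everything" analyzer at index time and a keyword (whole string) match at query time.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class StoredVsIndexedSketch {

    /** Toy stand-in for a document with one field; not how Solr is implemented. */
    static final class ToyDoc {
        final String storedValue;        // what a query hands back in the document
        final List<String> indexedTerms; // what the index-time analyzer produced

        ToyDoc(String rawValue, Function<String, List<String>> indexAnalyzer) {
            this.storedValue = rawValue;                       // stored as-is, before analysis
            this.indexedTerms = indexAnalyzer.apply(rawValue); // only the terms go through the tokenizer
        }

        /** Keyword-style query side: the whole query string is a single term. */
        boolean matches(String query) {
            return indexedTerms.contains(query);
        }
    }

    public static void main(String[] args) {
        // Mimics MyTokenizer: any input becomes the single token "substitute"
        Function<String, List<String>> replaceAll = input -> Collections.singletonList("substitute");

        ToyDoc doc = new ToyDoc("ReplaceMe", replaceAll);
        System.out.println("stored: " + doc.storedValue);                        // stored: ReplaceMe
        System.out.println("matches 'substitute': " + doc.matches("substitute")); // true
        System.out.println("matches 'ReplaceMe': " + doc.matches("ReplaceMe"));   // false
    }
}
```

If that is the situation, a search for `substitute` on the field should find the document even though the returned value still reads "ReplaceMe". If the replaced value is meant to be stored, that typically has to happen before analysis, for example in an update request processor.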