I am getting an error using the SpellChecker component with the query
"another-test"
java.lang.StringIndexOutOfBoundsException: String index out of range: -7

This appears to be related to SOLR-1630
(https://issues.apache.org/jira/browse/SOLR-1630), which has been marked
as fixed. The configuration and test case below appear to reproduce the
error I am seeing: both "another" and "test" come back as tokens with a
start offset of 0 and an end offset of 12.
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>

     &spellcheck=true&spellcheck.collate=true

Is this an issue with my configuration/test, or is there a bug in
SpellingQueryConverter? Is there a recommended workaround, such as the
WhitespaceTokenizer mentioned in the issue comments?
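For reference, my understanding of the workaround from the issue comments is to swap the tokenizer in the query-time analyzer, roughly like this (untested sketch; whether the rest of my chain should change too is part of my question):

      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>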

Thank you for your help.

package org.apache.solr.spelling;

import static org.junit.Assert.assertTrue;

import java.util.Collection;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;
import org.apache.solr.common.util.NamedList;
import org.junit.Test;

public class SimpleQueryConverterTest {

  @Test
  public void testSimpleQueryConversion() {
    SpellingQueryConverter converter = new SpellingQueryConverter();
    converter.init(new NamedList());
    converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
    String original = "another-test";
    Collection<Token> tokens = converter.convert(original);
    assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
  }

  private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
    for (Token token : tokens) {
      int start = token.startOffset();
      int end = token.endOffset();
      // Each token's offsets should index back into the original string.
      if (!s.substring(start, end).equals(token.toString())) {
        return false;
      }
    }
    return true;
  }
}
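To show what I mean about the offsets, here is a minimal sketch in plain Java (no Lucene dependencies) of the invariant my isOffsetCorrect check relies on, using the offsets described above. The expected offsets for "another" and "test" are my own reading of how the tokens should map back into the query string:

```java
public class OffsetInvariantDemo {
    public static void main(String[] args) {
        String original = "another-test"; // length 12

        // Expected: each token's offsets index back into the original string.
        System.out.println(original.substring(0, 7));   // "another"
        System.out.println(original.substring(8, 12));  // "test"

        // Observed: the token "another" comes back with offsets (0, 12),
        // so substring(0, 12) yields the whole query, not the token text,
        // and the offset check fails.
        System.out.println(original.substring(0, 12).equals("another")); // false
    }
}
```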
