Hi Steve,

I was using branch 3.5. I will try this on the tip of branch_3x too.
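
If it still fails on branch_3x, I may also look at the WhitespaceTokenizer workaround mentioned in the issue comments. Here is a rough, untested sketch of the kind of test I have in mind (the class and test names are just placeholders, and I'm assuming WhitespaceAnalyzer behaves like solr.WhitespaceTokenizerFactory for this input):

    package org.apache.solr.spelling;

    import static org.junit.Assert.assertTrue;

    import java.util.Collection;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.util.Version;
    import org.apache.solr.common.util.NamedList;
    import org.junit.Test;

    public class WhitespaceQueryConverterSketchTest {
      @Test
      public void testWhitespaceAnalyzerWithHyphen() {
        SpellingQueryConverter converter = new SpellingQueryConverter();
        converter.init(new NamedList());
        // Whitespace tokenization should keep "another-test" as a single
        // token, so the reported offsets should line up with the original
        // string instead of both terms getting offsets 0 and 12.
        converter.setAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_35));
        String original = "another-test";
        Collection<Token> tokens = converter.convert(original);
        assertTrue("tokens is null and it shouldn't be", tokens != null);
        for (Token token : tokens) {
          int start = token.startOffset();
          int end = token.endOffset();
          assertTrue("Token offsets do not match",
              original.substring(start, end).equals(token.toString()));
        }
      }
    }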
Thanks.

On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe <sar...@syr.edu> wrote:
> Hi Brandon,
>
> When I add the following to SpellingQueryConverterTest.java on the tip of
> branch_3x (will be released as Solr 3.6), the test succeeds:
>
>   @Test
>   public void testStandardAnalyzerWithHyphen() {
>     SpellingQueryConverter converter = new SpellingQueryConverter();
>     converter.init(new NamedList());
>     converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
>     String original = "another-test";
>     Collection<Token> tokens = converter.convert(original);
>     assertTrue("tokens is null and it shouldn't be", tokens != null);
>     assertEquals("tokens Size: " + tokens.size() + " is not 2", 2, tokens.size());
>     assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
>   }
>
> What version of Solr/Lucene are you using?
>
> Steve
>
> > -----Original Message-----
> > From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
> > Sent: Thursday, December 15, 2011 3:08 PM
> > To: solr-user@lucene.apache.org
> > Subject: Is there an issue with hypens in SpellChecker with
> > StandardTokenizer?
> >
> > I am getting an error using the SpellChecker component with the query
> > "another-test":
> >
> >   java.lang.StringIndexOutOfBoundsException: String index out of range: -7
> >
> > This appears to be related to this issue
> > <https://issues.apache.org/jira/browse/SOLR-1630>, which has been marked
> > as fixed. My configuration and the test case below appear to reproduce
> > the error I am seeing: both "another" and "test" get turned into tokens
> > with start and end offsets of 0 and 12.
> >
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >
> >   &spellcheck=true&spellcheck.collate=true
> >
> > Is this an issue with my configuration/test, or is there an issue with
> > the SpellingQueryConverter? Is there a recommended workaround, such as
> > the WhitespaceTokenizer mentioned in the issue comments?
> >
> > Thank you for your help.
> >
> >   package org.apache.solr.spelling;
> >
> >   import static org.junit.Assert.assertTrue;
> >
> >   import java.util.Collection;
> >
> >   import org.apache.lucene.analysis.Token;
> >   import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >   import org.apache.lucene.util.Version;
> >   import org.apache.solr.common.util.NamedList;
> >   import org.junit.Test;
> >
> >   public class SimpleQueryConverterTest {
> >     @Test
> >     public void testSimpleQueryConversion() {
> >       SpellingQueryConverter converter = new SpellingQueryConverter();
> >       converter.init(new NamedList());
> >       converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
> >       String original = "another-test";
> >       Collection<Token> tokens = converter.convert(original);
> >       assertTrue("Token offsets do not match",
> >           isOffsetCorrect(original, tokens));
> >     }
> >
> >     private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
> >       for (Token token : tokens) {
> >         int start = token.startOffset();
> >         int end = token.endOffset();
> >         if (!s.substring(start, end).equals(token.toString()))
> >           return false;
> >       }
> >       return true;
> >     }
> >   }