Hi Steve,

I was using branch 3.5. I will try this on the tip of branch_3x too.
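
If it still fails on branch_3x, I may also look at the WhitespaceTokenizer workaround mentioned in the issue comments. Here is a rough, untested sketch of the kind of test I have in mind (the class and test names are just placeholders, and I'm assuming WhitespaceAnalyzer behaves like solr.WhitespaceTokenizerFactory for this input):

    package org.apache.solr.spelling;

    import static org.junit.Assert.assertTrue;

    import java.util.Collection;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.util.Version;
    import org.apache.solr.common.util.NamedList;
    import org.junit.Test;

    public class WhitespaceQueryConverterSketchTest {
      @Test
      public void testWhitespaceAnalyzerWithHyphen() {
        SpellingQueryConverter converter = new SpellingQueryConverter();
        converter.init(new NamedList());
        // Whitespace tokenization should keep "another-test" as a single
        // token, so the reported offsets should line up with the original
        // string instead of both terms getting offsets 0 and 12.
        converter.setAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_35));
        String original = "another-test";
        Collection<Token> tokens = converter.convert(original);
        assertTrue("tokens is null and it shouldn't be", tokens != null);
        for (Token token : tokens) {
          int start = token.startOffset();
          int end = token.endOffset();
          assertTrue("Token offsets do not match",
              original.substring(start, end).equals(token.toString()));
        }
      }
    }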
Thanks.

On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe <sar...@syr.edu> wrote:
> Hi Brandon,
>
> When I add the following to SpellingQueryConverterTest.java on the tip of
> branch_3x (will be released as Solr 3.6), the test succeeds:
>
>   @Test
>   public void testStandardAnalyzerWithHyphen() {
>     SpellingQueryConverter converter = new SpellingQueryConverter();
>     converter.init(new NamedList());
>     converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
>     String original = "another-test";
>     Collection<Token> tokens = converter.convert(original);
>     assertTrue("tokens is null and it shouldn't be", tokens != null);
>     assertEquals("tokens Size: " + tokens.size() + " is not 2", 2, tokens.size());
>     assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
>   }
>
> What version of Solr/Lucene are you using?
>
> Steve
>
> > -----Original Message-----
> > From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
> > Sent: Thursday, December 15, 2011 3:08 PM
> > To: solr-user@lucene.apache.org
> > Subject: Is there an issue with hypens in SpellChecker with
> > StandardTokenizer?
> >
> > I am getting an error using the SpellChecker component with the query
> > "another-test":
> >
> >   java.lang.StringIndexOutOfBoundsException: String index out of range: -7
> >
> > This appears to be related to this issue
> > <https://issues.apache.org/jira/browse/SOLR-1630>, which has been marked
> > as fixed. My configuration and the test case below appear to reproduce
> > the error I am seeing: both "another" and "test" get turned into tokens
> > with start and end offsets of 0 and 12.
> >
> >   <analyzer>
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >
> >   &spellcheck=true&spellcheck.collate=true
> >
> > Is this an issue with my configuration/test, or is there an issue with
> > the SpellingQueryConverter? Is there a recommended workaround, such as
> > the WhitespaceTokenizer mentioned in the issue comments?
> >
> > Thank you for your help.
> >
> >   package org.apache.solr.spelling;
> >
> >   import static org.junit.Assert.assertTrue;
> >
> >   import java.util.Collection;
> >
> >   import org.apache.lucene.analysis.Token;
> >   import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >   import org.apache.lucene.util.Version;
> >   import org.apache.solr.common.util.NamedList;
> >   import org.junit.Test;
> >
> >   public class SimpleQueryConverterTest {
> >     @Test
> >     public void testSimpleQueryConversion() {
> >       SpellingQueryConverter converter = new SpellingQueryConverter();
> >       converter.init(new NamedList());
> >       converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
> >       String original = "another-test";
> >       Collection<Token> tokens = converter.convert(original);
> >       assertTrue("Token offsets do not match",
> >           isOffsetCorrect(original, tokens));
> >     }
> >
> >     private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
> >       for (Token token : tokens) {
> >         int start = token.startOffset();
> >         int end = token.endOffset();
> >         if (!s.substring(start, end).equals(token.toString()))
> >           return false;
> >       }
> >       return true;
> >     }
> >   }