Yes, branch_3x works for me as well. The addition of the OffsetAttribute
probably corrected this issue. To resolve it, I will either switch to
WhitespaceAnalyzer, patch my distribution, or wait for 3.6.
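For anyone following along, the offset mismatch is easy to demonstrate without Lucene at all. This is a minimal, dependency-free sketch of the isOffsetCorrect check from the test quoted below; the Tok record is a hypothetical stand-in for Lucene's Token, and the "bad" tokens mirror the 0..12 offsets described later in the thread:

```java
// Standalone sketch of the offset check (no Lucene/Solr dependency).
// Tok is a hypothetical stand-in for org.apache.lucene.analysis.Token.
public class OffsetCheck {
    record Tok(String text, int start, int end) {}

    // Same logic as isOffsetCorrect in the quoted test: each token's
    // (start, end) offsets must point at its own text in the original.
    static boolean isOffsetCorrect(String s, Tok[] tokens) {
        for (Tok t : tokens) {
            if (t.start() < 0 || t.end() > s.length()
                    || !s.substring(t.start(), t.end()).equals(t.text()))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String original = "another-test";
        // Correct offsets for the two StandardAnalyzer tokens.
        Tok[] good = { new Tok("another", 0, 7), new Tok("test", 8, 12) };
        // The buggy behavior reported below: both tokens span 0..12.
        Tok[] bad = { new Tok("another", 0, 12), new Tok("test", 0, 12) };
        System.out.println(isOffsetCorrect(original, good)); // true
        System.out.println(isOffsetCorrect(original, bad));  // false
    }
}
```

With the buggy offsets, downstream substring arithmetic on the collation can easily go negative, which matches the StringIndexOutOfBoundsException reported below.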

Thanks.

On Thu, Dec 15, 2011 at 4:17 PM, Brandon Fish <brandon.j.f...@gmail.com> wrote:

> Hi Steve,
>
> I was using branch 3.5. I will try this on the tip of branch_3x too.
>
> Thanks.
>
>
> On Thu, Dec 15, 2011 at 4:14 PM, Steven A Rowe <sar...@syr.edu> wrote:
>
>> Hi Brandon,
>>
>> When I add the following to SpellingQueryConverterTest.java on the tip of
>> branch_3x (will be released as Solr 3.6), the test succeeds:
>>
>> @Test
>> public void testStandardAnalyzerWithHyphen() {
>>   SpellingQueryConverter converter = new SpellingQueryConverter();
>>   converter.init(new NamedList());
>>   converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
>>   String original = "another-test";
>>   Collection<Token> tokens = converter.convert(original);
>>   assertTrue("tokens is null and it shouldn't be", tokens != null);
>>   assertEquals("tokens Size: " + tokens.size() + " is not 2", 2, tokens.size());
>>   assertTrue("Token offsets do not match", isOffsetCorrect(original, tokens));
>> }
>>
>> What version of Solr/Lucene are you using?
>>
>> Steve
>>
>> > -----Original Message-----
>> > From: Brandon Fish [mailto:brandon.j.f...@gmail.com]
>> > Sent: Thursday, December 15, 2011 3:08 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Is there an issue with hyphens in SpellChecker with
>> > StandardTokenizer?
>> >
>> > I am getting an error using the SpellChecker component with the query
>> > "another-test":
>> > java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>> >
>> > This appears to be related to this
>> > issue<https://issues.apache.org/jira/browse/SOLR-1630>, which
>> > has been marked as fixed. The configuration and test case that follow
>> > appear to reproduce the error I am seeing: both "another" and "test" get
>> > turned into tokens with start and end offsets of 0 and 12.
>> >       <analyzer>
>> >         <tokenizer class="solr.StandardTokenizerFactory"/>
>> >         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>> >         <filter class="solr.LowerCaseFilterFactory"/>
>> >       </analyzer>
>> >
>> >      &spellcheck=true&spellcheck.collate=true
>> >
>> > Is this an issue with my configuration/test, or is there an issue with
>> > the SpellingQueryConverter? Is there a recommended workaround, such as
>> > the WhitespaceTokenizer mentioned in the issue comments?
>> >
>> > Thank you for your help.
>> >
>> > package org.apache.solr.spelling;
>> >
>> > import static org.junit.Assert.assertTrue;
>> >
>> > import java.util.Collection;
>> > import org.apache.lucene.analysis.Token;
>> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
>> > import org.apache.lucene.util.Version;
>> > import org.apache.solr.common.util.NamedList;
>> > import org.junit.Test;
>> >
>> > public class SimpleQueryConverterTest {
>> >   @Test
>> >   public void testSimpleQueryConversion() {
>> >     SpellingQueryConverter converter = new SpellingQueryConverter();
>> >     converter.init(new NamedList());
>> >     converter.setAnalyzer(new StandardAnalyzer(Version.LUCENE_35));
>> >     String original = "another-test";
>> >     Collection<Token> tokens = converter.convert(original);
>> >     assertTrue("Token offsets do not match",
>> >         isOffsetCorrect(original, tokens));
>> >   }
>> >
>> >   private boolean isOffsetCorrect(String s, Collection<Token> tokens) {
>> >     for (Token token : tokens) {
>> >       int start = token.startOffset();
>> >       int end = token.endOffset();
>> >       if (!s.substring(start, end).equals(token.toString()))
>> >         return false;
>> >     }
>> >     return true;
>> >   }
>> > }
>>
>
>
