[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Oestreicher updated SOLR-606:
------------------------------------

    Attachment: handler.component.SpellCheckComponent-collate-patch.txt

I recently ran into this exact issue and found the problem. The collation is created by replacing the misspelled tokens in the original query with their suggestions, using a StringBuilder:

{noformat}
for (Iterator<Map.Entry<Token, String>> bestIter = best.entrySet().iterator(); bestIter.hasNext();) {
  Map.Entry<Token, String> entry = bestIter.next();
  Token tok = entry.getKey();
  collation.replace(tok.startOffset(), tok.endOffset(), entry.getValue());
}
{noformat}

As you can see, it simply replaces the relevant tokens in the original query. However, if a suggestion is not the same length as the token it replaces, every offset used after that replacement is shifted and no longer valid, which yields incorrect, seemingly random results.

I fixed that by keeping track of that length difference and adding it to the token offsets. For this to work I had to change the HashMap to a LinkedHashMap, since the solution depends on the Tokens being iterated in the order in which they occur in the string.

> spellcheck.colate doesn't handle multiple tokens properly
> ---------------------------------------------------------
>
>                 Key: SOLR-606
>                 URL: https://issues.apache.org/jira/browse/SOLR-606
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.3
>         Environment: tomcat
>            Reporter: Geoffrey Young
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: handler.component.SpellCheckComponent-collate-patch.txt, SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
>
> https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
> the new spellcheck.collate feature seems to exhibit some strange behaviors when handed a query with multiple tokens.
> {noformat}
> {
>   "responseHeader":{
>     "params":{
>       "q":"redbull air show"}},
>   "spellcheck":{
>     "suggestions":[
>       "redbull",[
>         "suggestion",["redbelly"]],
>       "show",[
>         "suggestion",["shot"]],
>       "collation","redbelly airshotw"]}}
> {noformat}
> in this case, note the fields are incorrectly concatenated (no space between tokens, left over 'w' from input string)
> {noformat}
> {
>   "responseHeader":{
>     "params":{
>       "q":"redbull air show",
>       "spellcheck.q":"redbull air show"}},
>   "spellcheck":{
>     "suggestions":[
>       "redbull air show",[
>         "suggestion",["redbull singers"]],
>       "collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different - the suggestions are still concatenated without a space, but the collation is way off.
> --Geoff

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
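A minimal Java sketch of the offset-tracking fix described in Stefan's comment, not the attached patch itself. It assumes a hypothetical `Tok` record standing in for Lucene's `Token` (only the start/end offsets matter here) and hypothetical method names `collateNaive` and `collateWithOffsetDelta`. Using the first example query from the report, with offsets assumed to be `redbull` [0,7) and `show` [12,16), the naive loop reproduces the reported `redbelly airshotw`, while accumulating the length difference gives `redbelly air shot`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CollateSketch {

    // Hypothetical stand-in for Lucene's Token: only the offsets are used.
    record Tok(String text, int startOffset, int endOffset) {}

    // Mirrors the loop quoted in the comment: every replacement uses the
    // token's original offsets, even after earlier replacements changed
    // the length of the string.
    static String collateNaive(String query, Map<Tok, String> best) {
        StringBuilder collation = new StringBuilder(query);
        for (Map.Entry<Tok, String> entry : best.entrySet()) {
            Tok tok = entry.getKey();
            collation.replace(tok.startOffset(), tok.endOffset(), entry.getValue());
        }
        return collation.toString();
    }

    // Offset-tracking fix: keep a running total of how much each replacement
    // grew or shrank the string and shift later offsets by it. Assumes the
    // map iterates tokens in query order (e.g. a LinkedHashMap filled in order).
    static String collateWithOffsetDelta(String query, Map<Tok, String> best) {
        StringBuilder collation = new StringBuilder(query);
        int delta = 0;
        for (Map.Entry<Tok, String> entry : best.entrySet()) {
            Tok tok = entry.getKey();
            String suggestion = entry.getValue();
            collation.replace(tok.startOffset() + delta, tok.endOffset() + delta, suggestion);
            delta += suggestion.length() - (tok.endOffset() - tok.startOffset());
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        String query = "redbull air show";
        // LinkedHashMap keeps insertion (here: query) order.
        Map<Tok, String> best = new LinkedHashMap<>();
        best.put(new Tok("redbull", 0, 7), "redbelly");
        best.put(new Tok("show", 12, 16), "shot");

        System.out.println(collateNaive(query, best));           // redbelly airshotw
        System.out.println(collateWithOffsetDelta(query, best)); // redbelly air shot
    }
}
```

The running delta is only correct when replacements are applied left to right, which is why the iteration order matters. An alternative that avoids that dependency would be to apply the replacements in descending start-offset order, so no replacement ever shifts an offset that is still to be used.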