[ https://issues.apache.org/jira/browse/LUCENE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand resolved LUCENE-4730.
----------------------------------
    Resolution: Fixed

Thanks Michael for digging it.

> SmartChineseAnalyzer got wrong matched offset
> ---------------------------------------------
>
>                 Key: LUCENE-4730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4730
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 4.0, 4.1
>         Environment: JDK1.7 Linux/Windows
>            Reporter: Jinsong Hu
>            Priority: Critical
>         Attachments: LUCENE-4730.patch
>
>
> We found that SmartChineseAnalyzer produces wrong match offsets with the
> following test code:
>   public void testHighlight() throws Exception {
>     String text = "My China  ";
>     String queryText = "China";
>     StringBuilder builder = new StringBuilder("<html>");
>     Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
>     //Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
>     QueryParser parser = new QueryParser(Version.LUCENE_40, "text", analyzer);
>     Query query = parser.parse(queryText);
>     SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<span style=\"background: yellow\">", "</span>");
>     TokenStream tokens = analyzer.tokenStream("text", new StringReader(text));
>     QueryScorer scorer = new QueryScorer(query, "text");
>     Highlighter highlighter = new Highlighter(formatter, scorer);
>     highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
>     String result = highlighter.getBestFragments(tokens, text, 10, "...");
>     if (result.length() < text.length()) {
>       result = text;
>     }
>     builder.append("<body>");
>     builder.append(result);
>     builder.append("</body>");
>     builder.append("</html>");
>     System.out.println(builder.toString());
>   }
> This method generates highlighted text, but the highlight position is
> obviously wrong. If we remove one space from the text, that is, change
> text from "My China  " (ends with two spaces) to "My China " (ends with one
> space), it generates text with the correct highlight.
> If we change the analyzer from SmartChineseAnalyzer to StandardAnalyzer,
> the highlight issue disappears.
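
The report turns on character offsets: a highlighter wraps whatever span of the original text a token's start and end offsets point at, so offsets that are off by even one character move the markup. The following is a minimal, Lucene-free sketch of that effect. The class `OffsetHighlightSketch`, the helper `wrap`, and the shifted offsets are hypothetical and chosen only for illustration. They are not the values SmartChineseAnalyzer actually produced, and this is not how Lucene's Highlighter is implemented.

```java
public class OffsetHighlightSketch {

    // Wrap text[start, end) in <b>...</b>, the way a highlighter uses
    // a token's start/end offsets into the original text.
    static String wrap(String text, int start, int end) {
        return text.substring(0, start)
                + "<b>" + text.substring(start, end) + "</b>"
                + text.substring(end);
    }

    // A simple sanity check: do the offsets actually cover the term?
    static boolean offsetsCoverTerm(String text, String term, int start, int end) {
        return start >= 0 && end <= text.length() && start <= end
                && text.substring(start, end).equals(term);
    }

    public static void main(String[] args) {
        String text = "My China  ";   // ends with two spaces, as in the report
        String term = "China";

        // Offsets that match the text.
        int start = text.indexOf(term);
        int end = start + term.length();
        System.out.println(wrap(text, start, end));
        System.out.println(offsetsCoverTerm(text, term, start, end));

        // Hypothetical offsets shifted by one character (illustration only).
        System.out.println(wrap(text, start + 1, end + 1));
        System.out.println(offsetsCoverTerm(text, term, start + 1, end + 1));
    }
}
```

A check like `offsetsCoverTerm` applied to every token an analyzer emits is one way to catch this class of bug before it reaches the highlighter.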