Highlighting failure caused by InvalidTokenOffsetsException
-----------------------------------------------------------
Key: SOLR-1883
URL: https://issues.apache.org/jira/browse/SOLR-1883
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 1.4
Environment:
{code:title=java}
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
{code}
{code:title=solr lib manifest}
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 14.1-b02-90 (Apple Inc.)
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.4.0
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5
{code}
{code:title=OS}
Linux myhost 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
{code}
Reporter: Luke Forehand

This issue appears to be the same as SOLR-1404 (https://issues.apache.org/jira/browse/SOLR-1404), which was bulk-closed in Solr 1.4, and someone has reported the same bug against Lucene 2.9.1 in LUCENE-2208 (https://issues.apache.org/jira/browse/LUCENE-2208).

We are experiencing this issue as well. Below are the relevant part of our schema.xml and the Solr exception. I have also attached the document that fails when it is highlighted in a query result. The offending token appears to be 'system', which, as the attached file shows, is the very last token in the document's field.

{code:title=schema.xml}
<?xml version="1.0" encoding="UTF-8"?>
<schema name="xxx" version="1.1">
  <types>
    <fieldType name="scrubbedText" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.StopFilterFactory" />
      </analyzer>
    </fieldType>
    ...
  </types>
  <fields>
    <field name="id" type="string" stored="true" indexed="true" />
    <field name="textScrubbed" type="scrubbedText" stored="true" indexed="true" />
    ...
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>textScrubbed</defaultSearchField>
</schema>
{code}

{code:title=solr.log exception}
Apr 13, 2010 3:08:35 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
	at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
	at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
	... 18 more
{code}
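For readers unfamiliar with this error, here is a minimal, dependency-free Java sketch of the kind of bounds check behind the message. It assumes, based on the wording of the exception, that the highlighter re-analyzes the stored field value and rejects any token whose start or end offset lies past the end of that text (here, 17063 characters). `OffsetCheck`, `Token` and `checkOffsets` are hypothetical names, not Lucene or Solr API, and the sample text and offsets are made up. The sketch only shows why an out-of-range offset on the final token 'system' triggers the failure. It does not reproduce whatever in the analysis chain produced that offset.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene/Solr API): every offset produced by
 * re-analyzing the stored field must fall inside the stored text itself.
 */
public class OffsetCheck {

    /** Minimal stand-in for a token: term text plus start/end character offsets. */
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Returns null if all offsets are within the text; otherwise returns a
     * message shaped like the one in solr.log for the first offending token.
     */
    static String checkOffsets(String text, List<Token> tokens) {
        for (Token t : tokens) {
            // An end offset equal to text.length() is still valid (exclusive end).
            if (t.endOffset > text.length() || t.startOffset > text.length()) {
                return "Token " + t.term + " exceeds length of provided text sized " + text.length();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Raw stored value, HTML markup included; its length is 20.
        String stored = "<p>legacy system</p>";
        List<Token> tokens = new ArrayList<Token>();
        tokens.add(new Token("legacy", 3, 9));   // offsets point into the raw text
        tokens.add(new Token("system", 10, 16)); // "system" spans [10, 16)
        System.out.println(checkOffsets(stored, tokens));

        // Simulate a wrongly mapped final token whose offsets run past the end of the text.
        tokens.set(1, new Token("system", 17, 23));
        System.out.println(checkOffsets(stored, tokens));
    }
}
```

Running `main` prints `null` for the well-formed offsets, then `Token system exceeds length of provided text sized 20` for the simulated bad one. This mirrors the reported failure on a much smaller input.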