Highlighting failure caused by InvalidTokenOffsetsException
-----------------------------------------------------------

                 Key: SOLR-1883
                 URL: https://issues.apache.org/jira/browse/SOLR-1883
             Project: Solr
          Issue Type: Bug
          Components: highlighter
    Affects Versions: 1.4
         Environment: {code:title=java}
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
{code}
{code:title=solr lib manifest}
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 14.1-b02-90 (Apple Inc.)
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.4.0
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:
 33:40
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5
{code}
{code:title=OS}
Linux myhost 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
{code}
            Reporter: Luke Forehand



This appears to be the same problem as SOLR-1404
(https://issues.apache.org/jira/browse/SOLR-1404), which was bulk closed in
Solr 1.4. Someone has also reported it against Lucene 2.9.1 as LUCENE-2208
(https://issues.apache.org/jira/browse/LUCENE-2208). We are experiencing this
issue as well.

I have pasted the relevant part of our schema.xml and the Solr exception
below, and attached the document that fails when it is queried with
highlighting enabled. The invalid token seems to be 'system', which is the
very last token in the document field (see the attached file).
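
For anyone trying to reproduce this, a request of roughly the following shape
should exercise the highlighter on the affected field. This is only a sketch:
the base URL, the port, and the query term are placeholders I am assuming here
and are not taken from our setup; `hl` and `hl.fl` are the standard Solr
highlighting parameters, and `textScrubbed` is the field from the schema in
this report.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a Solr select URL with highlighting enabled.
final class HighlightQuerySketch {

    // baseUrl and queryText are placeholders, not values from this report.
    static String buildUrl(String baseUrl, String queryText, String field) {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(field + ":" + queryText, StandardCharsets.UTF_8)
                + "&hl=true"                                                   // turn highlighting on
                + "&hl.fl=" + URLEncoder.encode(field, StandardCharsets.UTF_8); // field to highlight
    }

    public static void main(String[] args) {
        // Assumed host, port, and query term, for illustration only.
        System.out.println(buildUrl("http://localhost:8983/solr", "system", "textScrubbed"));
    }
}
```

The encoder overload that takes a Charset needs Java 10 or later; on an older
JDK the two-argument String form with "UTF-8" does the same job.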

{code:title=schema.xml}
<?xml version="1.0" encoding="UTF-8"?>

<schema name="xxx" version="1.1">

        <types>

                <fieldType name="scrubbedText" class="solr.TextField" positionIncrementGap="100">
                        <analyzer>
                                <tokenizer class="solr.StandardTokenizerFactory" />
                                <charFilter class="solr.HTMLStripCharFilterFactory" />
                                <filter class="solr.StandardFilterFactory" />
                                <filter class="solr.LowerCaseFilterFactory" />
                                <filter class="solr.StopFilterFactory" />
                        </analyzer>
                </fieldType>
                ...
        </types>

        <fields>
                <field name="id" type="string" stored="true" indexed="true" />
                <field name="textScrubbed" type="scrubbedText" stored="true" indexed="true" />
                ...
        </fields>

        <uniqueKey>id</uniqueKey>
        <defaultSearchField>textScrubbed</defaultSearchField>

</schema>
{code}

{code:title=solr.log exception}
Apr 13, 2010 3:08:35 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
        at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
        at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
        ... 18 more
{code}
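
Going only by the wording of the exception message in the log above, the
highlighter seems to reject a token whose offsets point past the end of the
text it was given. The sketch below models that kind of bounds check. It is
not the Lucene source: the class, the method names, and the example offsets
are all hypothetical, and only the term 'system' and the text size 17063 come
from the log.

```java
// Illustrative sketch only; this is not the Lucene Highlighter source.
final class OffsetCheckSketch {

    /** Minimal stand-in for an analyzed token with character offsets. */
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** True when either offset points past the end of the text. */
    static boolean exceedsText(Token token, int textLength) {
        return token.startOffset > textLength || token.endOffset > textLength;
    }

    /** Builds a message in the same shape as the one in the log. */
    static String message(Token token, int textLength) {
        return "Token " + token.term
                + " exceeds length of provided text sized " + textLength;
    }

    public static void main(String[] args) {
        int textLength = 17063;                        // text size from the log
        Token bad = new Token("system", 17060, 17066); // hypothetical offsets
        if (exceedsText(bad, textLength)) {
            System.out.println(message(bad, textLength));
        }
    }
}
```

A token ending exactly at the text length (for example offsets 17057 to 17063)
passes this check, so under this reading the reported failure would mean the
last token's offsets and the stored text length disagree by at least one
character.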


        
