[ 
https://issues.apache.org/jira/browse/SOLR-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856623#action_12856623
 ] 

Luke Forehand commented on SOLR-1883:
-------------------------------------

When I remove HTMLStripCharFilterFactory as an analyzer charFilter during 
indexing, the exception goes away, which makes me suspect the issue is inside 
HTMLStripCharFilterFactory.
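
For anyone following along, here is a minimal standalone sketch of the kind of bounds check that the exception message implies: a token whose offsets point past the end of the text handed to the highlighter is rejected. This is an illustration only, not the Lucene source. The class name, method names, and sample string are hypothetical, and the sketch does not claim to show how HTMLStripCharFilterFactory computes its offsets.

```java
// Sketch only: mirrors the kind of check implied by the message
// "Token system exceeds length of provided text sized 17063".
// All names here are hypothetical; this is not Lucene code.
final class TokenOffsetCheck {

    /** Returns true when both offsets fall within the provided text. */
    static boolean offsetsFit(String text, int startOffset, int endOffset) {
        return startOffset <= text.length() && endOffset <= text.length();
    }

    /** Returns "ok" or a message shaped like the one in the log. */
    static String describe(String term, String text, int startOffset, int endOffset) {
        if (offsetsFit(text, startOffset, endOffset)) {
            return "ok";
        }
        return "Token " + term + " exceeds length of provided text sized " + text.length();
    }

    public static void main(String[] args) {
        // Hypothetical stored field value; "system" is its last token.
        String stored = "operating system";

        // Offsets that match the stored text: 10..16.
        System.out.println(describe("system", stored, 10, 16));

        // Offsets shifted past the end of the text, as could happen if an
        // offset correction were wrong (an assumption, not a diagnosis).
        System.out.println(describe("system", stored, 14, 20));
    }
}
```

If the offsets recorded for the last token are larger than the length of the stored text, a check of this shape fails for that token, which would be consistent with 'system' being the token named in the exception.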

> Highlighting failure caused by InvalidTokenOffsetsException
> -----------------------------------------------------------
>
>                 Key: SOLR-1883
>                 URL: https://issues.apache.org/jira/browse/SOLR-1883
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 1.4
>         Environment: {code:title=java}
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
> {code}
> {code:title=solr lib manifest}
> Manifest-Version: 1.0
> Ant-Version: Apache Ant 1.7.0
> Created-By: 14.1-b02-90 (Apple Inc.)
> Extension-Name: org.apache.solr
> Specification-Title: Apache Solr Search Server
> Specification-Version: 1.4.0
> Specification-Vendor: The Apache Software Foundation
> Implementation-Title: org.apache.solr
> Implementation-Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:
>  33:40
> Implementation-Vendor: The Apache Software Foundation
> X-Compile-Source-JDK: 1.5
> X-Compile-Target-JDK: 1.5
> {code}
> {code:title=OS}
> Linux myhost 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> {code}
>            Reporter: Luke Forehand
>         Attachments: schema.xml, test_doc_for_invalid_token_offsets_exception.xml
>
>
> This issue seems to be the same as a previous issue that was bulk closed in 
> Solr 1.4 (https://issues.apache.org/jira/browse/SOLR-1404), and someone has 
> reported the same bug against Lucene 2.9.1 
> (https://issues.apache.org/jira/browse/LUCENE-2208). We are experiencing this 
> issue as well.
> I have pasted the relevant part of our schema.xml and the Solr exception, and 
> I have attached the document that fails when queried with a highlight query. 
> The invalid token seems to be 'system', which is the very last token in the 
> document field, as you can see in the attached file.
> {code:title=schema.xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <schema name="xxx" version="1.1">
>       <types>
>               <fieldType name="scrubbedText" class="solr.TextField" positionIncrementGap="100">
>                       <analyzer>
>                               <tokenizer class="solr.StandardTokenizerFactory" />
>                               <charFilter class="solr.HTMLStripCharFilterFactory" />
>                               <filter class="solr.StandardFilterFactory" />
>                               <filter class="solr.LowerCaseFilterFactory" />
>                               <filter class="solr.StopFilterFactory" />
>                       </analyzer>
>               </fieldType>
>               ...
>       </types>
>       <fields>
>               <field name="id" type="string" stored="true" indexed="true" />
>               <field name="textScrubbed" type="scrubbedText" stored="true" indexed="true" />
>               ...
>       </fields>
>       <uniqueKey>id</uniqueKey>
>       <defaultSearchField>textScrubbed</defaultSearchField>
> </schema>
> {code}
> {code:title=solr.log exception}
> Apr 13, 2010 3:08:35 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
>         at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>         at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
>         at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
>         at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token system exceeds length of provided text sized 17063
>         at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
>         at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
>         ... 18 more
> {code}

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
