This was fixed for the content field in NUTCH-1016. Can you pinpoint the
problematic field?
https://issues.apache.org/jira/browse/NUTCH-1016
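
Since 0xfffe is a Unicode non-character, one possible workaround for fields other than content is to strip such code points from the field value before it is sent to Solr. The following is only a minimal sketch of that idea, not the code from the issue above: the class name NonCharStripper and its methods are hypothetical, and where you would call it in your indexing setup is left open.

```java
/**
 * Hypothetical helper (not part of Nutch or Solr): removes code points
 * that an XML 1.0 parser is likely to reject in an update request.
 */
final class NonCharStripper {

    private NonCharStripper() {}

    /** Returns true if the code point is safe to keep. */
    static boolean isAllowed(int cp) {
        // U+FFFE and U+FFFF, plus their counterparts in every other plane
        if ((cp & 0xFFFE) == 0xFFFE) return false;
        // Non-character block U+FDD0..U+FDEF
        if (cp >= 0xFDD0 && cp <= 0xFDEF) return false;
        // Unpaired surrogates (codePointAt returns them as-is)
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        // Control characters other than tab, line feed, carriage return
        if (cp < 0x20 && cp != 0x9 && cp != 0xA && cp != 0xD) return false;
        return true;
    }

    /** Copies the input, skipping every code point that is not allowed. */
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (isAllowed(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "broken" + (char) 0xFFFE + "text";
        System.out.println(strip(dirty));
    }
}
```

Iterating by code point rather than by char means supplementary-plane non-characters such as U+1FFFE are dropped as well, while ordinary non-ASCII text is left untouched.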

 
 
-----Original message-----
> From:Stefan Scheffler <sscheff...@avantgarde-labs.de>
> Sent: Mon 24-Sep-2012 10:37
> To: user@nutch.apache.org
> Subject: Re: Indexing Exception
> 
> nutch 1.5, solr 3.6
> On 24.09.2012 10:34, Markus Jelsma wrote:
> > Hi - What version?
> >
> >   
> >   
> > -----Original message-----
> >> From:Stefan Scheffler <sscheff...@avantgarde-labs.de>
> >> Sent: Mon 24-Sep-2012 10:29
> >> To: user@nutch.apache.org
> >> Subject: Indexing Exception
> >>
> >> Hello,
> >> I have a strange problem. While indexing a crawl to Solr I got the
> >> following exception:
> >>
> >> java.lang.RuntimeException: [was class java.io.CharConversionException]
> >> Invalid UTF-8 character 0xfffe at char #6886708, byte #11578429)
> >>       at
> >> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >>       at
> >> com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >>       at
> >> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >>       at
> >> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >>       at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:315)
> >>       at 
> >> org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
> >>       at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
> >>       at
> >> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
> >>       at
> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> >>       at
> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
> >>       at
> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> >>       at
> >> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>       at
> >> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>       at
> >> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>       at
> >> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>       at
> >> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>       at
> >> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>       at
> >> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>       at
> >> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >>       at
> >> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>       at org.mortbay.jetty.Server.handle(Server.java:326)
> >>       at
> >> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>       at
> >> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
> >>       at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:843)
> >>       at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
> >>       at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >>       at
> >> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> >>       at
> >> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >> Caused by: java.io.CharConversionException: Invalid UTF-8 character
> >> 0xfffe at char #6886708, byte #11578429)
> >> ...
> >>
> >> It seems to be an encoding exception. Is there a way to avoid this?
> >>
> >> Regards
> >> Stefan
> >>
> >> -- 
> >> Stefan Scheffler
> >> Avantgarde Labs GmbH
> >> Löbauer Straße 19, 01099 Dresden
> >> Telefon: + 49 (0) 351 21590834
> >> Email: sscheff...@avantgarde-labs.de
> >>
> >>
> 
> 
> -- 
> Stefan Scheffler
> Avantgarde Labs GmbH
> Löbauer Straße 19, 01099 Dresden
> Telefon: + 49 (0) 351 21590834
> Email: sscheff...@avantgarde-labs.de
> 
> 
