This was fixed for the content field in NUTCH-1016. Can you pinpoint which field is causing the problem? https://issues.apache.org/jira/browse/NUTCH-1016
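If the offending character turns out to be in a field other than content, one workaround is to strip Unicode non-characters (such as the 0xfffe in the exception) from the field value before it is sent to Solr. The following is only a minimal sketch of that idea, not the actual NUTCH-1016 patch; the class name NonCharStripper and the method name stripNonCharCodepoints are made up for illustration.

```java
// Hypothetical helper for illustration; not the code committed for NUTCH-1016.
final class NonCharStripper {

    private NonCharStripper() {}

    /**
     * Returns a copy of the input with Unicode non-characters removed:
     * U+FDD0..U+FDEF and the last two code points of every plane
     * (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
     */
    static String stripNonCharCodepoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Read a full code point so supplementary characters are handled.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);

            boolean nonChar = (cp >= 0xFDD0 && cp <= 0xFDEF)
                    || (cp & 0xFFFE) == 0xFFFE;
            if (!nonChar) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Example: a field value containing U+FFFE between two letters.
        String dirty = "a\uFFFEb";
        System.out.println(stripNonCharCodepoints(dirty));
    }
}
```

Where exactly to call such a helper depends on which field carries the bad character, so that still needs to be pinned down first.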
-----Original message-----
> From:Stefan Scheffler <sscheff...@avantgarde-labs.de>
> Sent: Mon 24-Sep-2012 10:37
> To: user@nutch.apache.org
> Subject: Re: Indexing Exception
>
> nutch 1.5, solr 3.6
> On 24.09.2012 10:34, Markus Jelsma wrote:
> > Hi - What version?
> >
> > -----Original message-----
> >> From:Stefan Scheffler <sscheff...@avantgarde-labs.de>
> >> Sent: Mon 24-Sep-2012 10:29
> >> To: user@nutch.apache.org
> >> Subject: Indexing Exception
> >>
> >> Hello,
> >> I have a strange Problem. While indexing a crawl to solr i got the
> >> following exception
> >>
> >> java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 character 0xfffe at char #6886708, byte #11578429)
> >> at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >> at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:315)
> >> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
> >> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
> >> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
> >> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> >> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
> >> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> >> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >> at org.mortbay.jetty.Server.handle(Server.java:326)
> >> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >> at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
> >> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:843)
> >> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
> >> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >> at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> >> at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >> Caused by: java.io.CharConversionException: Invalid UTF-8 character 0xfffe at char #6886708, byte #11578429)
> >> ...
> >>
> >> It seems to be an encoding exception. Is there a way to avoid this?
> >>
> >> Regards
> >> Stefan
> >>
> >> --
> >> Stefan Scheffler
> >> Avantgarde Labs GmbH
> >> Löbauer Straße 19, 01099 Dresden
> >> Telefon: + 49 (0) 351 21590834
> >> Email: sscheff...@avantgarde-labs.de
> >>
> >
> --
> Stefan Scheffler
> Avantgarde Labs GmbH
> Löbauer Straße 19, 01099 Dresden
> Telefon: + 49 (0) 351 21590834
> Email: sscheff...@avantgarde-labs.de
>