[ https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1655.
-------------------------------------
    Fix Version/s: ManifoldCF 2.18
       Resolution: Fixed

r1882582

> Web connector - UnsupportedEncodingException utf-8
> --------------------------------------------------
>
>                 Key: CONNECTORS-1655
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.17
>            Reporter: Julien Massiera
>            Assignee: Karl Wright
>            Priority: Critical
>             Fix For: ManifoldCF 2.18
>
>
> When crawling some sites (for instance this one:
> [http://www.antibes-juanlespins.com/]), the job manages to index some
> documents, but then stops with the following error:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is one of the MCF stack traces:
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
>         at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
>         at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
>         at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
>         at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
>         ... 3 more

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
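The trace shows that the string handed to `InputStreamReader` as a charset name was `utf-8; filename=rseventspro_rss20_56.xml`, that is, the charset value plus a trailing header parameter. `InputStreamReader` rejects that name with `UnsupportedEncodingException`. The Java sketch below reproduces that failure and shows one way to avoid it. It assumes the value came from a `Content-Type`-style header such as the hypothetical `application/rss+xml; charset=utf-8; filename=rseventspro_rss20_56.xml`. The helpers `extractCharsetParam` and `resolveCharset` are illustrative names, not ManifoldCF APIs, and this is not the change committed in r1882582.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetParamSketch {

  // Hypothetical helper (not ManifoldCF code): return the value of the
  // "charset" parameter from a Content-Type style header value, stopping at
  // the next ';' so trailing parameters such as "filename=..." are dropped.
  // Simplification: a ';' inside a quoted value is not handled.
  public static String extractCharsetParam(String headerValue) {
    if (headerValue == null) {
      return null;
    }
    for (String part : headerValue.split(";")) {
      String p = part.trim();
      int eq = p.indexOf('=');
      if (eq < 0 || !p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
        continue;
      }
      String value = p.substring(eq + 1).trim();
      // Strip optional surrounding double quotes, e.g. charset="utf-8".
      if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
        value = value.substring(1, value.length() - 1).trim();
      }
      return value.isEmpty() ? null : value;
    }
    return null;
  }

  // Map a charset name to a Charset, using the fallback when the name is
  // missing, syntactically illegal, or unsupported by this JVM.
  public static Charset resolveCharset(String name, Charset fallback) {
    if (name == null) {
      return fallback;
    }
    try {
      return Charset.forName(name);
    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
      return fallback;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] body = "<rss/>".getBytes(StandardCharsets.UTF_8);

    // Reproduces the reported failure: the raw parameter tail is used as
    // the charset name, so InputStreamReader rejects it.
    String rawName = "utf-8; filename=rseventspro_rss20_56.xml";
    try {
      new InputStreamReader(new ByteArrayInputStream(body), rawName).close();
    } catch (UnsupportedEncodingException e) {
      System.out.println("UnsupportedEncodingException: " + e.getMessage());
    }

    // Isolating the parameter first yields a usable charset.
    String header = "application/rss+xml; charset=utf-8; filename=rseventspro_rss20_56.xml";
    Charset cs = resolveCharset(extractCharsetParam(header), StandardCharsets.UTF_8);
    try (Reader r = new InputStreamReader(new ByteArrayInputStream(body), cs)) {
      System.out.println("decoded with " + cs.name() + ", first char: " + (char) r.read());
    }
  }
}
```

Running `main` prints the same `UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml` message that appears in the `Caused by:` line, then decodes the body successfully once only `utf-8` is passed on. Passing a `Charset` object rather than a name string also means an unusable name is caught in `resolveCharset` instead of surfacing as an `IOException` in the middle of parsing.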