[ https://issues.apache.org/jira/browse/NUTCH-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel reassigned NUTCH-2696:
--------------------------------------

    Assignee: Sebastian Nagel

> Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-2696
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2696
>             Project: Nutch
>          Issue Type: Bug
>          Components: segment
>    Affects Versions: 1.15
>         Environment: Hadoop version : 3.0.0 (CDH 6.1)
>                      Nutch : 1.15
>                      Mode : distributed mode
>            Reporter: Laurent Hervaud
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.16
>
>
> All Nutch tasks work properly with Hadoop 3.x, except SegmentReader.
> SegmentReader with the -get option works fine.
> SegmentReader with the -dump option replaces each non-ASCII character with ?
> Example URL: [http://www.wikipedia.fr/index.php]
>
> {code:java}
> command : ./runtime/deploy/bin/nutch readseg -dump /user/nutch/crawl1.15/segments/20190221093756 /tmp/dump1.15 -nocontent -nogenerate -noparse -noparsedata
> ParseText::
> Wikipedia.fr - Portail de recherche sur les projets Wikim?dia
> Chercher sur Wikip?dia en fran?ais
> L?encyclop?die librement r?utilisable que chacun peut am?liorer.
> {code}
>
>
> {code:java}
> command : ./runtime/deploy/bin/nutch readseg -get /user/nutch/crawl1.15/segments/20190221093756 http://www.wikipedia.fr/index.php -nocontent -nogenerate -noparse -noparsedata
> ParseText::
> Wikipedia.fr - Portail de recherche sur les projets Wikimédia
> Chercher sur Wikipédia en français
> L’encyclopédie librement réutilisable que chacun peut améliorer.
> {code}
>
> I tried building with the Hadoop 3.0.0 dependencies in ivy.xml, but I get the
> same result.
> It works fine in local mode.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
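
The ? substitution in the -dump output looks like what a Java writer produces when its charset cannot represent a character. The sketch below only illustrates that general mechanism. It does not use Nutch code, the class DumpEncodingSketch and its dump method are hypothetical, and the assumption that the task JVMs in distributed mode write with a non-UTF-8 charset is not confirmed by the report.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone demo; not part of Nutch or Hadoop.
public class DumpEncodingSketch {

    // Writes the text through a PrintWriter bound to the given charset
    // and returns the bytes that would end up in the dump file.
    static byte[] dump(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset))) {
            writer.print(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "Wikimédia", written with a Unicode escape so the source file encoding does not matter.
        String text = "Wikim\u00e9dia";

        // Assumption: stands in for a JVM whose default charset cannot encode 'é'.
        byte[] ascii = dump(text, StandardCharsets.US_ASCII);
        // Explicit UTF-8 keeps the character intact.
        byte[] utf8 = dump(text, StandardCharsets.UTF_8);

        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // Wikim?dia
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

If this is the cause, passing an explicit charset wherever the dump writer is created would make the output independent of the JVM default, which might also explain why local mode and -get behave differently.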