[ https://issues.apache.org/jira/browse/SPARK-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng reopened SPARK-3828:
----------------------------------

> Spark returns inconsistent results when building with different Hadoop version
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-3828
>                 URL: https://issues.apache.org/jira/browse/SPARK-3828
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>         Environment: OSX 10.9, Spark master branch
>            Reporter: Liquan Pei
>
> The test data is text8, available at http://mattmahoney.net/dc/text8.zip. To reproduce, unzip it first.
> Spark built with different Hadoop versions returns different results.
> {code}
> val data = sc.textFile("text8")
> data.count()
> {code}
> returns 1 when built with SPARK_HADOOP_VERSION=1.0.4 and returns 2 when built with SPARK_HADOOP_VERSION=2.4.0.
> Looking through the RDD code, it seems that textFile uses hadoopFile, which creates a HadoopRDD. We should probably create a NewHadoopRDD instead when building Spark with SPARK_HADOOP_VERSION >= 2.0.0.
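
One possible way to narrow this down is to count the same file through both Hadoop input APIs in a single build. The sketch below is not from the issue. It assumes the unzipped text8 file is in the working directory, a local master, and a hypothetical object name (CompareTextInputApis). It only prints the counts and makes no claim about what they should be or about the root cause.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{TextInputFormat => NewTextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical diagnostic, not part of Spark or of the issue report.
object CompareTextInputApis {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally; "text8" is the unzipped file in the working directory.
    val conf = new SparkConf().setAppName("SPARK-3828-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val path = "text8"

      // textFile goes through hadoopFile, i.e. the old mapred API (HadoopRDD).
      val oldApiCount = sc.textFile(path).count()

      // Same call, but asking for a minimum of one partition, to see
      // whether the way the input is split affects the count.
      val oldApiOnePartition = sc.textFile(path, 1).count()

      // newAPIHadoopFile goes through the new mapreduce API (NewHadoopRDD).
      val newApiCount =
        sc.newAPIHadoopFile[LongWritable, Text, NewTextInputFormat](path).count()

      println(s"old API (default partitions): $oldApiCount")
      println(s"old API (minPartitions = 1):  $oldApiOnePartition")
      println(s"new API:                      $newApiCount")
    } finally {
      sc.stop()
    }
  }
}
```

Running this against builds made with SPARK_HADOOP_VERSION=1.0.4 and SPARK_HADOOP_VERSION=2.4.0 would show whether the discrepancy follows the input API, the partitioning, or the Hadoop version itself.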