Hi,
  I installed Hadoop 2.6.0 today on one of the machines (172.26.49.156),
got HDFS running on it (both the NameNode and the DataNode on the same
machine) and copied the files to HDFS. However, from the same machine, when
I try to load the same CSV with the following statement:

  sqlContext.read.format("com.databricks.spark.csv").option("header",
"false").load("hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")

 I get the error

java.net.ConnectException: Call From impetus-i0276.impetus.co.in/127.0.0.1
to impetus-i0276:54310 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

  I have changed the port number to 8020 but the same error gets reported.
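  To narrow it down, I wrote a small sketch (plain Python; the XML below is
a hypothetical core-site.xml with the value I think I configured, not the
real file) to pull out the fs.defaultFS / fs.default.name URI, since as far
as I understand that value, not any fixed default, decides which host:port
the NameNode RPC server binds to:

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Hypothetical core-site.xml contents -- substitute the real file at
# $HADOOP_HOME/etc/hadoop/core-site.xml (Hadoop 2.x layout).
sample = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.26.49.156:54310</value>
  </property>
</configuration>"""

root = ET.parse(StringIO(sample)).getroot()
# fs.defaultFS is the Hadoop 2.x name; fs.default.name was the 1.x equivalent.
uri = next(p.findtext("value") for p in root.iter("property")
           if p.findtext("name") in ("fs.defaultFS", "fs.default.name"))
print(uri)  # hdfs://172.26.49.156:54310
```

  If the URI printed here differs from the URL passed to load(), I would
expect exactly this kind of connection failure.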

  Even the following command is not working from the command line, when
launched from the HADOOP_HOME folder:

  bin/hdfs dfs -ls hdfs://172.26.49.156:54310/

  which was working earlier when issued from the other machine
(172.26.49.55), from under HADOOP_HOME for Hadoop 1.0.4.
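  Since "Connection refused" usually just means nothing is listening on
that host:port, I put together a quick reachability probe (plain Python;
the demo runs against a local socket I control, so the addresses in it are
illustrative) to separate a down NameNode from a wrong port:

```python
import socket

def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds (something is listening)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a socket we control; for the real check one would probe
# e.g. port_open("172.26.49.156", 54310).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))     # OS picks a free port
srv.listen(1)
host, port = srv.getsockname()
up = port_open(host, port)     # listener present -> True
srv.close()
down = port_open(host, port)   # connection refused -> False
print(up, down)
```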

  I set the following in ~/.bashrc when I installed Hadoop 2.6.0:

  export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export SPARK_HOME=/home/impadmin/spark-1.6.0-bin-hadoop2.6
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

  Am I getting the port number wrong, or is it some other config param that
I should check? What's the general rule here?
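  My current understanding of the general rule, which I would like
confirmed: the hdfs:// URL handed to Spark must match fs.defaultFS exactly
in host and port. A sanity check I use (plain Python; the URIs below are
the ones from this mail):

```python
from urllib.parse import urlparse

configured = urlparse("hdfs://172.26.49.156:54310")  # value from core-site.xml
requested = urlparse("hdfs://172.26.49.156:54310/bibudh/healthcare/data/"
                     "cloudera_challenge/patients.csv")  # URL given to load()

match = (configured.hostname == requested.hostname
         and configured.port == requested.port)
print(match)  # True
```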

Thanks
          Bibudh

On Tue, Apr 26, 2016 at 7:51 PM, Davies Liu <dav...@databricks.com> wrote:

> The Spark package you are using is packaged with Hadoop 2.6, but the
> HDFS is Hadoop 1.0.4, they are not compatible.
>
> On Tue, Apr 26, 2016 at 11:18 AM, Bibudh Lahiri <bibudhlah...@gmail.com>
> wrote:
> > Hi,
> >   I am trying to load a CSV file which is on HDFS. I have two machines:
> > IMPETUS-1466 (172.26.49.156) and IMPETUS-1325 (172.26.49.55). Both have
> > Spark 1.6.0 pre-built for Hadoop 2.6 and later, but for both, I had
> existing
> > Hadoop clusters running Hadoop 1.0.4. I have launched HDFS from
> > 172.26.49.156 by running start-dfs.sh from it, copied files from local
> file
> > system to HDFS and can view them by hadoop fs -ls.
> >
> >   However, when I am trying to load the CSV file from pyspark shell
> > (launched by bin/pyspark --packages com.databricks:spark-csv_2.10:1.3.0)
> > from IMPETUS-1325 (172.26.49.55) with the following commands:
> >
> >
> >>>from pyspark.sql import SQLContext
> >
> >>>sqlContext = SQLContext(sc)
> >
> >>>patients_df =
> >>> sqlContext.read.format("com.databricks.spark.csv").option("header",
> >>> "false").load("hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")
> >
> >
> > I get the following error:
> >
> >
> > java.io.EOFException: End of File Exception between local host is:
> > "IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55"; destination host is:
> > "IMPETUS-1466":54310; : java.io.EOFException; For more details see:
> > http://wiki.apache.org/hadoop/EOFException
> >
> >
> > I have changed the port number from 54310 to 8020, but then I get the
> error
> >
> >
> > java.net.ConnectException: Call From
> IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55
> > to IMPETUS-1466:8020 failed on connection exception:
> > java.net.ConnectException: Connection refused; For more details see:
> > http://wiki.apache.org/hadoop/ConnectionRefused
> >
> >
> > To me it seemed like this may result from a version mismatch between
> Spark
> > Hadoop client and my Hadoop cluster, so I have made the following
> changes:
> >
> >
> > 1) Added the following lines to conf/spark-env.sh
> >
> >
> > export HADOOP_HOME="/usr/local/hadoop-1.0.4" export
> > HADOOP_CONF_DIR="$HADOOP_HOME/conf" export
> > HDFS_URL="hdfs://172.26.49.156:8020"
> >
> >
> > 2) Downloaded Spark 1.6.0, pre-built with user-provided Hadoop, and in
> > addition to the three lines above, added the following line to
> > conf/spark-env.sh
> >
> >
> > export SPARK_DIST_CLASSPATH="/usr/local/hadoop-1.0.4/bin/hadoop"
> >
> >
> > but none of it seems to work. However, the following command works from
> > 172.26.49.55 and gives the directory listing:
> >
> > /usr/local/hadoop-1.0.4/bin/hadoop fs -ls hdfs://172.26.49.156:54310/
> >
> >
> > Any suggestion?
> >
> >
> > Thanks
> >
> > Bibudh
> >
> >
> > --
> > Bibudh Lahiri
> > Data Scientist, Impetus Technologies
> > 5300 Stevens Creek Blvd
> > San Jose, CA 95129
> > http://knowthynumbers.blogspot.com/
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Bibudh Lahiri
Senior Data Scientist, Impetus Technologies
720 University Avenue, Suite 130
Los Gatos, CA 95129
http://knowthynumbers.blogspot.com/
