Spark needs to connect both to the Hive metastore and to all of the HDFS
nodes (the NameNode and every DataNode). If all of that is reachable, it
should work. In this case it looks like Spark can reach the metastore but
cannot connect to a DataNode to fetch the raw block data; opening the
metastore, ZooKeeper, and WebHDFS ports alone is not enough, because HDFS
clients read blocks directly from the DataNodes over the data-transfer
port. Keep in mind that performance may not be very good if you are
reading large amounts of data over the network.
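For reference, a minimal sketch of what that setup looks like from the
Spark side (Spark 1.x HiveContext, matching this thread; the metastore
host/port and table name below are made-up placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // Assumes a hive-site.xml on the classpath with hive.metastore.uris
    // pointing at the remote metastore, e.g. thrift://remote-host:9083.
    val sc = new SparkContext(new SparkConf().setAppName("remote-hive-test"))
    val hive = new HiveContext(sc)

    // The metastore only returns metadata (schema + HDFS file paths); the
    // rows themselves are streamed straight from the remote DataNodes,
    // which is where a BlockMissingException would surface.
    hive.sql("SELECT COUNT(*) FROM some_db.some_table")
      .collect()
      .foreach(println)

If a query like that fails the same way, it is almost certainly DataNode
connectivity rather than anything Hive-specific.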

On Wed, Oct 8, 2014 at 5:33 AM, jamborta <jambo...@gmail.com> wrote:
> Hi all,
>
> just wondering if it is possible for Spark to connect to Hive on
> another cluster located remotely?
>
> I have set up hive-site.xml and amended the hive.metastore.uris setting,
> and also opened the ports for ZooKeeper, WebHDFS, and the Hive metastore.
>
> It seems to connect to Hive, but then it fails with the following:
>
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> BP-1886934195-100.73.212.101-1411645855947:blk_1073763904_23146
> file=/user/tja01/datasets/00ab46fa4d6711e4afb70003ff41ebbf/part-00003
>
> Not sure whether some of the ports are still closed or whether it needs
> access to something else.
>
> thanks,
>
