Port 8020 is not the only port you need tunnelled for HDFS to work. If you
only list the contents of a directory, port 8020 is enough, because listing
is a pure metadata operation served by the NameNode... for instance, with
something like

// listStatus is pure metadata, served by the NameNode on port 8020
val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/")
val fs = p.getFileSystem(sc.hadoopConfiguration)
fs.listStatus(p)

you should see the file list.
But when you actually read a file, the client has to fetch its blocks, and
for that it connects directly to the DataNode that holds them.
The error 'could not obtain block' means it cannot reach that block on the
DataNode.
Refer to
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_reference/content/reference_chap2_1.html
to see the complete list of ports that also need to be tunnelled.
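
For example, here is a minimal sketch (using the /tmp/people.txt path from
this thread, and assuming the same spark-shell session as above) that shows
where it breaks: the metadata call only talks to the NameNode and works with
just port 8020 forwarded, while the read is redirected to a DataNode
(dfs.datanode.address, 50010 by default) and fails with BlockMissingException
if that port isn't reachable from the host.

import org.apache.hadoop.fs.Path

val file = new Path("hdfs://localhost:8020/tmp/people.txt")
val fs = file.getFileSystem(sc.hadoopConfiguration)

// Metadata only: answered by the NameNode over 8020, so this succeeds.
println(fs.getFileStatus(file).getLen)

// Actual data: the NameNode returns block locations and the client then
// connects straight to the DataNode, which is the hop that needs the
// extra tunnelled ports.
val in = fs.open(file)
println(in.read())
in.close()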



2015-08-24 13:10 GMT-07:00 Dino Fancellu <d...@felstar.com>:

> Changing the IP to the guest's IP address just never connects.
>
> The VM has port tunnelling, and it passes all the main ports, 8020
> included, through to the host.
>
> You can tell that it was talking to the guest VM before, simply
> because it said so when a file was not found.
>
> Error is:
>
> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
> file=/tmp/people.txt
>
> but I have no idea what it means by that. It certainly can find the
> file and knows it exists.
>
>
>
> On 24 August 2015 at 20:43, Roberto Congiu <roberto.con...@gmail.com>
> wrote:
> > When you launch your HDP guest VM, most likely it gets launched with NAT
> > and an address on a private network (192.168.x.x), so on your Windows
> > host you should use that address (you can find out using ifconfig on the
> > guest OS). I usually add an entry to my /etc/hosts for VMs that I use
> > often... if you use vagrant, there's also a vagrant module that can do
> > that automatically.
> > Also, I am not sure how the default HDP VM is set up, that is, if it only
> > binds HDFS to 127.0.0.1 or to all addresses. You can check that with
> > netstat -a.
> >
> > R.
> >
> > 2015-08-24 11:46 GMT-07:00 Dino Fancellu <d...@felstar.com>:
> >>
> >> I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.
> >>
> >> If I go into the guest spark-shell and refer to the file thus, it works
> >> fine
> >>
> >>   val words=sc.textFile("hdfs:///tmp/people.txt")
> >>   words.count
> >>
> >> However if I try to access it from a local Spark app on my Windows host,
> >> it
> >> doesn't work
> >>
> >>   val conf = new SparkConf().setMaster("local").setAppName("My App")
> >>   val sc = new SparkContext(conf)
> >>
> >>   val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
> >>   words.count
> >>
> >> Emits
> >>
> >> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> >> BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
> >> file=/tmp/people.txt
> >>
> >> The port 8020 is open, and if I choose the wrong file name, it will tell
> >> me
> >>
> >>
> >>
> >> My pom has
> >>
> >>   <dependency>
> >>     <groupId>org.apache.spark</groupId>
> >>     <artifactId>spark-core_2.11</artifactId>
> >>     <version>1.4.1</version>
> >>     <scope>provided</scope>
> >>   </dependency>
> >>
> >> Am I doing something wrong?
> >>
> >> Thanks.
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://apache-spark-user-list.1001560.n3.nabble.com/Local-Spark-talking-to-remote-HDFS-tp24425.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> >
>
