That's what I'd suggest too. Furthermore, if you use Vagrant to spin up VMs, there's a module that can do that automatically for you.

R.
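For reference, the manual version of what such a module automates is a one-line hosts entry. A sketch with placeholder values (substitute the guest's real address and the hostname HDFS advertises): on the Windows host, add to C:\Windows\System32\drivers\etc\hosts

    192.168.56.101   sandbox.hortonworks.com

and then address the cluster as hdfs://sandbox.hortonworks.com:8020/... so that the hostname the NameNode hands back actually resolves. (The Vagrant module meant here is presumably something like vagrant-hostmanager, which keeps these entries in sync for you.)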
2015-08-25 10:11 GMT-07:00 Steve Loughran <ste...@hortonworks.com>:

> I wouldn't try to play with forwarding & tunnelling; it's always hard to
> work out what ports get used everywhere, and the services expect the
> hostname to match the one in URL paths.
>
> Can't you just set up an entry in the Windows /etc/hosts file? It's what
> I do (on Unix) to talk to VMs.
>
> On 25 Aug 2015, at 04:49, Dino Fancellu <d...@felstar.com> wrote:
> >
> > Tried adding 50010, 50020 and 50090. Still no difference.
> >
> > I can't imagine I'm the only person on the planet wanting to do this.
> >
> > Anyway, thanks for trying to help.
> >
> > Dino.
> >
> > On 25 August 2015 at 08:22, Roberto Congiu <roberto.con...@gmail.com>
> > wrote:
> >> Port 8020 is not the only port you need tunnelled for HDFS to work.
> >> If you only list the contents of a directory, port 8020 is enough...
> >> for instance, using something like
> >>
> >> val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/")
> >> val fs = p.getFileSystem(sc.hadoopConfiguration)
> >> fs.listStatus(p)
> >>
> >> you should see the file list.
> >> But then, when accessing a file, the client needs to actually get its
> >> blocks, so it has to connect to the DataNode.
> >> The error 'could not obtain block' means it can't get that block from
> >> the DataNode.
> >> Refer to
> >> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_reference/content/reference_chap2_1.html
> >> to see the complete list of ports that also need to be tunnelled.
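A quick way to see what the client is actually being told is to ask for the block locations directly. An untested sketch in the same spark-shell style as Roberto's snippet, reusing the path from this thread; if it prints 10.0.2.15:50010, the NameNode is handing out the guest's NAT-internal address, which the Windows host cannot reach:

    val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/tmp/people.txt")
    val fs = p.getFileSystem(sc.hadoopConfiguration)
    val st = fs.getFileStatus(p)
    // Print the DataNode address(es) the NameNode advertises per block.
    fs.getFileBlockLocations(st, 0, st.getLen)
      .foreach(b => println(b.getNames.mkString(", ")))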
> >>
> >> 2015-08-24 13:10 GMT-07:00 Dino Fancellu <d...@felstar.com>:
> >>>
> >>> Changing the IP to the guest IP address just never connects.
> >>>
> >>> The VM has port tunnelling, and it passes all the main ports, 8020
> >>> included, through to the host.
> >>>
> >>> You can tell that it was talking to the guest VM before, simply from
> >>> what it said when a file was not found.
> >>>
> >>> Error is:
> >>>
> >>> Exception in thread "main" org.apache.spark.SparkException: Job
> >>> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
> >>> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
> >>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> >>> BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
> >>> file=/tmp/people.txt
> >>>
> >>> but I have no idea what it means by that. It certainly can find the
> >>> file and knows it exists.
> >>>
> >>> On 24 August 2015 at 20:43, Roberto Congiu <roberto.con...@gmail.com>
> >>> wrote:
> >>>> When you launch your HDP guest VM, most likely it gets launched with
> >>>> NAT and an address on a private network (192.168.x.x), so on your
> >>>> Windows host you should use that address (you can find it with
> >>>> ifconfig on the guest OS).
> >>>> I usually add an entry to my /etc/hosts for VMs that I use often...
> >>>> if you use vagrant, there's also a vagrant module that can do that
> >>>> automatically.
> >>>> Also, I am not sure how the default HDP VM is set up, that is,
> >>>> whether it binds HDFS only to 127.0.0.1 or to all addresses. You can
> >>>> check that with netstat -a.
> >>>>
> >>>> R.
> >>>>
> >>>> 2015-08-24 11:46 GMT-07:00 Dino Fancellu <d...@felstar.com>:
> >>>>>
> >>>>> I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox
> >>>>> VM.
> >>>>>
> >>>>> If I go into the guest spark-shell and refer to the file thus, it
> >>>>> works fine:
> >>>>>
> >>>>> val words = sc.textFile("hdfs:///tmp/people.txt")
> >>>>> words.count
> >>>>>
> >>>>> However, if I try to access it from a local Spark app on my Windows
> >>>>> host, it doesn't work:
> >>>>>
> >>>>> val conf = new SparkConf().setMaster("local").setAppName("My App")
> >>>>> val sc = new SparkContext(conf)
> >>>>>
> >>>>> val words = sc.textFile("hdfs://localhost:8020/tmp/people.txt")
> >>>>> words.count
> >>>>>
> >>>>> It emits the BlockMissingException ("Could not obtain block")
> >>>>> quoted above.
> >>>>>
> >>>>> Port 8020 is open, and if I choose the wrong file name, it tells
> >>>>> me so.
> >>>>>
> >>>>> My pom has
> >>>>>
> >>>>> <dependency>
> >>>>>   <groupId>org.apache.spark</groupId>
> >>>>>   <artifactId>spark-core_2.11</artifactId>
> >>>>>   <version>1.4.1</version>
> >>>>>   <scope>provided</scope>
> >>>>> </dependency>
> >>>>>
> >>>>> Am I doing something wrong?
> >>>>>
> >>>>> Thanks.
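Pulling the thread's suggestions together, below is a minimal, untested sketch of the Windows-side client. It assumes a hosts entry like the one shown earlier (sandbox.hortonworks.com remains a placeholder) and sets dfs.client.use.datanode.hostname, a standard HDFS client property that makes the client contact DataNodes by hostname instead of by the advertised IP; that matters here because the NameNode is advertising the NAT-internal 10.0.2.15.

    import org.apache.spark.{SparkConf, SparkContext}

    object HdfsFromHost {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local").setAppName("My App")
        val sc = new SparkContext(conf)

        // Contact DataNodes by hostname rather than the NAT-internal
        // IP the NameNode advertises; the hostname must resolve on
        // this machine (hence the hosts-file entry).
        sc.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")

        // Placeholder hostname; the DataNode port (50010 on HDP) must
        // also be reachable from the host.
        val words = sc.textFile("hdfs://sandbox.hortonworks.com:8020/tmp/people.txt")
        println(words.count())

        sc.stop()
      }
    }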