Thanks for the reference. Is the following procedure correct?
1. Copy the Hadoop source code org.apache.hadoop.mapreduce.lib.input.TextInputFormat.java as my own class, e.g. UncTextInputFormat.java
2. Modify UncTextInputFormat.java to handle UNC paths
3. Call sc.newAPIHadoopFile(…) with
sc.newAPIHadoopFile[LongWritable, Text, UncTextInputFormat]("file:////10.196.119.230/folder1/abc.txt",
  classOf[UncTextInputFormat],
  classOf[LongWritable],
  classOf[Text], conf)
Ningjun
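For step 2, most of the work is path handling: rewriting the file:// form Spark passes in into the \\host\share form that Windows file APIs accept. A minimal sketch of such a helper in plain Scala (the name toUncPath is illustrative, not anything from Hadoop's TextInputFormat):

```scala
// Hypothetical helper for step 2: turn "file:////host/share/file.txt" into
// the UNC form "\\host\share\file.txt" that Windows file APIs accept.
// This is a sketch, not code taken from TextInputFormat.
def toUncPath(fileUri: String): String = {
  // Drop the "file:" scheme and all leading slashes, then rebuild with backslashes.
  val rest = fileUri.stripPrefix("file:").dropWhile(_ == '/')
  "\\\\" + rest.replace('/', '\\')
}

println(toUncPath("file:////10.196.119.230/folder1/abc.txt"))
// \\10.196.119.230\folder1\abc.txt
```

A real UncTextInputFormat would also need its listStatus/record-reader plumbing to open files through this converted path rather than through Hadoop's java.io-based Path.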
From: Akhil Das [mailto:[email protected]]
Sent: Wednesday, March 11, 2015 2:40 AM
To: Wang, Ningjun (LNG-NPV)
Cc: java8964; [email protected]
Subject: Re: sc.textFile() on windows cannot access UNC path
I don't have a complete example for your use case, but you can find plenty of code showing how to use newAPIHadoopFile here:
https://github.com/search?q=sc.newAPIHadoopFile&type=Code&utf8=%E2%9C%93
Thanks
Best Regards
On Tue, Mar 10, 2015 at 7:37 PM, Wang, Ningjun (LNG-NPV)
<[email protected]> wrote:
This sounds like the right approach. Is there any sample code showing how to use sc.newAPIHadoopFile? I am new to Spark and don't know much about Hadoop. I just want to read a text file from a UNC path into an RDD.
Thanks
From: Akhil Das [mailto:[email protected]]
Sent: Tuesday, March 10, 2015 9:14 AM
To: java8964
Cc: Wang, Ningjun (LNG-NPV); [email protected]
Subject: Re: sc.textFile() on windows cannot access UNC path
You can create your own input format (using java.nio.*) and pass it to sc.newAPIHadoopFile when reading.
Thanks
Best Regards
On Tue, Mar 10, 2015 at 6:28 PM, java8964
<[email protected]> wrote:
I think the work-around is clear: use JDK 7, and implement your own saveAsRemoteWinText() using java.nio.file.
Yong
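The java.nio route Yong mentions can be sketched in plain Scala as below; saveAsRemoteWinText is the name coined in this thread, not a Spark method, and gathering the lines on the driver (e.g. via rdd.collect()) only works for data that fits in memory:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}
import scala.collection.JavaConverters._

// Sketch of the suggested work-around: write collected lines with java.nio,
// which on JDK 7+ resolves paths through the default filesystem provider
// rather than java.io.File's string handling (the part that mangles UNC paths).
def saveAsRemoteWinText(lines: Seq[String], target: String): Path = {
  val path = Paths.get(target)
  if (path.getParent != null) Files.createDirectories(path.getParent)
  Files.write(path, lines.asJava, StandardCharsets.UTF_8)
}
```

On Windows, target could then be a UNC string such as \\10.196.119.230\folder1\abc.txt, since Paths.get hands it to the platform's filesystem provider.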
________________________________
From: [email protected]<mailto:[email protected]>
To: [email protected]<mailto:[email protected]>;
[email protected]<mailto:[email protected]>
Subject: RE: sc.textFile() on windows cannot access UNC path
Date: Tue, 10 Mar 2015 03:02:37 +0000
Hi Yong
Thanks for the reply. Yes, it works with a local drive letter. But I really need to use a UNC path because the path is provided at runtime, and I cannot dynamically assign a drive letter to an arbitrary UNC path at runtime.
Is there any work-around that lets me use a UNC path with sc.textFile(…)?
Ningjun
From: java8964 [mailto:[email protected]]
Sent: Monday, March 09, 2015 5:33 PM
To: Wang, Ningjun (LNG-NPV); [email protected]
Subject: RE: sc.textFile() on windows cannot access UNC path
This is a Java problem, not really a Spark one.
From this page:
http://stackoverflow.com/questions/18520972/converting-java-file-url-to-file-path-platform-independent-including-u
you can see that using java.nio.* on JDK 7 fixes this issue, but the Path class in Hadoop uses java.io.*, not java.nio.
You need to manually mount your Windows remote share as a local drive, like "Z:"; then it should work.
Yong
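The mangling Yong describes can be seen with plain java.net.URI, no Hadoop needed: the four-slash form parses with an empty authority, so the host ends up inside the path component, and collapsing repeated slashes (as happens during java.io-based path normalization) then loses the UNC host marker, producing the file:/10.196.119.230/... form in the error:

```scala
import java.net.URI

// "file:////host/..." parses with an empty authority; the host lands
// inside the path component, which begins with "//".
val u = new URI("file:////10.196.119.230/folder1/abc.txt")
println(u.getPath) // //10.196.119.230/folder1/abc.txt

// Folding repeated slashes loses the host marker:
println(u.getPath.replaceAll("/+", "/")) // /10.196.119.230/folder1/abc.txt
```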
________________________________
From: [email protected]<mailto:[email protected]>
To: [email protected]<mailto:[email protected]>
Subject: sc.textFile() on windows cannot access UNC path
Date: Mon, 9 Mar 2015 21:09:38 +0000
I am running Spark on Windows 2008 R2. When I use sc.textFile() to load a text file via a UNC path, it does not work.
sc.textFile(raw"file:////10.196.119.230/folder1/abc.txt", 4).count()
Input path does not exist: file:/10.196.119.230/folder1/abc.txt
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/10.196.119.230/tar/Enron/enron-207-short.load
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
    at org.apache.spark.rdd.RDD.count(RDD.scala:910)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply$mcV$sp(IndexTest.scala:104)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
    at ltn.analytics.tests.IndexTest$$anonfun$3.apply(IndexTest.scala:103)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
    at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
    at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
    at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
    at org.scalatest.Suite$class.run(Suite.scala:1424)
    at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
    at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
    at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
    at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
    at ltn.analytics.tests.IndexTest.org$scalatest$BeforeAndAfterAll$$super$run(IndexTest.scala:15)
    at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
    at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
    at ltn.analytics.tests.IndexTest.run(IndexTest.scala:15)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
    at org.scalatest.tools.Runner$.run(Runner.scala:883)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:137)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
The path is correct; I can open Windows Explorer and enter the following path to open the text file:
\\10.196.119.230\folder1\abc.txt
I have tried three slashes and two slashes, and always get the same error:
sc.textFile(raw"file:///10.196.119.230/folder1/abc.txt", 4).count()
sc.textFile(raw"file://10.196.119.230/folder1/abc.txt", 4).count()
Please advise.
Ningjun