What happens when you read it directly, i.e. sc.textFile("hdfs://path/to/the_file.txt")?
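The reply is hinting at reading the input straight from HDFS rather than staging it with sc.addFile. A minimal sketch of that approach, modelled on the book's mini WordCount example (Spark 1.x Scala API; the argument positions and object name are assumptions, not the poster's exact code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    // sc.textFile accepts an hdfs:// URI directly; each executor reads its
    // own splits from HDFS, so no sc.addFile staging step is needed for the
    // RDD input itself.
    val counts = sc.textFile(args(0)) // e.g. hdfs://host.domain.ex/user/nickt/linkage
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile(args(1))    // e.g. hdfs://host.domain.ex/user/nickt/wordcounts
    sc.stop()
  }
}
```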
Thanks
Best Regards

On Mon, Mar 30, 2015 at 11:04 AM, Nick Travers <n.e.trav...@gmail.com> wrote:

> Hi List,
>
> I'm following the example here:
> https://github.com/databricks/learning-spark/tree/master/mini-complete-example
> with the following:
>
> $SPARK_HOME/bin/spark-submit \
>   --deploy-mode cluster \
>   --master spark://host.domain.ex:7077 \
>   --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
>   hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
>   hdfs://host.domain.ex/user/nickt/linkage \
>   hdfs://host.domain.ex/user/nickt/wordcounts
>
> The jar is submitted fine and I can see it appear on the driver node (i.e.
> it is connecting to and reading from HDFS OK):
>
> -rw-r--r-- 1 nickt nickt  15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
> -rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
> -rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout
>
> But it's failing with a java.io.FileNotFoundException saying my input file
> is missing:
>
> Caused by: java.io.FileNotFoundException: Added file
> file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
> does not exist.
>
> I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to
> all the workers, and sc.textFile(SparkFiles.get("the_file.txt")) to get the
> path to the file on each of the hosts.
>
> Has anyone come up against this before when reading from HDFS? No doubt I'm
> doing something wrong.
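For reference, the addFile/SparkFiles pattern the post describes would normally look like the sketch below (Spark 1.x Scala API; the object name is hypothetical). `SparkFiles.get` takes a bare file name and returns a node-local path, which is why the pair is intended for small side files rather than RDD input:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object SideFileExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SideFileExample"))
    // Ship a copy of the file to the work directory of every node...
    sc.addFile("hdfs://host.domain.ex/user/nickt/linkage")
    // ...then resolve its *local* path by bare file name, not by URI.
    val localPath = SparkFiles.get("linkage")
    println(s"Local copy lives at: $localPath")
    sc.stop()
  }
}
```

Note that the exception in the trace below shows the hdfs:// URI appended to the driver's work directory (file:/home/nickt/spark-1.3.0/work/driver-.../hdfs:/host.domain.ex/...), i.e. the URI was resolved as a local relative path, which is consistent with the FileNotFoundException the post reports.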
> Full trace below:
>
> Launch Command: "/usr/java/java8/bin/java" "-cp"
> ":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar"
> "-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false"
> "-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "-Dspark.akka.askTimeout=10"
> "-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar"
> "-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M"
> "org.apache.spark.deploy.worker.DriverWrapper"
> "akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker"
> "/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar"
> "com.oreilly.learningsparkexamples.mini.scala.WordCount"
> "hdfs://host.domain.ex/user/nickt/linkage"
> "hdfs://host.domain.ex/user/nickt/wordcounts"
> ========================================
>
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port 44201.
> 15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
> 15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
> 15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
> 15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on port 33382.
> 15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
> 15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
> 15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
> 15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
> 15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
> 15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
> 15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
> 15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/03/29 22:05:05 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42484
> 15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file server' on port 42484.
> 15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/03/29 22:05:06 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
> 15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 15/03/29 22:05:06 INFO SparkUI: Started SparkUI at http://host5.domain.ex:4040
> 15/03/29 22:05:06 ERROR SparkContext: Jar not found at target/scala-2.10/learning-spark-mini-example_2.10-0.0.1.jar
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@host.domain.ex:7077/user/Master...
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150329220506-0027
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/0 on worker-20150329112422-host3.domain.ex-33765 (host3.domain.ex:33765) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/0 on hostPort host3.domain.ex:33765 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/1 on worker-20150329112422-host6.domain.ex-35464 (host6.domain.ex:35464) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/1 on hostPort host6.domain.ex:35464 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/2 on worker-20150329112422-host2.domain.ex-40914 (host2.domain.ex:40914) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/2 on hostPort host2.domain.ex:40914 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/3 on worker-20150329112421-host4.domain.ex-35927 (host4.domain.ex:35927) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/3 on hostPort host4.domain.ex:35927 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/4 on worker-20150329112422-host1.domain.ex-60546 (host1.domain.ex:60546) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/4 on hostPort host1.domain.ex:60546 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/5 on worker-20150329112421-host.domain.ex-59485 (host.domain.ex:59485) with 64 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/5 on hostPort host.domain.ex:59485 with 64 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added: app-20150329220506-0027/6 on worker-20150329112421-host5.domain.ex-40830 (host5.domain.ex:40830) with 63 cores
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150329220506-0027/6 on hostPort host5.domain.ex:40830 with 63 cores, 512.0 MB RAM
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/0 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/1 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/2 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now LOADING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/3 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/4 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/5 is now RUNNING
> 15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated: app-20150329220506-0027/6 is now RUNNING
> 15/03/29 22:05:06 INFO NettyBlockTransferService: Server created on 39447
> 15/03/29 22:05:06 INFO BlockManagerMaster: Trying to register BlockManager
> 15/03/29 22:05:06 INFO BlockManagerMasterActor: Registering block manager host5.domain.ex:39447 with 265.1 MB RAM, BlockManagerId(<driver>, host5.domain.ex, 39447)
> 15/03/29 22:05:06 INFO BlockManagerMaster: Registered BlockManager
> 15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> Exception in thread "main" java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:59)
>         at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1089)
>         at org.apache.spark.SparkContext.addFile(SparkContext.scala:1065)
>         at com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:21)
>         at com.oreilly.learningsparkexamples.mini.scala.WordCount.main(WordCount.scala)
>         ... 6 more
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-when-using-HDFS-in-cluster-mode-tp22287.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.