Thanks for the help folks.

Adding the config files was necessary but not sufficient.
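
For anyone checking the config side first, here is a minimal sketch (not part of the original fix) that reads core-site.xml from a Hadoop conf directory and reports the default filesystem URI. The function name read_default_fs and the fallback path /etc/hadoop/conf are assumptions made for illustration; fs.defaultFS and fs.default.name are the Hadoop 2.x and older property names respectively.

```python
import os
import xml.etree.ElementTree as ET

# fs.defaultFS is the Hadoop 2.x key; fs.default.name is the older name.
DEFAULT_FS_KEYS = ("fs.defaultFS", "fs.default.name")


def read_default_fs(conf_dir):
    """Return the default filesystem URI from core-site.xml, or None."""
    path = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(path):
        return None  # no core-site.xml in this directory
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name in DEFAULT_FS_KEYS:
            return (prop.findtext("value") or "").strip()
    return None  # file exists but sets no default filesystem


if __name__ == "__main__":
    # HADOOP_CONF_DIR is the conventional variable; the fallback is a guess.
    conf_dir = os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    print(conf_dir, "->", read_default_fs(conf_dir))
```

Run on an executor host, a None result would suggest the config files are not where the executor expects them. A non-None result only shows the file is present, not that it is on the classpath.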

I also had Hadoop 1.0.4 classes on the classpath because of a bad jar:

   spark-0.9.1/jars/spark-assembly-0.9.1-hadoop1.0.4.jar

was in my spark executor tar.gz (stored in HDFS).
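
A stale assembly like this can be spotted before launch. The sketch below assumes the executor tar.gz has first been copied out of HDFS to a local path, and that assembly jars follow the spark-assembly-...-hadoopX.Y.Z.jar naming seen above. The function names are hypothetical.

```python
import re
import tarfile

# Matches names such as spark-assembly-0.9.1-hadoop1.0.4.jar, capturing "1.0.4".
ASSEMBLY_RE = re.compile(r"spark-assembly[\w.-]*-hadoop(\d+(?:\.\d+)*)\.jar$")


def mismatched_assemblies(names, expected_hadoop):
    """Return (name, version) pairs for assembly jars built for another Hadoop."""
    bad = []
    for name in names:
        match = ASSEMBLY_RE.search(name)
        if match and match.group(1) != expected_hadoop:
            bad.append((name, match.group(1)))
    return bad


def scan_tarball(path, expected_hadoop):
    """List the members of a local .tar.gz and flag mismatched assembly jars."""
    with tarfile.open(path, "r:gz") as tar:
        return mismatched_assemblies(tar.getnames(), expected_hadoop)
```

For example, scan_tarball("spark-executor.tar.gz", "2.3.0") (the file name is a placeholder) would report the hadoop1.0.4 assembly if it were still inside the archive. It only inspects file names, so a jar with a misleading name would slip through.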

I believe this was due to a bit of unfortunate devops hygiene during the 
install of our new cluster.

After ensuring the pom referenced hadoop 2.3.0 and rebuilding with:

   mvn -Pyarn -Dhadoop.version=2.3.0 -Dyarn.version=2.3.0 -DskipTests clean package

I repackaged the executor tar.gz, put it back into HDFS, and relaunched my app.
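
One way to sanity-check a rebuilt assembly before uploading it is to look at the Hadoop version metadata bundled inside the jar. This is a sketch under assumptions: as far as I know, Hadoop 2.x ships a common-version-info.properties file in hadoop-common, and I am assuming it ends up at the root of the assembly jar. The function name is hypothetical.

```python
import zipfile

# Assumed location of Hadoop 2.x version metadata inside the assembly jar.
VERSION_FILE = "common-version-info.properties"


def bundled_hadoop_version(jar):
    """Return the version= value from the bundled properties file, or None."""
    with zipfile.ZipFile(jar) as zf:
        if VERSION_FILE not in zf.namelist():
            return None  # file absent: possibly a Hadoop 1.x build
        text = zf.read(VERSION_FILE).decode("utf-8", errors="replace")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "version":
            return value.strip()
    return None
```

A result of None means only that the file was not found. It does not prove the jar was built against Hadoop 1.x.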

Problem solved.

Hopefully, this will save someone else some tedium.

Thanks,

Steve


________________________________
From: Akhil Das [[email protected]]
Sent: Friday, July 04, 2014 1:55 AM
To: [email protected]
Subject: Re: No FileSystem for scheme: hdfs

Most likely you are missing the hadoop configuration files (present in 
conf/*.xml).

Thanks
Best Regards


On Fri, Jul 4, 2014 at 7:38 AM, Steven Cox 
<[email protected]> wrote:
They weren't. They are now and the logs look a bit better - like perhaps some 
serialization is completing that wasn't before.

But I still get the same error periodically. Other thoughts?

________________________________
From: Soren Macbeth [[email protected]]
Sent: Thursday, July 03, 2014 9:54 PM
To: [email protected]
Subject: Re: No FileSystem for scheme: hdfs

Are the hadoop configuration files on the classpath for your mesos executors?


On Thu, Jul 3, 2014 at 6:45 PM, Steven Cox 
<[email protected]> wrote:
...and a real subject line.
________________________________
From: Steven Cox [[email protected]]
Sent: Thursday, July 03, 2014 9:21 PM
To: [email protected]
Subject:


Folks, I have a program derived from the Kafka streaming wordcount example 
which works fine standalone.


Running on Mesos is not working so well. For starters, I get the error below 
"No FileSystem for scheme: hdfs".


I've looked at lots of promising comments on this issue, so now I have:

* Every jar under hadoop in my classpath

* Hadoop HDFS and Client in my pom.xml


I find it odd that the app writes checkpoint files to HDFS successfully for a 
couple of cycles then throws this exception. This would suggest the problem is 
not with the syntax of the hdfs URL, for example.


Any thoughts on what I'm missing?


Thanks,


Steve


Mesos : 0.18.2

Spark : 0.9.1



14/07/03 21:14:20 WARN TaskSetManager: Lost TID 296 (task 1514.0:0)
14/07/03 21:14:20 WARN TaskSetManager: Lost TID 297 (task 1514.0:1)
14/07/03 21:14:20 WARN TaskSetManager: Lost TID 298 (task 1514.0:0)
14/07/03 21:14:20 ERROR TaskSetManager: Task 1514.0:0 failed 10 times; aborting job
14/07/03 21:14:20 ERROR JobScheduler: Error running job streaming job 1404436460000 ms.0
org.apache.spark.SparkException: Job aborted: Task 1514.0:0 failed 10 times (most recent failure: Exception failure: java.io.IOException: No FileSystem for scheme: hdfs)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)



