Hello, I am trying to read a 20GB file from an S3 bucket. I have verified that I can read small files from my cluster. The cluster has 15 slaves and a master; each slave has 16GB of RAM, and the machines are Amazon m1.xlarge instances.
All I am doing is shown below. However, about a minute into execution I get the ERROR and the subsequent WARN messages. Does anyone have any idea what is going on? Why is this so difficult? ;)

Thanks!
Ognen

scala> val f = sc.textFile("s3n://ognen-data-pipeline/large_data/2013-11-30.json")
f: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12

scala> f.count
14/01/19 12:03:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/19 12:03:31 WARN LoadSnappy: Snappy native library not loaded
14/01/19 12:04:23 ERROR Client$ClientActor: Master removed our application: FAILED; stopping client
14/01/19 12:04:23 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/01/19 12:04:24 ERROR ClusterScheduler: Lost executor 10 on 10.10.0.200: remote Akka client shutdown
14/01/19 12:04:24 WARN ClusterTaskSetManager: Lost TID 3 (task 0.0:3)
14/01/19 12:04:24 WARN ClusterTaskSetManager: Lost TID 1 (task 0.0:1)
14/01/19 12:04:24 WARN ClusterTaskSetManager: Lost TID 2 (task 0.0:2)
14/01/19 12:04:24 WARN ClusterTaskSetManager: Lost TID 0 (task 0.0:0)