Just to add to this, here's some more info:

val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/")
Produces log lines like this:

2015-07-01 03:25:50,450 INFO [pool-14-thread-4] (org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening 's3n://myBucket/myPath/part-r-00339.parquet' for reading

That is to say, it actually opens and reads every frick'n file. Previously, it would have queued the command until an action was called.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-read-df-causes-excessive-IO-tp23541p23559.html
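The per-file reads at read() time come from Parquet schema discovery rather than from the query itself. One possible workaround (a sketch only, assuming the DataFrameReader.schema API available in Spark 1.4 and a made-up schema for illustration) is to supply an explicit schema so Spark doesn't have to open each file's footer to infer one:

import org.apache.spark.sql.types._

// Hypothetical schema -- replace the fields with whatever your
// Parquet files actually contain.
val mySchema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("value", StringType, nullable = true)
))

// With a schema supplied up front, read() should not need to
// inspect every part file just to build the DataFrame's schema.
val myDF = hiveContext.read.schema(mySchema).parquet("s3n://myBucket/myPath/")

This doesn't restore the old fully-lazy behavior, but it may cut the excessive I/O at DataFrame creation time; the actual data scan is still deferred until an action runs.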