Just to add to this, here's some more info:

val myDF = hiveContext.read.parquet("s3n://myBucket/myPath/")
Produces log lines like this:

2015-07-01 03:25:50,450 INFO [pool-14-thread-4] (org.apache.hadoop.fs.s3native.NativeS3FileSystem) - Opening 's3n://myBucket/myPath/part-r-00339.parquet' for reading

That is to say, it actually opens and reads every frick'n file. Previously, it would have queued the command until an action was called.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-4-0-read-df-causes-excessive-IO-tp23541p23559.html
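The per-file reads at read() time come from Parquet schema discovery rather than from the query itself. One possible workaround (a sketch only, assuming the DataFrameReader.schema API available in Spark 1.4 and a made-up schema for illustration) is to supply an explicit schema so Spark doesn't have to open each file's footer to infer one:

import org.apache.spark.sql.types._

// Hypothetical schema -- replace the fields with whatever your
// Parquet files actually contain.
val mySchema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("value", StringType, nullable = true)
))

// With a schema supplied up front, read() should not need to
// inspect every part file just to build the DataFrame's schema.
val myDF = hiveContext.read.schema(mySchema).parquet("s3n://myBucket/myPath/")

This doesn't restore the old fully-lazy behavior, but it may cut the excessive I/O at DataFrame creation time; the actual data scan is still deferred until an action runs.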