How big is your file, and can you also share the code snippet?

On Saturday, May 7, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
> hi spark-user,
>
> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
> dataframe from a parquet data source with a single parquet file, it yields
> a stage with lots of small tasks. The number of tasks seems to depend on
> how many executors I have rather than on how many parquet files/partitions
> I have. In fact, it launches 5 tasks on each executor.
>
> This behavior is quite strange, and it may cause problems if there is a
> slow executor. What is this "parquet" stage for, and why does it launch 5
> tasks on each executor?
>
> Thanks,
> J.