Hi spark-user, I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a DataFrame from a Parquet data source containing a single Parquet file, it yields a stage with lots of small tasks. The number of tasks seems to depend on how many executors I have rather than on how many Parquet files/partitions there are; in fact, it launches 5 tasks on each executor.
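For reference, here is a minimal sketch of what I'm doing (the path is a placeholder for my real data location; context setup follows the standard Spark 1.6 PySpark pattern):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="parquet-read-test")
    sqlCtx = SQLContext(sc)

    # Read a Parquet data source that contains a single file.
    # The path below is just an example, not my actual location.
    df = sqlCtx.read.parquet("hdfs:///data/single_file.parquet")

    # Even a trivial action triggers the "parquet" stage, and the
    # task count tracks (num executors * 5) instead of the number
    # of files/partitions in the source.
    print(df.count())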
This behavior seems strange, and it could cause problems if one executor is slow. What is this "parquet" stage for, and why does it launch 5 tasks on each executor? Thanks, J.