How big is your file, and can you also share the code snippet?

On Saturday, May 7, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
> hi spark-user,
>
> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
> dataframe from a parquet data source with a single parquet file, it yields
> a stage with lots of small tasks. The number of tasks seems to depend on
> how many executors I have rather than on how many parquet files/partitions
> I have. In fact, it launches 5 tasks on each executor.
>
> This behavior is quite strange, and it may cause problems if there is a
> slow executor. What is this "parquet" stage for, and why does it launch 5
> tasks on each executor?
>
> Thanks,
> J.