The Avro files were 500-600 KB each, and the folder contained around 1,200 files, for a total of roughly 600 MB. I will try repartition. Thank you.
> On Oct 28, 2016 at 2:24 AM, Michael Armbrust <mich...@databricks.com> wrote:
>
> How big are your avro files? We collapse many small files into a single
> partition to eliminate scheduler overhead. If you need explicit
> parallelism you can also repartition.
>
> On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prith...@gmail.com> wrote:
>
> > I am trying to read a bunch of AVRO files from an S3 folder using Spark 2.0.
> > No matter how many executors I use or what configuration changes I make,
> > the cluster doesn't seem to use all the executors. I am using the
> > com.databricks.spark.avro library from Databricks to read the AVRO.
> >
> > However, if I try the same on CSV files (same S3 folder, same configuration
> > and cluster), it does use all executors.
> >
> > Is there something that I need to do to enable parallelism when using the
> > AVRO Databricks library?
> >
> > Thanks for your help.
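The "collapse many small files into a single partition" behavior above can be sketched with a rough model of how Spark 2.x's file-source planning bin-packs small files into read partitions. This is a simplified, hypothetical model, not Spark's actual code (and the exact path spark-avro takes on Spark 2.0 may differ): the two size constants are Spark's documented defaults, the file count and sizes come from this thread, and `default_parallelism` is an assumed cluster size, not something stated here.

```python
MB = 1024 * 1024
max_partition_bytes = 128 * MB    # spark.sql.files.maxPartitionBytes default
open_cost_in_bytes = 4 * MB       # spark.sql.files.openCostInBytes default
default_parallelism = 16          # assumed total cores (hypothetical)

file_sizes = [550 * 1024] * 1200  # ~1,200 Avro files of ~550 KB (~600 MB total)

# Spark derives a target split size from the total cost (each file is charged
# an "open cost" on top of its bytes), capped by maxPartitionBytes...
total_cost = sum(file_sizes) + open_cost_in_bytes * len(file_sizes)
bytes_per_core = total_cost // default_parallelism
max_split = min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

# ...then greedily packs files into partitions until each bin fills up.
partitions, current = 0, 0
for size in file_sizes:
    cost = size + open_cost_in_bytes
    if current > 0 and current + cost > max_split:
        partitions += 1
        current = 0
    current += cost
if current > 0:
    partitions += 1

print(partitions)  # estimated read partitions for the whole folder
```

Under these assumptions the 1,200 small files collapse into a few dozen read partitions, so a larger cluster would leave executors idle until an explicit `df.repartition(n)` (which triggers a shuffle) spreads the data across more tasks.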