I recently had the same problem. I'm not an expert, but I'd suggest concatenating your files into a smaller number of larger files, e.g. on Linux: cat <files> >> a_larger_file. This helped greatly in my case.
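If you'd rather script the merge than run cat by hand, here is a minimal sketch in Python that does roughly the same thing. The function name concatenate_files, the "*.txt" pattern, and the paths are placeholders I made up for illustration, and it assumes the files sit on a local filesystem and can simply be appended byte for byte (which is also what cat does).

```python
import shutil
from pathlib import Path


def concatenate_files(src_dir, dest_file, pattern="*.txt"):
    """Write every file in src_dir matching pattern into dest_file.

    Files are merged in sorted name order. Returns the number of files merged.
    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    dest = Path(dest_file)
    dest_resolved = dest.resolve()
    # Skip the destination itself in case it lives in src_dir and matches pattern.
    sources = sorted(
        p for p in Path(src_dir).glob(pattern)
        if p.is_file() and p.resolve() != dest_resolved
    )
    with dest.open("wb") as out:
        for src in sources:
            with src.open("rb") as f:
                # Raw byte copy: if a file lacks a trailing newline, its last
                # line will run into the first line of the next file.
                shutil.copyfileobj(f, out)
    return len(sources)
```

For example, concatenate_files("/path/to/small_files", "/path/to/a_larger_file") would merge all matching files into one, and you could then point your job at the merged file instead of the original directory.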
Likely others better qualified will weigh in on this later, but that's something to get you started.

D